Fastest way to sort each row in a pandas dataframe
I think I would do this in numpy:
In [11]: a = df.values
In [12]: a.sort(axis=1) # no ascending argument
In [13]: a = a[:, ::-1] # so reverse
In [14]: a
Out[14]:
array([[8, 4, 3, 1],
[9, 7, 2, 2]])
In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
A B C D
0 8 4 3 1
1 9 7 2 2
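I had thought this might work, but it sorts the columns by label (DataFrame.sort comes from old pandas versions and was later removed in favour of sort_values and sort_index):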
In [21]: df.sort(axis=1, ascending=False)
Out[21]:
D C B A
0 1 8 4 3
1 2 7 2 9
Ah, pandas raises:
In [22]: df.sort(df.columns, axis=1, ascending=False)
ValueError: When sorting by column, axis must be 0 (rows)
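For reference, the NumPy approach from the start of this answer can be written as a self-contained sketch. The input frame below is an assumption, reconstructed from the outputs shown above. It uses np.sort, which returns a sorted copy, rather than the in-place a.sort(), which may write through to the original frame.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# np.sort returns a sorted copy, so df itself is left untouched.
# Sorting is ascending; reversing each row gives descending order.
desc = np.sort(df.to_numpy(), axis=1)[:, ::-1]

# Rebuild a frame with the original index and column labels.
sorted_df = pd.DataFrame(desc, index=df.index, columns=df.columns)
print(sorted_df)
```

Note that the column labels no longer describe where a value came from; after a row-wise sort they only mark positions.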
How to sort pandas dataframe from one column
Use sort_values to sort the df by a specific column's values:
In [18]:
df.sort_values('2')
Out[18]:
0 1 2
4 85.6 January 1.0
3 95.5 February 2.0
7 104.8 March 3.0
0 354.7 April 4.0
8 283.5 May 5.0
6 238.7 June 6.0
5 152.0 July 7.0
1 55.4 August 8.0
11 212.7 September 9.0
10 249.6 October 10.0
9 278.8 November 11.0
2 176.5 December 12.0
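As a self-contained sketch of sort_values on made-up data (the frame and its column names are hypothetical, not the asker's), the following sorts by one column, by a list of columns in priority order, and in descending order:

```python
import pandas as pd

# Made-up data; the column names are illustrative only.
df = pd.DataFrame({
    "month": ["March", "January", "February", "January"],
    "num": [3.0, 1.0, 2.0, 1.0],
    "value": [104.8, 85.6, 95.5, 70.2],
})

# Sort by one column (ascending by default).
by_num = df.sort_values("num")

# Sort by two columns: "num" first, ties broken by "value".
by_num_then_value = df.sort_values(["num", "value"])

# Descending order via the ascending flag.
by_value_desc = df.sort_values("value", ascending=False)

print(by_num_then_value)
```

The duplicated "num" value is there on purpose: a second sort key only matters when the first one has ties.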
If you want to sort by two columns, pass a list of column labels to sort_values, with the column labels ordered according to sort priority. If you use df.sort_values(['2', '0']), the result would be sorted by column 2 then column 0. Granted, this does not really make sense for this example because each value in df['2'] is unique.
How to sort each row in pandas dataframe and get indices instead?
You can use numpy for that with argsort:
df = pd.DataFrame([[0.5,0.7,0.1],[0.1,0.7,0.5]])
array = df.values.argsort(axis=1)[:,::-1]
new_df = pd.DataFrame(array)
Output of new_df:
   0  1  2
0  1  0  2
1  1  2  0
Note: as commented by @anky, something in the output you show does not make sense. I also assumed you want descending order, which is why the [:,::-1] slice is applied to the result.
UPDATE
As @anky suggested in the comments, here is a more straightforward solution than df.values.argsort(axis=1)[:,::-1] that still uses the same argsort idea:
np.argsort(-df)
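Both variants can be compared side by side in a minimal sketch. It works on the underlying array (df.to_numpy()) with an explicit axis=1, which is my choice here rather than part of the original answer. The two agree on this data, but they can order tied values differently, and negation only works for numeric data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0.5, 0.7, 0.1], [0.1, 0.7, 0.5]])
values = df.to_numpy()

# Variant 1: ascending argsort, then reverse each row.
idx_reversed = values.argsort(axis=1)[:, ::-1]

# Variant 2: argsort of the negated values (numeric data only).
idx_negated = np.argsort(-values, axis=1)

# Each row holds column positions, from largest to smallest value.
new_df = pd.DataFrame(idx_negated, index=df.index)
print(new_df)
```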
How to sort ascending row-wise in Pandas Dataframe
You can sort the rows with numpy.sort, reverse the order for descending with [:, ::-1], and pass the result to the DataFrame constructor if performance is important:
df = pd.DataFrame(np.sort(df, axis=1)[:, ::-1],
columns=df.columns,
index=df.index)
print (df)
N1 N2 N3 N4 N5
0 48 45 21 20 12
1 41 36 32 29 16
2 42 41 34 13 9
3 39 37 33 7 4
4 39 32 21 3 1
1313 42 36 27 5 1
1314 48 38 35 20 18
1315 42 38 37 34 12
1316 42 41 37 23 18
1317 35 34 18 10 2
Performance is a bit worse if you assign back:
df[:] = np.sort(df, axis=1)[:, ::-1]
Performance:
#10k rows
df = pd.concat([df] * 1000, ignore_index=True)
#Ynjxsjmh sol
In [200]: %timeit df.apply(lambda row: list(reversed(sorted(row))), axis=1, result_type='expand')
595 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Andrej Kesely sol1
In [201]: %timeit df[:] = np.fliplr(np.sort(df, axis=1))
559 µs ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#Andrej Kesely sol2
In [202]: %timeit df.loc[:, ::-1] = np.sort(df, axis=1)
518 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#jezrael sol2
In [203]: %timeit df[:] = np.sort(df, axis=1)[:, ::-1]
491 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#jezrael sol1
In [204]: %timeit pd.DataFrame(np.sort(df, axis=1)[:, ::-1], columns=df.columns, index=df.index)
399 µs ± 2.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
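To reproduce this kind of comparison outside IPython, a rough sketch with the standard timeit module could look like the following. The random frame, its size, and the helper names with_apply and with_numpy are assumptions made for illustration; absolute timings will differ from the numbers above.

```python
import timeit

import numpy as np
import pandas as pd

# Random integer frame as a stand-in; size and column names are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 50, size=(2_000, 5)),
                  columns=["N1", "N2", "N3", "N4", "N5"])

def with_apply(frame):
    # Python-level loop over the rows.
    return frame.apply(lambda row: sorted(row, reverse=True),
                       axis=1, result_type="expand")

def with_numpy(frame):
    # One vectorised sort, then reverse each row.
    return pd.DataFrame(np.sort(frame.to_numpy(), axis=1)[:, ::-1],
                        columns=frame.columns, index=frame.index)

# Both approaches should produce the same values.
same = np.array_equal(with_apply(df).to_numpy(), with_numpy(df).to_numpy())

t_apply = timeit.timeit(lambda: with_apply(df), number=1)
t_numpy = timeit.timeit(lambda: with_numpy(df), number=10) / 10
print(f"same result: {same}, apply: {t_apply:.4f}s, numpy: {t_numpy:.6f}s")
```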
Python - Sorting the values of every row in a table and get a new Pandas dataframe with original column index/labels in sorted sequence in each row
You can use .apply() on each row to sort values in descending order and get the index (i.e. column labels) of sorted sequence:
df2 = (df.set_index('Date')[['Company1', 'Company2', 'Company3']]
.replace(r',', r'.', regex=True)
.astype(float)
.apply(lambda x: x.sort_values(ascending=False).index.tolist(), axis=1, result_type='expand')
.pipe(lambda x: x.set_axis(x.columns+1, axis=1))
.reset_index()
)
Result:
print(df2)
Date 1 2 3
0 01.01.2020 Company1 Company3 Company2
1 02.01.2020 Company2 Company3 Company1
2 24.10.2020 Company3 Company1 Company2
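Since the question's data is not shown here, the following is a sketch of the same chain on a small hypothetical frame, assuming (as the replace step suggests) that the numbers are stored as strings with decimal commas:

```python
import pandas as pd

# Hypothetical input: numbers stored as strings with decimal commas.
df = pd.DataFrame({
    "Date": ["01.01.2020", "02.01.2020"],
    "Company1": ["3,5", "1,0"],
    "Company2": ["1,2", "4,8"],
    "Company3": ["2,9", "2,5"],
})

values = (df.set_index("Date")
            .replace(",", ".", regex=True)   # "3,5" -> "3.5"
            .astype(float))

# For each row, list the column labels from largest to smallest value.
ranked = values.apply(
    lambda row: row.sort_values(ascending=False).index.tolist(),
    axis=1, result_type="expand")
ranked.columns = ranked.columns + 1          # rank positions 1..n
ranked = ranked.reset_index()
print(ranked)
```

The intermediate steps are split out only for readability; they perform the same operations as the single chained expression.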
How to sort each row of pandas dataframe and return column index based on sorted values of row
This is probably as fast as it gets with numpy:
def sort_df(df):
    return pd.DataFrame(
        data=df.columns.values[np.argsort(-df.values, axis=1)],
        columns=['tag_{}'.format(i) for i in range(df.shape[1])]
    )
print(sort_df(gapminder.head(3)))
tag_0 tag_1 tag_2 tag_3
0 pop year gdpPercap lifeExp
1 pop year gdpPercap lifeExp
2 pop year gdpPercap lifeExp
Explanation: np.argsort sorts the values along rows, but returns the indices that sort the array instead of the sorted values, which can be used for co-sorting arrays. The minus sign makes the order descending. In your case, you use the indices to reorder the column labels. Indexing the 1-D label array with the 2-D index array returns a result with the shape of the index array.
Runtime is around 3 ms for your example vs 2.5 s with your function.
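The indexing step is easier to follow on a tiny frame. The sketch below uses made-up numbers in place of the gapminder rows and, unlike the function above, also carries over the original index:

```python
import numpy as np
import pandas as pd

# Small numeric stand-in for the gapminder rows; the values are made up.
df = pd.DataFrame({
    "year": [1952, 1957],
    "pop": [8_425_333, 9_240_934],
    "lifeExp": [28.8, 30.3],
    "gdpPercap": [779.4, 820.9],
})

# Column positions ordered from largest to smallest value, per row.
order = np.argsort(-df.to_numpy(), axis=1)

# Indexing the 1-D label array with the 2-D position array returns
# a 2-D array of labels with the same shape as `order`.
labels = df.columns.to_numpy()[order]

result = pd.DataFrame(labels, index=df.index,
                      columns=[f"tag_{i}" for i in range(df.shape[1])])
print(result)
```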
Sort each row individually between two columns
You can use:
df[['column_01','column_02']] = df[['column_01','column_02']].apply(lambda x: sorted(x.values), axis=1)
print (df)
column_01 column_02 value
0 aaa ccc 1
1 bbb ddd 34
2 aaa ddd 98
Another solution:
df[['column_01','column_02']] = pd.DataFrame(np.sort(df[['column_01','column_02']].values),
index=df.index, columns=['column_01','column_02'])
Only with a numpy array:
df[['column_01','column_02']] = np.sort(df[['column_01','column_02']].values)
print (df)
column_01 column_02 value
0 aaa ccc 1
1 bbb ddd 34
2 aaa ddd 98
The NumPy-based solutions are faster, because apply loops over the rows in Python:
df = pd.concat([df]*1000).reset_index(drop=True)
In [177]: %timeit df[['column_01','column_02']] = pd.DataFrame(np.sort(df[['column_01','column_02']].values), index=df.index, columns=['column_01','column_02'])
1000 loops, best of 3: 1.36 ms per loop
In [182]: %timeit df[['column_01','column_02']] = np.sort(df[['column_01','column_02']].values)
1000 loops, best of 3: 1.54 ms per loop
In [178]: %timeit df[['column_01','column_02']] = (df[['column_01','column_02']].apply(lambda x: sorted(x.values), axis=1))
1 loop, best of 3: 291 ms per loop
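A self-contained sketch of the NumPy variant, on made-up rows in which some pairs are stored in reverse order (the input is an assumption, chosen so the result matches the output shown above):

```python
import numpy as np
import pandas as pd

# Made-up rows in which some pairs are stored in reverse order.
df = pd.DataFrame({
    "column_01": ["ccc", "bbb", "ddd"],
    "column_02": ["aaa", "ddd", "aaa"],
    "value": [1, 34, 98],
})

cols = ["column_01", "column_02"]

# Sort within each row (axis=1); only the two selected columns are touched.
pairs = np.sort(df[cols].to_numpy(), axis=1)
df[cols] = pairs
print(df)
```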
Pandas sort each row and print the top 5
If you would rather print the top 5 values of each column separately, this should work:
df.apply(lambda x: print(x.sort_values(ascending=False).head(5)), axis=0)
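If the goal is instead the top values of each row, as the question title suggests, the same idea can be applied with axis=1. This is a sketch on a hypothetical frame, with n set to 2 to keep the output short:

```python
import pandas as pd

# Hypothetical numeric frame; n is kept small so the output stays readable.
df = pd.DataFrame([[5, 1, 9, 3], [2, 8, 4, 6]], columns=list("ABCD"))
n = 2

# Largest n values of each row, in descending order.
top_values = df.apply(
    lambda row: row.sort_values(ascending=False).head(n).tolist(),
    axis=1, result_type="expand")

# Column labels that those values came from.
top_labels = df.apply(
    lambda row: row.sort_values(ascending=False).head(n).index.tolist(),
    axis=1, result_type="expand")

print(top_values)
print(top_labels)
```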