Sort (Order) Data Frame Rows by Multiple Columns

How to sort a DataFrame in Python pandas by two or more columns?

As of the 0.17.0 release, the sort method was deprecated in favor of sort_values. sort was completely removed in the 0.20.0 release. The arguments (and results) remain the same:

df.sort_values(['a', 'b'], ascending=[True, False])

In versions before 0.17.0, the same call was made with sort, which took the same ascending argument:

df.sort(['a', 'b'], ascending=[True, False])

For example:

In [11]: df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])

In [12]: df1.sort_values(['a', 'b'], ascending=[True, False])
Out[12]:
   a  b
2  1  4
7  1  3
1  1  2
3  1  2
4  3  2
6  4  4
0  4  3
9  4  3
5  4  1
8  4  1

As commented by @renadeen:

Sort isn't in place by default! So you should assign the result of the sort method to a variable, or add inplace=True to the method call.

That is, if you want to reuse df1 as a sorted DataFrame:

df1 = df1.sort_values(['a', 'b'], ascending=[True, False])

or

df1.sort_values(['a', 'b'], ascending=[True, False], inplace=True)
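Here is a quick check of that comment on current pandas. It uses sort_values, since sort no longer exists, and the four-row frame is made up for illustration. The default call hands back a sorted copy and leaves df1 alone, while inplace=True reorders df1 itself and returns None.

```python
import pandas as pd

# Made-up data, only for illustration.
df1 = pd.DataFrame({"a": [2, 1, 2, 1], "b": [1, 3, 4, 2]})

# Default (inplace=False): a sorted copy is returned and df1 is not touched.
sorted_copy = df1.sort_values(["a", "b"], ascending=[True, False])
print(df1.index.tolist())          # [0, 1, 2, 3]
print(sorted_copy.index.tolist())  # [1, 3, 2, 0]

# inplace=True sorts df1 itself and returns None.
result = df1.sort_values(["a", "b"], ascending=[True, False], inplace=True)
print(result)                      # None
print(df1.index.tolist())          # [1, 3, 2, 0]
```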

How to sort dataframe rows by multiple columns

Use sort_values, which can accept a list of sorting targets. In this case it sounds like you want to sort by S/N, then Dis, then Rate:

df = df.sort_values(['S/N', 'Dis', 'Rate'])

#     S/N      Dis       Rate
# 0   332   4.6030  91.204062
# 3   332   9.1985  76.212943
# 6   332  14.4405  77.664282
# 9   332  20.2005  76.725955
# 12  332  25.4780  31.597510
# 15  332  30.6670  74.096975
# 1   445   5.4280  60.233917
# 4   445   9.7345  31.902842
# 7   445  14.6015  36.261851
# 10  445  19.8630  40.705467
# 13  445  24.9050   4.897008
# 16  445  30.0550  35.217889
# ...
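This small sketch makes the priority rule concrete. It is built from four of the rows printed above (indices 0, 1, 3 and 4), entered in their original unsorted order; the rest of the question's data is not reproduced. S/N decides first, and Dis only orders rows that share an S/N.

```python
import pandas as pd

# Four rows copied from the output above, in their original index order.
df = pd.DataFrame(
    {"S/N": [332, 445, 332, 445],
     "Dis": [4.6030, 5.4280, 9.1985, 9.7345],
     "Rate": [91.204062, 60.233917, 76.212943, 31.902842]},
    index=[0, 1, 3, 4],
)

# 'S/N' has priority; 'Dis' breaks ties within an S/N,
# and 'Rate' would only matter for rows tied on both.
df = df.sort_values(["S/N", "Dis", "Rate"])
print(df.index.tolist())  # [0, 3, 1, 4]
```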

How to sort a Pandas DataFrame with multiple columns, some in ascending order and others descending order?

The DataFrame.sort_values method handles this directly: pass the ascending argument a list of booleans, one per column in by.

import pandas as pd

my_df = pd.DataFrame({'col1':['a','a','a','a','b','b','b','b','c','c','c','c'],
                      'col2':[1,1,2,2,1,1,2,2,1,1,2,2],
                      'col3':[1,2,1,2,1,2,1,2,1,2,1,2]})

my_df = my_df.sort_values(by=['col1','col2','col3'],
                          ascending=[False, False, True])

Note that the list provided in the ascending argument must have the same length as the one provided in the by argument.
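Here is a runnable version of the example above, assuming a recent pandas. It prints the resulting row order and also shows the length rule in action: an ascending list whose length differs from by is rejected with a ValueError. A plain boolean such as ascending=False is also accepted, and it applies to every column.

```python
import pandas as pd

my_df = pd.DataFrame({'col1': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
                      'col2': [1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2],
                      'col3': [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]})

# col1 descending, then col2 descending, then col3 ascending.
out = my_df.sort_values(by=['col1', 'col2', 'col3'],
                        ascending=[False, False, True])
print(out.index.tolist())  # [10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1]

# One boolean in a list for three columns does not line up with `by`,
# so pandas refuses the call.
length_mismatch_rejected = False
try:
    my_df.sort_values(by=['col1', 'col2', 'col3'], ascending=[False])
except ValueError as err:
    length_mismatch_rejected = True
    print("ValueError:", err)
```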

Sort (order) data frame rows by multiple columns

You can use the order() function directly, without resorting to add-on tools. The trick below comes straight from the top of the example(order) code:

R> dd[with(dd, order(-z, b)), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1

Edit some 2+ years later: It was just asked how to do this by column index. The answer is to simply pass the desired sorting column(s) to the order() function:

R> dd[order(-dd[,4], dd[,1]), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1

rather than using the name of the column (and with() for easier/more direct access).
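For comparison, here is a pandas sketch of the same order(-z, b) sort. The definition of dd is not shown here, so the four rows are copied from the printed output. Because R puts Med before Hi, we assume b is a factor with levels Low, Med, Hi and model it as an ordered Categorical. In pandas, descending order on z comes from the ascending argument rather than from negating the column.

```python
import pandas as pd

# Assumption: b is ordered Low < Med < Hi, as the R output suggests
# (dd's actual definition is not shown in the answer).
dd = pd.DataFrame(
    {"b": pd.Categorical(["Hi", "Med", "Hi", "Low"],
                         categories=["Low", "Med", "Hi"], ordered=True),
     "x": ["A", "D", "A", "C"],
     "y": [8, 3, 9, 9],
     "z": [1, 1, 1, 2]},
    index=[1, 2, 3, 4],
)

# Rough equivalent of order(-z, b): z descending, then b ascending by level order.
out = dd.sort_values(["z", "b"], ascending=[False, True])
print(out)
```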

Pandas: Sort a dataframe based on multiple columns

You can swap the order of the columns in the list, and also the values in the ascending parameter:

Explanation:

The order of the column names is the order of sorting. Rows are first sorted descending by Employee_Count, and only rows with duplicate Employee_Count values are then sorted ascending by Department.

df1 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, True])
print(df1)
   Department  Employee_Count
4         xyz              15
2         bca              11
0         abc              10  <-
1         adc              10  <-
3         cde               9

To check, pass False as the second value too; the duplicated rows are then sorted descending:

df2 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, False])
print(df2)
   Department  Employee_Count
4         xyz              15
2         bca              11
1         adc              10  <-
0         abc              10  <-
3         cde               9
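The two outputs can be reproduced with a small sketch. The question's own construction code is not shown, so df is rebuilt from the printed rows. Only the two rows tied at 10 employees change places between df1 and df2.

```python
import pandas as pd

# Rebuilt from the printed output above (same index labels 0-4).
df = pd.DataFrame({"Department": ["abc", "adc", "bca", "cde", "xyz"],
                   "Employee_Count": [10, 10, 11, 9, 15]})

df1 = df.sort_values(["Employee_Count", "Department"], ascending=[False, True])
df2 = df.sort_values(["Employee_Count", "Department"], ascending=[False, False])

print(df1.index.tolist())  # [4, 2, 0, 1, 3]
print(df2.index.tolist())  # [4, 2, 1, 0, 3]
```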

Sort two columns in ascending order for each dataframe in a list using a for loop in R

We can either use lapply in base R

Split_Data <- lapply(Split_Data, function(x) x[order(-x$Scale_1, -x$Scale_2),])

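or with map from purrr and arrange from dplyr. arrange sorts in ascending order by default, so desc() is used here for descending order, matching the negated keys in the lapply version: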

library(purrr)
library(dplyr)
Split_Data <- map(Split_Data, ~ .x %>%
                    arrange(desc(Scale_1), desc(Scale_2)))
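If the same task comes up in pandas, a list comprehension plays the role of lapply or map. The contents of Split_Data are not shown, so split_data below is a hypothetical list of made-up frames with Scale_1 and Scale_2 columns. Both keys are sorted descending to match the R code.

```python
import pandas as pd

# Hypothetical stand-in for Split_Data (values are made up).
split_data = [
    pd.DataFrame({"Scale_1": [1, 3, 3], "Scale_2": [5, 2, 7]}),
    pd.DataFrame({"Scale_1": [4, 2, 2], "Scale_2": [0, 1, 9]}),
]

# Sort every frame by Scale_1, then Scale_2, both descending.
split_data = [d.sort_values(["Scale_1", "Scale_2"], ascending=[False, False])
              for d in split_data]

for d in split_data:
    print(d.index.tolist())  # [2, 1, 0] then [0, 2, 1]
```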

Reorder rows of Pandas dataframe using custom order over multiple columns

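You can represent Name and Subject as ordered categorical variables. sort_values then follows the order of the categories you list rather than alphabetical order: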

import pandas as pd

names = ['Dan','Tim','Ari']
subjects = ['Science','History','Math']

df = df.astype({'Name': pd.CategoricalDtype(names, ordered=True),
                'Subject': pd.CategoricalDtype(subjects, ordered=True)})
>>> df.sort_values(['Name', 'Subject'])
   Name  Subject  Test1  Test2
7   Dan  Science     58     28
8   Dan  History     10     50
6   Dan     Math     10      1
1   Tim  Science     46     78
2   Tim  History     54     61
0   Tim     Math     10      5
4   Ari  Science     83     32
5   Ari  History     39     43
3   Ari     Math     10      7

>>> df.sort_values(['Subject', 'Name'])
   Name  Subject  Test1  Test2
7   Dan  Science     58     28
1   Tim  Science     46     78
4   Ari  Science     83     32
8   Dan  History     10     50
2   Tim  History     54     61
5   Ari  History     39     43
6   Dan     Math     10      1
0   Tim     Math     10      5
3   Ari     Math     10      7
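The original df is not shown, so this sketch rebuilds it from the printed rows (index labels 0 to 8). It checks that both sorts follow the category lists rather than alphabetical order.

```python
import pandas as pd

# Rebuilt from the printed output above.
df = pd.DataFrame({
    "Name":    ["Tim", "Tim", "Tim", "Ari", "Ari", "Ari", "Dan", "Dan", "Dan"],
    "Subject": ["Math", "Science", "History", "Math", "Science", "History",
                "Math", "Science", "History"],
    "Test1":   [10, 46, 54, 10, 83, 39, 10, 58, 10],
    "Test2":   [5, 78, 61, 7, 32, 43, 1, 28, 50],
})

names = ['Dan', 'Tim', 'Ari']
subjects = ['Science', 'History', 'Math']

# Ordered categoricals: sorting follows the list order, not the alphabet.
df = df.astype({'Name': pd.CategoricalDtype(names, ordered=True),
                'Subject': pd.CategoricalDtype(subjects, ordered=True)})

by_name = df.sort_values(['Name', 'Subject'])
by_subject = df.sort_values(['Subject', 'Name'])
print(by_name.index.tolist())     # [7, 8, 6, 1, 2, 0, 4, 5, 3]
print(by_subject.index.tolist())  # [7, 1, 4, 8, 2, 5, 6, 0, 3]
```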

How to sort a Pandas DataFrame by multiple columns in Python with ordered number

There is an issue when you write:

ordered = trade_value_df.sort_values(by=['Trade Value'], ascending=False,ignore_index=True)

You are assigning something new to the name ordered, so you're effectively losing the dataframe you had previously assigned to that name.

One possibility is to do all the operations on the same DataFrame, rather than keeping multiple DataFrames:

import pandas as pd

df = pd.DataFrame({'Code':['Apple', 'Amazon', 'Facebook', 'Samsung'], 'Volume':[500, 1000, 250, 100], 'Trade Value': [1000, 500, 750, 1500]})

df = df.sort_values(by=['Volume'], ascending=False,ignore_index=True)
df['Volume Order'] = df.index + 1

df = df.sort_values(by=['Trade Value'], ascending=False,ignore_index=True)
df['Trade Order'] = df.index + 1

print(df)
#        Code  Volume  Trade Value  Volume Order  Trade Order
# 0   Samsung     100         1500             4            1
# 1     Apple     500         1000             2            2
# 2  Facebook     250          750             3            3
# 3    Amazon    1000          500             1            4
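An alternative, not taken from the answer above, is Series.rank. It assigns each row its position directly, so the order columns can be computed without sorting the frame twice. method='first' breaks ties by position so every row gets a distinct number. The final sort_values call is only for display.

```python
import pandas as pd

df = pd.DataFrame({'Code': ['Apple', 'Amazon', 'Facebook', 'Samsung'],
                   'Volume': [500, 1000, 250, 100],
                   'Trade Value': [1000, 500, 750, 1500]})

# rank(ascending=False) gives 1 to the largest value.
df['Volume Order'] = df['Volume'].rank(method='first', ascending=False).astype(int)
df['Trade Order'] = df['Trade Value'].rank(method='first', ascending=False).astype(int)

# Sort once at the end, by Trade Value descending, for display.
df = df.sort_values('Trade Value', ascending=False, ignore_index=True)
print(df)
```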


Related Topics



Leave a reply



Submit