Order Dataframe for Given Columns

How to sort a pandas DataFrame by one column

Use sort_values to sort the df by a specific column's values:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

If you want to sort by two columns, pass a list of column labels to sort_values with the column labels ordered according to sort priority. If you use df.sort_values(['2', '0']), the result would be sorted by column 2 then column 0. Granted, this does not really make sense for this example because each value in df['2'] is unique.
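As a minimal sketch of the two-column case, the example below uses a small made-up frame (the values and the ties in column '2' are assumptions for illustration, not the data shown above) so that the second sort key actually has an effect:

```python
import pandas as pd

# Hypothetical data: column '2' contains ties, so the second key matters.
df = pd.DataFrame({
    '0': [85.6, 95.5, 104.8, 354.7],
    '1': ['January', 'February', 'March', 'April'],
    '2': [1.0, 2.0, 1.0, 2.0],
})

# Sort by '2' first; rows with equal '2' are then ordered by '0'.
result = df.sort_values(['2', '0'])
print(result)
```

Rows that share a value in '2' end up next to each other, ordered among themselves by '0'.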

Sorting columns in pandas dataframe based on column name

df = df.reindex(sorted(df.columns), axis=1)

This assumes that sorting the column names will give the order you want. If your column names won't sort lexicographically (e.g., if you want column Q10.3 to appear after Q9.1), you'll need to sort differently, but that has nothing to do with pandas.
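For labels such as Q9.1 and Q10.3, one possible approach is to pass a "natural sort" key to sorted. The helper natural_key below is a hypothetical name introduced for this sketch, and the column labels are made up:

```python
import re
import pandas as pd

def natural_key(label):
    # Split the label into alternating non-digit and digit runs and
    # compare the digit runs as integers, so that 10 sorts after 9.
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', str(label))]

# Hypothetical column labels that do not sort well lexicographically.
df = pd.DataFrame(columns=['Q10.3', 'Q9.1', 'Q1.2'])

df = df.reindex(sorted(df.columns, key=natural_key), axis=1)
print(list(df.columns))
```

A plain sorted(df.columns) would place Q10.3 before Q9.1 because the comparison is character by character.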

How to sort a pandas DataFrame on one column given an already ordered list of the values in that column?

Approach 1

Convert the fruit column to an ordered categorical type, then sort by it:

df['fruit'] = pd.Categorical(df['fruit'], ordered_list, ordered=True)
df.sort_values('fruit')

Approach 2

Sort the values by passing a key function that maps the fruit names to their corresponding positions in the ordered list:

df.sort_values('fruit', key=lambda x: x.map({v:k for k, v in enumerate(ordered_list)}))


   id      fruit  trash
2   3  pineapple     93
1   2     banana     22
3   4     orange      1
4   5     orange     15
0   1      apple     38
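To make both approaches reproducible, here is a self-contained sketch. The input frame and ordered_list are reconstructed from the output above, so treat them as assumptions. kind='stable' is added here (it is not in the original snippets) so that the two tied orange rows keep their original relative order:

```python
import pandas as pd

# Input data reconstructed from the printed result; treat it as an assumption.
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'fruit': ['apple', 'banana', 'pineapple', 'orange', 'orange'],
    'trash': [38, 22, 93, 1, 15],
})
ordered_list = ['pineapple', 'banana', 'orange', 'apple']

# Approach 1: ordered categorical (done on a copy so df stays unchanged).
as_cat = pd.Categorical(df['fruit'], categories=ordered_list, ordered=True)
by_cat = df.assign(fruit=as_cat).sort_values('fruit', kind='stable')

# Approach 2: key function mapping each fruit to its position in the list.
rank = {fruit: pos for pos, fruit in enumerate(ordered_list)}
by_key = df.sort_values('fruit', key=lambda col: col.map(rank), kind='stable')

print(by_key)
```

Approach 1 changes the dtype of the column to category, while Approach 2 leaves the column as it was and only uses the mapping for ordering.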

Custom sorting in pandas dataframe

Pandas 0.15 introduced Categorical Series, which allows a much clearer way to do this:

First make the month column a categorical and specify the ordering to use.

In [21]: df['m'] = pd.Categorical(df['m'], ["March", "April", "Dec"])

In [22]: df # looks the same!
Out[22]:
a b m
0 1 2 March
1 5 6 Dec
2 3 4 April

Now, when you sort the month column it will sort with respect to that list:

In [23]: df.sort_values("m")
Out[23]:
a b m
0 1 2 March
2 3 4 April
1 5 6 Dec

Note: if a value is not in the list it will be converted to NaN.
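A short sketch of that NaN behavior, using a made-up extra row with the value 'May' (not part of the original example):

```python
import pandas as pd

# 'May' is a hypothetical value that is missing from the category list.
df = pd.DataFrame({'a': [1, 5, 3, 7],
                   'm': ['March', 'Dec', 'April', 'May']})

df['m'] = pd.Categorical(df['m'], categories=['March', 'April', 'Dec'])

# 'May' has become NaN; with the default na_position='last' it sorts to the end.
sorted_df = df.sort_values('m')
print(sorted_df)
```

If losing such values silently is a concern, checking df['m'].isna() right after the conversion is one way to catch them.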


An older answer for those interested...

You could create an intermediary series of sort keys and use its sorted index to reorder the rows (DataFrame.sort has since been removed from pandas, so sort_values is used here):

df = pd.DataFrame([[1, 2, 'March'],[5, 6, 'Dec'],[3, 4, 'April']], columns=['a','b','m'])
s = df['m'].apply(lambda x: {'March':0, 'April':1, 'Dec':3}[x])

In [4]: df.loc[s.sort_values().index].reset_index(drop=True)
Out[4]:
a b m
0 1 2 March
1 3 4 April
2 5 6 Dec

As commented, in newer pandas, Series has a replace method to do this more elegantly:

s = df['m'].replace({'March':0, 'April':1, 'Dec':3})

The slight difference is that this won't raise if there is a value outside of the dictionary (it'll just stay the same).

Pandas: Sort a dataframe based on multiple columns

You can reorder the columns in the list and change the matching values in the ascending parameter.

Explanation:

The order of the column names is the order of sorting. The call below first sorts by Employee_Count in descending order; rows with the same Employee_Count are then sorted among themselves by Department in ascending order.

df1 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, True])
print (df1)
Department Employee_Count
4 xyz 15
2 bca 11
0 abc 10 <-
1 adc 10 <-
3 cde 9

To test this, set the second value to False; the rows with duplicated Employee_Count are then sorted by Department in descending order:

df2 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, False])
print (df2)
Department Employee_Count
4 xyz 15
2 bca 11
1 adc 10 <-
0 abc 10 <-
3 cde 9
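The input frame is not shown above, so the following sketch rebuilds it from the printed results (an assumption) and runs both calls side by side:

```python
import pandas as pd

# Input reconstructed from the printed outputs; treat it as an assumption.
df = pd.DataFrame({
    'Department': ['abc', 'adc', 'bca', 'cde', 'xyz'],
    'Employee_Count': [10, 10, 11, 9, 15],
})

# Employee_Count descending, ties broken by Department ascending.
df1 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, True])

# Employee_Count descending, ties broken by Department descending.
df2 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, False])

print(df1)
print(df2)
```

Only the two rows with Employee_Count equal to 10 change places between df1 and df2.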

pandas - Sorting Columns in a custom order

The first idea is to use list comprehensions and join the resulting lists with +:

expected_columns = ['cust_id','cost_id','sale_id','prod_id']

df = pd.DataFrame(columns=['customer_name','cust_id','sale_id','sale_time'])

new1 = [c for c in df.columns if c in expected_columns]
new2 = [c for c in df.columns if c not in expected_columns]

new = new1 + new2
print (new)
['cust_id', 'sale_id', 'customer_name', 'sale_time']

Or using Index.intersection with Index.difference:

expected_columns = ['cust_id','cost_id','sale_id','prod_id']

new = (df.columns.intersection(expected_columns, sort=False).tolist() +
df.columns.difference(expected_columns, sort=False).tolist())

If the order in the output should also follow the order of expected_columns, use:

new = (pd.Index(expected_columns).intersection(df.columns, sort=False).tolist() +
df.columns.difference(expected_columns, sort=False).tolist())

The difference between the two variants shows up with changed sample data:

expected_columns = ['sale_id','cost_id','cust_id','prod_id']

df = pd.DataFrame(columns=['customer_name','cust_id','sale_id','sale_time'])

new = (pd.Index(expected_columns).intersection(df.columns, sort=False).tolist() +
df.columns.difference(expected_columns, sort=False).tolist())
print (new)
['sale_id', 'cust_id', 'customer_name', 'sale_time']

new = (df.columns.intersection(expected_columns, sort=False).tolist() +
df.columns.difference(expected_columns, sort=False).tolist())
print (new)
['cust_id', 'sale_id', 'customer_name', 'sale_time']

Finally, to change the order of the columns, use:

df = df[new]
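The steps can also be wrapped in a small function. reorder_columns is a hypothetical helper name introduced for this sketch; it uses plain list comprehensions rather than the Index methods, and puts the expected columns first in the order given by expected_columns:

```python
import pandas as pd

def reorder_columns(df, expected_columns):
    """Hypothetical helper: expected columns first, remaining columns after."""
    # Columns that exist in df, in the order given by expected_columns.
    front = [c for c in expected_columns if c in df.columns]
    # Everything else, in the original column order.
    rest = [c for c in df.columns if c not in expected_columns]
    return df[front + rest]

df = pd.DataFrame(columns=['customer_name', 'cust_id', 'sale_id', 'sale_time'])
expected_columns = ['sale_id', 'cost_id', 'cust_id', 'prod_id']

df = reorder_columns(df, expected_columns)
print(list(df.columns))
```

Expected columns that are missing from the frame (here cost_id and prod_id) are simply skipped.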

How to sort a Pandas DataFrame with multiple columns, some in ascending order and others descending order?

The DataFrame.sort_values method can handle this very easily. Just use the ascending argument and provide a list of boolean values.

import pandas as pd

my_df = pd.DataFrame({'col1':['a','a','a','a','b','b','b','b','c','c','c','c'],
'col2':[1,1,2,2,1,1,2,2,1,1,2,2],
'col3':[1,2,1,2,1,2,1,2,1,2,1,2]})

my_df = my_df.sort_values(by=['col1','col2','col3'],
ascending=[False, False, True])

Note that the list provided in the ascending argument must have the same length as the one provided in the by argument.
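A quick sketch of what happens when the lengths differ, using a smaller made-up frame. In the pandas versions I am aware of, the mismatch raises a ValueError, while a single boolean is accepted and applied to every column:

```python
import pandas as pd

# Small hypothetical frame with the same column names as above.
my_df = pd.DataFrame({'col1': ['a', 'b', 'c'],
                      'col2': [1, 2, 1],
                      'col3': [2, 1, 2]})

# Three sort columns but only two ascending flags: expected to fail.
try:
    my_df.sort_values(by=['col1', 'col2', 'col3'], ascending=[False, True])
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print(exc)

# A single boolean applies to all columns in `by`.
all_desc = my_df.sort_values(by=['col1', 'col2', 'col3'], ascending=False)
print(all_desc)
```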


