Custom Sorting in Pandas Dataframe

Custom sorting in pandas dataframe

Pandas 0.15 introduced Categorical Series, which allows a much clearer way to do this:

First make the month column a categorical and specify the ordering to use.

In [21]: df['m'] = pd.Categorical(df['m'], ["March", "April", "Dec"])

In [22]: df  # looks the same!
Out[22]:
   a  b      m
0  1  2  March
1  5  6    Dec
2  3  4  April

Now, when you sort the month column it will sort with respect to that list:

In [23]: df.sort_values("m")
Out[23]:
   a  b      m
0  1  2  March
2  3  4  April
1  5  6    Dec

Note: if a value is not in the list it will be converted to NaN.

An older answer for those interested...

You could create an intermediary series, and set_index on that:

df = pd.DataFrame([[1, 2, 'March'],[5, 6, 'Dec'],[3, 4, 'April']], columns=['a','b','m'])
s = df['m'].apply(lambda x: {'March':0, 'April':1, 'Dec':3}[x])
s.sort_values()

In [4]: df.set_index(s.index).sort()
Out[4]: 
   a  b      m
0  1  2  March
1  3  4  April
2  5  6    Dec

As commented, in newer pandas, Series has a replace method to do this more elegantly:

s = df['m'].replace({'March':0, 'April':1, 'Dec':3})

The slight difference is that this won't raise if there is a value outside of the dictionary (it'll just stay the same).

Custom sorting with Pandas

One approach would be to use a custom dict to create a 'rank' column, we then use to sort with and then drop the column after sorting:

In [17]:
custom_dict = {'Critical':0, 'High':1, 'Medium':2, 'Low':3}  
df['rank'] = df['Criticality'].map(custom_dict)
df

Out[17]:

  Name Criticality  rank
0  baz        High     1
1  foo    Critical     0
2  baz         Low     3
3  foo      Medium     2
4  bar        High     1
5  bar         Low     3
6  bar      Medium     2

[7 rows x 3 columns]

In [19]:
# now sort by 'Name' and 'rank', it will first sort by 'Name' column first and then 'rank'
df.sort(columns=['Name', 'rank'],inplace=True)
df

Out[19]:

  Name Criticality  rank
4  bar        High     1
6  bar      Medium     2
5  bar         Low     3
0  baz        High     1
2  baz         Low     3
1  foo    Critical     0
3  foo      Medium     2

[7 rows x 3 columns]

In [21]:
# now drop the 'rank' column
df.drop(labels=['rank'],axis=1)

Out[21]:

  Name Criticality
4  bar        High
6  bar      Medium
5  bar         Low
0  baz        High
2  baz         Low
1  foo    Critical
3  foo      Medium

[7 rows x 2 columns]

Sort pandas dataframe by column specifying custom order

From pandas>=1.1.0 you can use the key argument of the .sort_values method to write a lambda function which defines the custom order you prefer.

To do it, you just need to define a custom dictionary with your desired order

custom_dict = {'new': 0, 'fix': 1, 'error': 2}
df.sort_values(by=['col3'], key=lambda x: x.map(custom_dict))

How to sort pandas dataframe by custom order on string index

Just reindex

df.reindex(reorderlist)
Out[89]: 
                 Age   G   Tm  Year     id
Player                                    
Maurice Baker     25   7  VAN  2004   5335
Adrian Caldwell   31  81  DAL  1997   6169
Ratko Varda       22  60  TOT  2001  13950
Ryan Bowen        34  52  OKC  2009   6141
Cedric Hunter     27   6  CHH  1991   2967

Update info you have multiple players with same name

out = df.iloc[pd.Categorical(df.index,reorderlist).argsort()]

Custom sorting multiple levels of column index in pandas

Add sort_remaining=False to sort_index to prevent sorting levels below specified:

df = df.sort_index(level=0, axis=1, sort_remaining=False)

df:

Week        2021-10-11           2021-10-18           2021-10-25          
                Checks Check Avg     Checks Check Avg     Checks Check Avg
11th Street          4         5          2         3          0         1
16th Street         10        11          8         9          6         7
Bala Cynwyd         16        17         14        15         12        13

Sample Data Used:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(18).reshape((-1, 6)),
    index=['11th Street', '16th Street', 'Bala Cynwyd'],
    columns=pd.MultiIndex.from_arrays(
        [np.repeat(pd.to_datetime(['2021-10-25', '2021-10-18', '2021-10-11']),
                   2),
         ['Checks', 'Check Avg'] * 3],
        names=['Week', None]
    )
)

df:

Week        2021-10-25           2021-10-18           2021-10-11          
                Checks Check Avg     Checks Check Avg     Checks Check Avg
11th Street          0         1          2         3          4         5
16th Street          6         7          8         9         10        11
Bala Cynwyd         12        13         14        15         16        17

sorting by a custom list in pandas

Below is an example that performs lexicographic sort on a dataframe.
The idea is to create an numerical index based on the specific sort.
Then to perform a numerical sort based on the index.
A column is added to the dataframe to do so, and is then removed.

import pandas as pd

# Create DataFrame
df = pd.DataFrame(
{'id':[2967, 5335, 13950, 6141, 6169],
    'Player': ['Cedric Hunter', 'Maurice Baker',
               'Ratko Varda' ,'Ryan Bowen' ,'Adrian Caldwell'],
    'Year': [1991, 2004, 2001, 2009, 1997],
    'Age': [27, 25, 22, 34, 31],
    'Tm': ['CHH' ,'VAN' ,'TOT' ,'OKC', 'DAL'],
    'G': [6, 7, 60, 52, 81]})

# Define the sorter
sorter = ['TOT', 'ATL', 'BOS', 'BRK', 'CHA', 'CHH', 'CHI', 'CLE', 'DAL','DEN',
          'DET', 'GSW', 'HOU', 'IND', 'LAC', 'LAL', 'MEM', 'MIA', 'MIL',
          'MIN', 'NJN', 'NOH', 'NOK', 'NOP', 'NYK', 'OKC', 'ORL', 'PHI',
          'PHO', 'POR', 'SAC', 'SAS', 'SEA', 'TOR', 'UTA', 'VAN',
          'WAS', 'WSB']

# Create the dictionary that defines the order for sorting
sorterIndex = dict(zip(sorter, range(len(sorter))))

# Generate a rank column that will be used to sort
# the dataframe numerically
df['Tm_Rank'] = df['Tm'].map(sorterIndex)

# Here is the result asked with the lexicographic sort
# Result may be hard to analyze, so a second sorting is
# proposed next
## NOTE: 
## Newer versions of pandas use 'sort_values' instead of 'sort'
df.sort_values(['Player', 'Year', 'Tm_Rank'],
        ascending = [True, True, True], inplace = True)
df.drop('Tm_Rank', 1, inplace = True)
print(df)

# Here is an example where 'Tm' is sorted first, that will 
# give the first row of the DataFrame df to contain TOT as 'Tm'
df['Tm_Rank'] = df['Tm'].map(sorterIndex)
## NOTE: 
## Newer versions of pandas use 'sort_values' instead of 'sort'
df.sort_values(['Tm_Rank', 'Player', 'Year'],
        ascending = [True , True, True], inplace = True)
df.drop('Tm_Rank', 1, inplace = True)
print(df)

custom sorting pandas dataframe

You could compute the frequency and use the sort method to find the desired order of the index. Then use df.loc[order.index] to reorder the original DataFrame:

order = df.sum(axis=1).sort(inplace=False)
result = df.loc[order.index]

For example,

import pandas as pd

df = pd.DataFrame({
    'one.txt': [1, 0, 1, 0, 0, 1, 1, 2, 0],
    'third.txt': [1, 0, 1, 0, 1, 0, 1, 0, 0],
    'two.txt': [0, 1, 1, 1, 0, 0, 1, 0, 1]}, 
    index=['a', 'i', 'is', 'no', 'not', 'really', 'sentence', 'short', 'think'])

order = df.sum(axis=1).sort(inplace=False, ascending=False)
print(df.loc[order.index])

yields

          one.txt  third.txt  two.txt
sentence        1          1        1
is              1          1        1
short           2          0        0
a               1          1        0
think           0          0        1
really          1          0        0
not             0          1        0
no              0          0        1
i               0          0        1

Custom Sorting in Pandas Dataframe