Sorting by a Custom List in Pandas

sorting by a custom list in pandas

Below is an example that performs lexicographic sort on a dataframe.
The idea is to create an numerical index based on the specific sort.
Then to perform a numerical sort based on the index.
A column is added to the dataframe to do so, and is then removed.

import pandas as pd

# Create DataFrame
df = pd.DataFrame(
{'id':[2967, 5335, 13950, 6141, 6169],
'Player': ['Cedric Hunter', 'Maurice Baker',
'Ratko Varda' ,'Ryan Bowen' ,'Adrian Caldwell'],
'Year': [1991, 2004, 2001, 2009, 1997],
'Age': [27, 25, 22, 34, 31],
'Tm': ['CHH' ,'VAN' ,'TOT' ,'OKC', 'DAL'],
'G': [6, 7, 60, 52, 81]})

# Define the sorter
sorter = ['TOT', 'ATL', 'BOS', 'BRK', 'CHA', 'CHH', 'CHI', 'CLE', 'DAL','DEN',
'DET', 'GSW', 'HOU', 'IND', 'LAC', 'LAL', 'MEM', 'MIA', 'MIL',
'MIN', 'NJN', 'NOH', 'NOK', 'NOP', 'NYK', 'OKC', 'ORL', 'PHI',
'PHO', 'POR', 'SAC', 'SAS', 'SEA', 'TOR', 'UTA', 'VAN',
'WAS', 'WSB']

# Create the dictionary that defines the order for sorting
sorterIndex = dict(zip(sorter, range(len(sorter))))

# Generate a rank column that will be used to sort
# the dataframe numerically
df['Tm_Rank'] = df['Tm'].map(sorterIndex)

# Here is the result asked with the lexicographic sort
# Result may be hard to analyze, so a second sorting is
# proposed next
## NOTE:
## Newer versions of pandas use 'sort_values' instead of 'sort'
df.sort_values(['Player', 'Year', 'Tm_Rank'],
ascending = [True, True, True], inplace = True)
df.drop('Tm_Rank', 1, inplace = True)
print(df)

# Here is an example where 'Tm' is sorted first, that will
# give the first row of the DataFrame df to contain TOT as 'Tm'
df['Tm_Rank'] = df['Tm'].map(sorterIndex)
## NOTE:
## Newer versions of pandas use 'sort_values' instead of 'sort'
df.sort_values(['Tm_Rank', 'Player', 'Year'],
ascending = [True , True, True], inplace = True)
df.drop('Tm_Rank', 1, inplace = True)
print(df)

Custom sorting in pandas dataframe

Pandas 0.15 introduced Categorical Series, which allows a much clearer way to do this:

First make the month column a categorical and specify the ordering to use.

In [21]: df['m'] = pd.Categorical(df['m'], ["March", "April", "Dec"])

In [22]: df # looks the same!
Out[22]:
a b m
0 1 2 March
1 5 6 Dec
2 3 4 April

Now, when you sort the month column it will sort with respect to that list:

In [23]: df.sort_values("m")
Out[23]:
a b m
0 1 2 March
2 3 4 April
1 5 6 Dec

Note: if a value is not in the list it will be converted to NaN.


An older answer for those interested...

You could create an intermediary series, and set_index on that:

df = pd.DataFrame([[1, 2, 'March'],[5, 6, 'Dec'],[3, 4, 'April']], columns=['a','b','m'])
s = df['m'].apply(lambda x: {'March':0, 'April':1, 'Dec':3}[x])
s.sort_values()

In [4]: df.set_index(s.index).sort()
Out[4]:
a b m
0 1 2 March
1 3 4 April
2 5 6 Dec

As commented, in newer pandas, Series has a replace method to do this more elegantly:

s = df['m'].replace({'March':0, 'April':1, 'Dec':3})

The slight difference is that this won't raise if there is a value outside of the dictionary (it'll just stay the same).

Sort pandas dataframe by column specifying custom order

From pandas>=1.1.0 you can use the key argument of the .sort_values method to write a lambda function which defines the custom order you prefer.

To do it, you just need to define a custom dictionary with your desired order

custom_dict = {'new': 0, 'fix': 1, 'error': 2}
df.sort_values(by=['col3'], key=lambda x: x.map(custom_dict))

Sort pandas df column by a custom list of values

The solution you mention is a good starting point. You can use ordered=True with set_categories to ensure you set categorical ordering as required:

df['Tm'] = df['Tm'].astype('category')
not_in_list = df['Tm'].cat.categories.difference(sorter)
df['Tm'] = df['Tm'].cat.set_categories(np.hstack((sorter, not_in_list)), ordered=True)

df = df.sort_values('Tm')

print(df)

Age G Player Tm Year id
2 22 60 Ratko Varda TOT 2001 13950
0 27 6 Cedric Hunter CHH 1991 2967
3 34 52 Ryan Bowen OKC 2009 6141
1 25 7 Maurice Baker VAN 2004 5335
4 31 81 Adrian Caldwell value_not_present_in_sorter 1997 6169

How to sort a pandas DataFrame on one column given an already ordered list of the values in that column?

Approach 1

Convert the Fruit column to ordered categorical type and sort the values

df['fruit'] = pd.Categorical(df['fruit'], ordered_list, ordered=True)
df.sort_values('fruit')

Approach 2

Sort the values by passing a key function, which maps the fruit names to there corresponding order

df.sort_values('fruit', key=lambda x: x.map({v:k for k, v in enumerate(ordered_list)}))


   id      fruit  trash
2 3 pineapple 93
1 2 banana 22
3 4 orange 1
4 5 orange 15
0 1 apple 38

How to sort pandas dataframe by custom order on string index

Just reindex

df.reindex(reorderlist)
Out[89]:
Age G Tm Year id
Player
Maurice Baker 25 7 VAN 2004 5335
Adrian Caldwell 31 81 DAL 1997 6169
Ratko Varda 22 60 TOT 2001 13950
Ryan Bowen 34 52 OKC 2009 6141
Cedric Hunter 27 6 CHH 1991 2967

Update info you have multiple players with same name

out = df.iloc[pd.Categorical(df.index,reorderlist).argsort()]

Sorting a pandas DataFrame by the order of a list

You could make the Class column your index column

df = df.set_index('Class')

and then use df.loc to reindex the DataFrame with class_list:

df.loc[class_list]

Minimal example:

>>> df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'], 'Number': [3, 5, 6]})
>>> df
Class Number
0 Gammaproteobacteria 3
1 Bacteroidetes 5
2 Negativicutes 6

>>> df = df.set_index('Class')
>>> df.loc[['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']]
Number
Bacteroidetes 5
Negativicutes 6
Gammaproteobacteria 3

Custom sorting with Pandas

One approach would be to use a custom dict to create a 'rank' column, we then use to sort with and then drop the column after sorting:

In [17]:
custom_dict = {'Critical':0, 'High':1, 'Medium':2, 'Low':3}
df['rank'] = df['Criticality'].map(custom_dict)
df

Out[17]:

Name Criticality rank
0 baz High 1
1 foo Critical 0
2 baz Low 3
3 foo Medium 2
4 bar High 1
5 bar Low 3
6 bar Medium 2

[7 rows x 3 columns]

In [19]:
# now sort by 'Name' and 'rank', it will first sort by 'Name' column first and then 'rank'
df.sort(columns=['Name', 'rank'],inplace=True)
df

Out[19]:

Name Criticality rank
4 bar High 1
6 bar Medium 2
5 bar Low 3
0 baz High 1
2 baz Low 3
1 foo Critical 0
3 foo Medium 2

[7 rows x 3 columns]

In [21]:
# now drop the 'rank' column
df.drop(labels=['rank'],axis=1)

Out[21]:

Name Criticality
4 bar High
6 bar Medium
5 bar Low
0 baz High
2 baz Low
1 foo Critical
3 foo Medium

[7 rows x 2 columns]

Pandas df: Sort by index based on custom list

Let's use sort_values with a custom sort dictionary:

# create dictionary with key as firm, value as order in the list
ordered_dict = {k: i for i, k in enumerate(Ordered_list)}

# sort by index with the created dictionary
df.sort_index(key=lambda x: x.map(ordered_dict))


Related Topics



Leave a reply



Submit