sorting by a custom list in pandas
Below is an example that performs lexicographic sort on a dataframe.
The idea is to create an numerical index based on the specific sort.
Then to perform a numerical sort based on the index.
A column is added to the dataframe to do so, and is then removed.
import pandas as pd
# Create DataFrame
df = pd.DataFrame(
{'id':[2967, 5335, 13950, 6141, 6169],
'Player': ['Cedric Hunter', 'Maurice Baker',
'Ratko Varda' ,'Ryan Bowen' ,'Adrian Caldwell'],
'Year': [1991, 2004, 2001, 2009, 1997],
'Age': [27, 25, 22, 34, 31],
'Tm': ['CHH' ,'VAN' ,'TOT' ,'OKC', 'DAL'],
'G': [6, 7, 60, 52, 81]})
# Define the sorter
sorter = ['TOT', 'ATL', 'BOS', 'BRK', 'CHA', 'CHH', 'CHI', 'CLE', 'DAL','DEN',
'DET', 'GSW', 'HOU', 'IND', 'LAC', 'LAL', 'MEM', 'MIA', 'MIL',
'MIN', 'NJN', 'NOH', 'NOK', 'NOP', 'NYK', 'OKC', 'ORL', 'PHI',
'PHO', 'POR', 'SAC', 'SAS', 'SEA', 'TOR', 'UTA', 'VAN',
'WAS', 'WSB']
# Create the dictionary that defines the order for sorting
sorterIndex = dict(zip(sorter, range(len(sorter))))
# Generate a rank column that will be used to sort
# the dataframe numerically
df['Tm_Rank'] = df['Tm'].map(sorterIndex)
# Here is the result asked with the lexicographic sort
# Result may be hard to analyze, so a second sorting is
# proposed next
## NOTE:
## Newer versions of pandas use 'sort_values' instead of 'sort'
df.sort_values(['Player', 'Year', 'Tm_Rank'],
ascending = [True, True, True], inplace = True)
df.drop('Tm_Rank', 1, inplace = True)
print(df)
# Here is an example where 'Tm' is sorted first, that will
# give the first row of the DataFrame df to contain TOT as 'Tm'
df['Tm_Rank'] = df['Tm'].map(sorterIndex)
## NOTE:
## Newer versions of pandas use 'sort_values' instead of 'sort'
df.sort_values(['Tm_Rank', 'Player', 'Year'],
ascending = [True , True, True], inplace = True)
df.drop('Tm_Rank', 1, inplace = True)
print(df)
Custom sorting in pandas dataframe
Pandas 0.15 introduced Categorical Series, which allows a much clearer way to do this:
First make the month column a categorical and specify the ordering to use.
In [21]: df['m'] = pd.Categorical(df['m'], ["March", "April", "Dec"])
In [22]: df # looks the same!
Out[22]:
a b m
0 1 2 March
1 5 6 Dec
2 3 4 April
Now, when you sort the month column it will sort with respect to that list:
In [23]: df.sort_values("m")
Out[23]:
a b m
0 1 2 March
2 3 4 April
1 5 6 Dec
Note: if a value is not in the list it will be converted to NaN.
An older answer for those interested...
You could create an intermediary series, and set_index
on that:
df = pd.DataFrame([[1, 2, 'March'],[5, 6, 'Dec'],[3, 4, 'April']], columns=['a','b','m'])
s = df['m'].apply(lambda x: {'March':0, 'April':1, 'Dec':3}[x])
s.sort_values()
In [4]: df.set_index(s.index).sort()
Out[4]:
a b m
0 1 2 March
1 3 4 April
2 5 6 Dec
As commented, in newer pandas, Series has a replace
method to do this more elegantly:
s = df['m'].replace({'March':0, 'April':1, 'Dec':3})
The slight difference is that this won't raise if there is a value outside of the dictionary (it'll just stay the same).
Sort pandas dataframe by column specifying custom order
From pandas>=1.1.0
you can use the key
argument of the .sort_values
method to write a lambda function which defines the custom order you prefer.
To do it, you just need to define a custom dictionary with your desired order
custom_dict = {'new': 0, 'fix': 1, 'error': 2}
df.sort_values(by=['col3'], key=lambda x: x.map(custom_dict))
Sort pandas df column by a custom list of values
The solution you mention is a good starting point. You can use ordered=True
with set_categories
to ensure you set categorical ordering as required:
df['Tm'] = df['Tm'].astype('category')
not_in_list = df['Tm'].cat.categories.difference(sorter)
df['Tm'] = df['Tm'].cat.set_categories(np.hstack((sorter, not_in_list)), ordered=True)
df = df.sort_values('Tm')
print(df)
Age G Player Tm Year id
2 22 60 Ratko Varda TOT 2001 13950
0 27 6 Cedric Hunter CHH 1991 2967
3 34 52 Ryan Bowen OKC 2009 6141
1 25 7 Maurice Baker VAN 2004 5335
4 31 81 Adrian Caldwell value_not_present_in_sorter 1997 6169
How to sort a pandas DataFrame on one column given an already ordered list of the values in that column?
Approach 1
Convert the Fruit
column to ordered categorical type and sort the values
df['fruit'] = pd.Categorical(df['fruit'], ordered_list, ordered=True)
df.sort_values('fruit')
Approach 2
Sort the values by passing a key function, which maps the fruit names to there corresponding order
df.sort_values('fruit', key=lambda x: x.map({v:k for k, v in enumerate(ordered_list)}))
id fruit trash
2 3 pineapple 93
1 2 banana 22
3 4 orange 1
4 5 orange 15
0 1 apple 38
How to sort pandas dataframe by custom order on string index
Just reindex
df.reindex(reorderlist)
Out[89]:
Age G Tm Year id
Player
Maurice Baker 25 7 VAN 2004 5335
Adrian Caldwell 31 81 DAL 1997 6169
Ratko Varda 22 60 TOT 2001 13950
Ryan Bowen 34 52 OKC 2009 6141
Cedric Hunter 27 6 CHH 1991 2967
Update info you have multiple players with same name
out = df.iloc[pd.Categorical(df.index,reorderlist).argsort()]
Sorting a pandas DataFrame by the order of a list
You could make the Class
column your index column
df = df.set_index('Class')
and then use df.loc
to reindex the DataFrame with class_list
:
df.loc[class_list]
Minimal example:
>>> df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'], 'Number': [3, 5, 6]})
>>> df
Class Number
0 Gammaproteobacteria 3
1 Bacteroidetes 5
2 Negativicutes 6
>>> df = df.set_index('Class')
>>> df.loc[['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']]
Number
Bacteroidetes 5
Negativicutes 6
Gammaproteobacteria 3
Custom sorting with Pandas
One approach would be to use a custom dict to create a 'rank' column, we then use to sort with and then drop the column after sorting:
In [17]:
custom_dict = {'Critical':0, 'High':1, 'Medium':2, 'Low':3}
df['rank'] = df['Criticality'].map(custom_dict)
df
Out[17]:
Name Criticality rank
0 baz High 1
1 foo Critical 0
2 baz Low 3
3 foo Medium 2
4 bar High 1
5 bar Low 3
6 bar Medium 2
[7 rows x 3 columns]
In [19]:
# now sort by 'Name' and 'rank', it will first sort by 'Name' column first and then 'rank'
df.sort(columns=['Name', 'rank'],inplace=True)
df
Out[19]:
Name Criticality rank
4 bar High 1
6 bar Medium 2
5 bar Low 3
0 baz High 1
2 baz Low 3
1 foo Critical 0
3 foo Medium 2
[7 rows x 3 columns]
In [21]:
# now drop the 'rank' column
df.drop(labels=['rank'],axis=1)
Out[21]:
Name Criticality
4 bar High
6 bar Medium
5 bar Low
0 baz High
2 baz Low
1 foo Critical
3 foo Medium
[7 rows x 2 columns]
Pandas df: Sort by index based on custom list
Let's use sort_values
with a custom sort dictionary:
# create dictionary with key as firm, value as order in the list
ordered_dict = {k: i for i, k in enumerate(Ordered_list)}
# sort by index with the created dictionary
df.sort_index(key=lambda x: x.map(ordered_dict))
Related Topics
Why am I Getting Importerror: No Module Named Pip ' Right After Installing Pip
Django Rest Framework Upload Image: "The Submitted Data Was Not a File"
Running a Process in Pythonw with Popen Without a Console
Multiprocessing.Pool: What's the Difference Between Map_Async and Imap
How to Remove Gaps Between Subplots in Matplotlib
Python Requests - How to Use System Ca-Certificates (Debian/Ubuntu)
How to Select Python Version in Pycharm
Return String with First Match for a Regex, Handling Case Where There Is No Match
Scrollbar on Matplotlib Showing Page
How to Get Most Informative Features for Scikit-Learn Classifiers
List of Dicts To/From Dict of Lists
What Is a '"Python"' Layer in Caffe
How to Efficiently Calculate a Running Standard Deviation