How to sort a pandas DataFrame by one column
Use sort_values to sort the df by a specific column's values:
In [18]:
df.sort_values('2')
Out[18]:
0 1 2
4 85.6 January 1.0
3 95.5 February 2.0
7 104.8 March 3.0
0 354.7 April 4.0
8 283.5 May 5.0
6 238.7 June 6.0
5 152.0 July 7.0
1 55.4 August 8.0
11 212.7 September 9.0
10 249.6 October 10.0
9 278.8 November 11.0
2 176.5 December 12.0
If you want to sort by two columns, pass a list of column labels to sort_values, with the labels ordered according to sort priority. If you use df.sort_values(['2', '0']), the result is sorted by column 2, then column 0. Granted, this does not really make sense for this example because each value in df['2'] is unique.
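To see the second sort key actually matter, here is a minimal sketch on made-up data (the column names group and score are hypothetical, not from the question), where the first column has repeated values:

```python
import pandas as pd

# Hypothetical data: 'group' has repeated values, so the second key breaks ties.
df = pd.DataFrame({
    "group": ["b", "a", "b", "a"],
    "score": [2, 9, 1, 4],
})

# Sort by 'group' first, then by 'score' within each group.
result = df.sort_values(["group", "score"])
print(result)
```

The original index labels travel with their rows, so you can see which rows were moved.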
Sorting columns in pandas dataframe based on column name
df = df.reindex(sorted(df.columns), axis=1)
This assumes that sorting the column names will give the order you want. If your column names won't sort lexicographically (e.g., if you want column Q10.3 to appear after Q9.1), you'll need to sort differently, but that has nothing to do with pandas.
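For the Q10.3 versus Q9.1 case, one possible approach is a "natural sort" key passed to sorted. This is only a sketch: natural_key is a hypothetical helper written for this example, not a pandas function, and the column names are made up.

```python
import re

import pandas as pd


def natural_key(name):
    # Split the name into digit and non-digit runs, and compare the digit
    # runs as integers so that 'Q9' sorts before 'Q10'.
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


# Hypothetical column names for illustration.
df = pd.DataFrame(columns=["Q10.3", "Q9.1", "Q1.2"])

# Same reindex call as above, but with a custom sort key.
df = df.reindex(sorted(df.columns, key=natural_key), axis=1)
print(list(df.columns))
```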
How to sort a pandas DataFrame on one column given an already ordered list of the values in that column?
Approach 1
Convert the fruit column to an ordered categorical type and sort the values:
df['fruit'] = pd.Categorical(df['fruit'], ordered_list, ordered=True)
df.sort_values('fruit')
Approach 2
Sort the values by passing a key function, which maps the fruit names to their corresponding order:
df.sort_values('fruit', key=lambda x: x.map({v:k for k, v in enumerate(ordered_list)}))
id fruit trash
2 3 pineapple 93
1 2 banana 22
3 4 orange 1
4 5 orange 15
0 1 apple 38
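For reference, a self-contained sketch of both approaches. The input frame is reconstructed from the output above, and the contents of ordered_list are an assumption inferred from that output. The key argument of sort_values needs pandas 1.1 or later.

```python
import pandas as pd

# Input reconstructed from the printed output above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "fruit": ["apple", "banana", "pineapple", "orange", "orange"],
    "trash": [38, 22, 93, 1, 15],
})

# Assumed ordering, inferred from the output shown above.
ordered_list = ["pineapple", "banana", "orange", "apple"]

# Approach 1: ordered categorical (done on a copy so df keeps plain strings).
cat_df = df.assign(
    fruit=pd.Categorical(df["fruit"], categories=ordered_list, ordered=True)
)
by_cat = cat_df.sort_values("fruit")

# Approach 2: key function mapping each fruit to its position in the list.
rank = {fruit: position for position, fruit in enumerate(ordered_list)}
by_key = df.sort_values("fruit", key=lambda col: col.map(rank))

print(by_key)
```

Approach 1 changes the dtype of the column, which also affects later sorts and group-bys; approach 2 leaves the column as it was.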
Custom sorting in pandas dataframe
Pandas 0.15 introduced Categorical Series, which allows a much clearer way to do this:
First make the month column a categorical and specify the ordering to use.
In [21]: df['m'] = pd.Categorical(df['m'], ["March", "April", "Dec"])
In [22]: df # looks the same!
Out[22]:
a b m
0 1 2 March
1 5 6 Dec
2 3 4 April
Now, when you sort the month column it will sort with respect to that list:
In [23]: df.sort_values("m")
Out[23]:
a b m
0 1 2 March
2 3 4 April
1 5 6 Dec
Note: if a value is not in the list it will be converted to NaN.
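A small sketch of that NaN behaviour, using a made-up extra value ("May") that is not in the category list:

```python
import pandas as pd

# 'May' is a hypothetical value that is missing from the category list.
df = pd.DataFrame({"m": ["March", "Dec", "April", "May"]})
df["m"] = pd.Categorical(df["m"], categories=["March", "April", "Dec"])

# The unlisted value has become NaN.
print(df["m"].isna().tolist())

# By default sort_values places NaN rows last (na_position='last').
sorted_df = df.sort_values("m")
print(sorted_df)
```

So it is worth checking for unexpected NaN values after the conversion, for example with df['m'].isna().any().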
An older answer for those interested...
You could create an intermediary series, and set_index on that:
df = pd.DataFrame([[1, 2, 'March'],[5, 6, 'Dec'],[3, 4, 'April']], columns=['a','b','m'])
s = df['m'].apply(lambda x: {'March':0, 'April':1, 'Dec':3}[x])
s = s.sort_values()
In [4]: df.set_index(s.index).sort_index()
Out[4]:
a b m
0 1 2 March
1 3 4 April
2 5 6 Dec
As commented, in newer pandas, Series has a replace method to do this more elegantly:
s = df['m'].replace({'March':0, 'April':1, 'Dec':3})
The slight difference is that this won't raise if there is a value outside of the dictionary (it'll just stay the same).
Pandas: Sort a dataframe based on multiple columns
You can change the order of the columns in the list, along with the matching values in the ascending parameter.
Explanation:
The order of the column names is the order of sorting. Rows are first sorted descending by Employee_Count; rows with duplicate Employee_Count values are then sorted ascending by Department.
df1 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, True])
print (df1)
Department Employee_Count
4 xyz 15
2 bca 11
0 abc 10 <-
1 adc 10 <-
3 cde 9
Or, as a test, pass False as the second value; the duplicated rows are then sorted descending:
df2 = df.sort_values(['Employee_Count', 'Department'], ascending=[False, False])
print (df2)
Department Employee_Count
4 xyz 15
2 bca 11
1 adc 10 <-
0 abc 10 <-
3 cde 9
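If you want to run this yourself, here is a self-contained sketch. The input frame is reconstructed from the printed output above, so treat it as an assumption about the original data.

```python
import pandas as pd

# Input reconstructed from the index labels in the output above.
df = pd.DataFrame({
    "Department": ["abc", "adc", "bca", "cde", "xyz"],
    "Employee_Count": [10, 10, 11, 9, 15],
})

# Descending by Employee_Count, ties broken ascending by Department.
df1 = df.sort_values(["Employee_Count", "Department"], ascending=[False, True])

# Descending by Employee_Count, ties broken descending by Department.
df2 = df.sort_values(["Employee_Count", "Department"], ascending=[False, False])

print(df1.index.tolist())
print(df2.index.tolist())
```

Only the two rows with Employee_Count 10 change places between df1 and df2.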
pandas - Sorting Columns in a custom order
The first idea is to use list comprehensions and join the lists with +:
df = pd.DataFrame(columns=['customer_name','cust_id','sale_id','sale_time'])
expected_columns = ['cust_id','cost_id','sale_id','prod_id']
new1 = [c for c in df.columns if c in expected_columns]
new2 = [c for c in df.columns if c not in expected_columns]
new = new1 + new2
print (new)
['cust_id', 'sale_id', 'customer_name', 'sale_time']
Or use Index.intersection with Index.difference:
expected_columns = ['cust_id','cost_id','sale_id','prod_id']
new = (df.columns.intersection(expected_columns, sort=False).tolist() +
df.columns.difference(expected_columns, sort=False).tolist())
If the ordering in the output should also follow expected_columns, use:
new = (pd.Index(expected_columns).intersection(df.columns, sort=False).tolist() +
df.columns.difference(expected_columns, sort=False).tolist())
The difference shows up with changed sample data:
expected_columns = ['sale_id','cost_id','cust_id','prod_id']
df = pd.DataFrame(columns=['customer_name','cust_id','sale_id','sale_time'])
new = (pd.Index(expected_columns).intersection(df.columns, sort=False).tolist() +
df.columns.difference(expected_columns, sort=False).tolist())
print (new)
['sale_id', 'cust_id', 'customer_name', 'sale_time']
new = (df.columns.intersection(expected_columns, sort=False).tolist() +
df.columns.difference(expected_columns, sort=False).tolist())
print (new)
['cust_id', 'sale_id', 'customer_name', 'sale_time']
Finally, to change the order of the columns, use:
df = df[new]
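The steps above could be wrapped into one small function. This is just a sketch using plain list comprehensions; reorder_columns is a hypothetical helper name, not part of pandas. It follows the order of expected_columns, like the pd.Index(expected_columns) variant.

```python
import pandas as pd


def reorder_columns(df, expected_columns):
    # Hypothetical helper: expected columns that exist come first, in the
    # order given by expected_columns; the remaining columns keep their
    # original relative order.
    front = [c for c in expected_columns if c in df.columns]
    rest = [c for c in df.columns if c not in expected_columns]
    return df[front + rest]


df = pd.DataFrame(columns=["customer_name", "cust_id", "sale_id", "sale_time"])
out = reorder_columns(df, ["sale_id", "cost_id", "cust_id", "prod_id"])
print(list(out.columns))
```

Names in expected_columns that are missing from the frame (cost_id and prod_id here) are simply skipped.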
How to sort a Pandas DataFrame with multiple columns, some in ascending order and others descending order?
The DataFrame.sort_values method can handle this very easily. Just use the ascending argument and provide a list of boolean values.
import pandas as pd
my_df = pd.DataFrame({'col1': ['a','a','a','a','b','b','b','b','c','c','c','c'],
                      'col2': [1,1,2,2,1,1,2,2,1,1,2,2],
                      'col3': [1,2,1,2,1,2,1,2,1,2,1,2]})
my_df = my_df.sort_values(by=['col1', 'col2', 'col3'],
                          ascending=[False, False, True])
Note that the list provided in the ascending argument must have the same length as the one provided in the by argument.
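A quick sketch of what happens when the lengths do not match, on a smaller made-up frame. A single boolean, by contrast, is applied to every column in by.

```python
import pandas as pd

# Small hypothetical frame, just to demonstrate the length rule.
small_df = pd.DataFrame({
    "col1": ["a", "b", "a"],
    "col2": [2, 1, 1],
    "col3": [1, 2, 3],
})

# Three columns in 'by' but only two flags in 'ascending': pandas rejects this.
try:
    small_df.sort_values(by=["col1", "col2", "col3"], ascending=[False, True])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)

# A single boolean is fine and applies to all listed columns.
all_desc = small_df.sort_values(by=["col1", "col2"], ascending=False)
print(all_desc)
```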