Sort a pandas dataframe series by month name
Thanks @Brad Solomon for offering a faster way to capitalize strings!
Note 1: @Brad Solomon's answer using pd.Categorical should use fewer resources than my answer. He showed how to assign an order to your categorical data. You should not miss it :P
Alternatively, you can use:
import pandas as pd

df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                   ["aug", 11], ["jan", 11], ["jan", 1]],
                  columns=["Month", "Price"])
# Preprocessing: capitalize `jan`, `dec` to `Jan` and `Dec`
df["Month"] = df["Month"].str.capitalize()
# Now the dataset should look like
# Month Price
# -----------
# Dec XX
# Jan XX
# Mar XX
# make it a datetime so that we can sort it:
# use %b because the data use the abbreviation of month
df["Month"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
df = df.sort_values(by="Month")
total = (df.groupby(df['Month'])['Price'].mean())
# total
Month
1 17.333333
3 11.000000
8 16.000000
12 12.000000
Note 2: groupby sorts the group keys by default. Be sure to use the same key for sorting and for grouping, as in df = df.sort_values(by=SAME_KEY) and total = df.groupby(df[SAME_KEY])['Price'].mean().
Otherwise, you may get unintended behavior. See "Groupby preserve order among groups? In which way?" for more information.
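To illustrate Note 2, here is a minimal sketch (the MonthNum column name is my own, not from the question). It sorts by month number but then groups by two different keys, so you can see that the group order follows whichever key you group by, not the earlier sort:

```python
import pandas as pd

df = pd.DataFrame({"Month": ["Dec", "Jan", "Mar", "Aug", "Aug", "Jan", "Jan"],
                   "Price": [12, 40, 11, 21, 11, 11, 1]})

# Helper column (hypothetical name) holding the month number
df["MonthNum"] = pd.to_datetime(df["Month"], format="%b").dt.month
df = df.sort_values("MonthNum")

# Grouping by the name column: keys come back in alphabetical order
by_name = df.groupby("Month")["Price"].mean()

# Grouping by the same key used for sorting: keys come back in calendar order
by_num = df.groupby("MonthNum")["Price"].mean()

print(by_name)
print(by_num)
```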
Note 3: A more computationally efficient way is to compute the mean first and then sort by month. This way you only need to sort at most 12 items rather than the whole df. It reduces the computational cost if you don't need df itself to be sorted.
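A rough sketch of Note 3, assuming the same toy data as above. The names means and month_numbers are only for illustration. The aggregation runs first with sort=False, and only the small result is ordered:

```python
import pandas as pd

df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                   ["aug", 11], ["jan", 11], ["jan", 1]],
                  columns=["Month", "Price"])

# Aggregate first; sort=False skips sorting the group keys
means = df.groupby("Month", sort=False)["Price"].mean()

# Order only the aggregated result (at most 12 rows) by month number
month_numbers = pd.to_datetime(means.index.str.capitalize(), format="%b").month
means_sorted = means.iloc[month_numbers.argsort()]

print(means_sorted)
```

The original df is left unsorted here, which is the point of the note.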
Note 4: For people who already have the month as the index and wonder how to make it categorical, take a look at pandas.CategoricalIndex.
@jezrael has a working example of making a categorical index ordered in "Pandas series sort by month index".
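For Note 4, something along these lines should work when the month names are already the index. This is only a sketch with made-up values, not @jezrael's exact code:

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up series with month abbreviations as the index
s = pd.Series([12.0, 17.3, 16.0, 11.0], index=["Dec", "Jan", "Aug", "Mar"])

# Replace the plain index with an ordered categorical one, then sort by it
s.index = pd.CategoricalIndex(s.index, categories=months, ordered=True)
s_sorted = s.sort_index()

print(s_sorted)
```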
Sort a pandas's dataframe series by month and year?
You need to change your month name to month number, for example Jan 2013 to 01 2013.
Then sort it, and then change it again to month name-year.
df['date value'] = pd.to_datetime(df['date value'], format='%b%Y')
df = df.sort_values('date value', ascending = True)
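The last step (changing back to month name and year) could look like the sketch below. The sample values, the sales column, and the label column are my assumptions. It also assumes strings like Jan2013 with no separator and an English locale for %b:

```python
import pandas as pd

# Made-up data; 'date value' holds strings such as 'Jan2013'
df = pd.DataFrame({"date value": ["Mar2013", "Jan2014", "Dec2012", "Jan2013"],
                   "sales": [3, 4, 1, 2]})

# Parse to real datetimes so sorting is chronological, not alphabetical
df["date value"] = pd.to_datetime(df["date value"], format="%b%Y")
df = df.sort_values("date value", ascending=True)

# Convert back to a month-name/year label for display
df["label"] = df["date value"].dt.strftime("%b %Y")

print(df)
```

Keeping the datetime column and adding a separate label column means you can re-sort later without parsing again.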
Sort groupby pandas output by Month name and year
EDIT: Your solution should be changed:
df1 = df.groupby(["Year", "Month Name"], as_index=False)["Days"].agg(['min', 'mean'])
df3 = df.groupby(["Year", "Month Name"], as_index=False)["Data"].agg(['count'])
merged_df=pd.merge(df3, df1, on=['Year','Month Name']).reset_index()
cats = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
merged_df['Month Name'] = pd.Categorical(merged_df['Month Name'],categories=cats, ordered=True)
merged_df = merged_df.sort_values(["Year", "Month Name"])
print (merged_df)
Year Month Name count min mean
1 2014 Jan 1 2 2
0 2014 Dec 1 1 1
2 2015 Aug 1 1 1
3 2016 Apr 1 4 4
Or:
df1 = (df.groupby(["Year", "Month Name"])
.agg(min_days=("Days", 'min'),
avg_days=("Days", 'mean'),
count = ('Data', 'count'))
.reset_index())
cats = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df1['Month Name'] = pd.Categorical(df1['Month Name'], categories=cats, ordered=True)
df1 = df1.sort_values(["Year", "Month Name"])
print (df1)
Year Month Name min_days avg_days count
1 2014 Jan 2 2 1
0 2014 Dec 1 1 1
2 2015 Aug 1 1 1
3 2016 Apr 4 4 1
The last solution uses a MultiIndex and no categoricals. It creates a helper dates column and sorts by it:
df1 = (df.groupby(["Year", "Month Name"])
.agg(min_days=("Days", 'min'),
avg_days=("Days", 'mean'),
count = ('Data', 'count'))
)
df1['dates'] = pd.to_datetime([f'{y}{m}' for y, m in df1.index], format='%Y%b')
df1 = df1.sort_values('dates')
print (df1)
min_days avg_days count dates
Year Month Name
2014 Jan 2 2 1 2014-01-01
Dec 1 1 1 2014-12-01
2015 Aug 1 1 1 2015-08-01
2016 Apr 4 4 1 2016-04-01
How to sort pandas dataframe by month name
Always try to post your code.
That way we could figure out why your categorical sorting did not work. But I suspect you forgot the ordered=True parameter.
Categorical ordering allows sorting according to a custom order, and it works perfectly for this case. It also handles duplicated month values well. Here is my code:
df["month"] = pd.Categorical(df["month"],
categories=["January", "February", "March", "April", "May", "June", "July",
"August", "September", "October", "November", "December"],
ordered=True)
After that we can call sort_values():
df = df.sort_values(["year", "month"], ignore_index=True)
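Since the question did not include data, here is a self-contained sketch with invented rows (the value column is hypothetical) so you can run the two steps end to end:

```python
import pandas as pd

month_names = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

# Invented sample data, including a duplicated month within a year
df = pd.DataFrame({"year": [2021, 2020, 2021, 2020, 2020],
                   "month": ["March", "December", "January", "February", "December"],
                   "value": [1, 2, 3, 4, 5]})

df["month"] = pd.Categorical(df["month"], categories=month_names, ordered=True)
df = df.sort_values(["year", "month"], ignore_index=True)

print(df)
```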
Cheers and keep it up!
Pandas Dataframe month int to month name in order
You can use pandas.Categorical with the parameter ordered=True. You can define any order you want using the categories argument.
months_order = ["Jan", "Feb", "Mar", "Apr"]
cat = pd.Categorical(["Mar", "Feb", "Apr", "Jan"],
categories=months_order, ordered=True)
Printing cat will give
[Mar, Feb, Apr, Jan]
Categories (4, object): [Jan < Feb < Mar < Apr]
And printing cat.sort_values() will give
[Jan, Feb, Mar, Apr]
Categories (4, object): [Jan < Feb < Mar < Apr]
EDIT: In your case, you can replace the groupby argument
order_group_df['Reported on'].dt.month.apply(mapper)
by
pd.Categorical(order_group_df['Reported on'].dt.month.apply(mapper),
categories=['Jan', ..., 'Dec'],
ordered=True)
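Put together, the grouping might look like the sketch below. The sample rows, the Orders column, and this particular mapper are assumptions on my side, since I can't see your real mapper:

```python
import pandas as pd

month_abbr = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def mapper(month_number):
    # Hypothetical stand-in for the question's mapper: 1 -> 'Jan', ..., 12 -> 'Dec'
    return month_abbr[month_number - 1]

# Invented sample data
order_group_df = pd.DataFrame({
    "Reported on": pd.to_datetime(["2020-03-05", "2020-01-17",
                                   "2020-03-20", "2020-02-02"]),
    "Orders": [5, 3, 2, 4],
})

key = pd.Categorical(order_group_df["Reported on"].dt.month.apply(mapper),
                     categories=month_abbr, ordered=True)

# observed=True keeps only the months that actually occur in the data
totals = order_group_df.groupby(key, observed=True)["Orders"].sum()

print(totals)
```

The groups come out in calendar order because groupby sorts by the category order rather than alphabetically.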
sort months in pandas DataFrame
You'll need to grab a sorted list of the month names and reorder your dataframe based on that. Thankfully, Python has a built-in list of month names in chronological order in the calendar module:
import calendar
all_months = calendar.month_name[1:]
df_pivot = df_pivot.reindex(columns=all_months)
This will also create null columns for months that are not present in your data. If you do not want the null columns, you can use dropna afterwards.
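As a small sketch of both steps, assuming a pivot table whose columns are full English month names (the sample frame is made up, and calendar.month_name follows the current locale):

```python
import calendar
import pandas as pd

# Made-up pivot result with month columns in arbitrary order
df_pivot = pd.DataFrame({"March": [1, 2], "January": [3, 4], "December": [5, 6]},
                        index=["a", "b"])

all_months = list(calendar.month_name)[1:]  # skip the empty string at index 0

# Reorder columns chronologically; absent months become all-NaN columns
df_pivot = df_pivot.reindex(columns=all_months)

# Drop only the columns that are entirely null
df_pivot = df_pivot.dropna(axis=1, how="all")

print(df_pivot)
```

Using how="all" keeps months that have some real values even if a few cells are missing.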