pandas dataframe groupby datetime month
Managed to do it:
b = pd.read_csv('b.dat')
b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')
b.groupby(by=[b.index.month, b.index.year])
Or
b.groupby(pd.Grouper(freq='M')) # update for v0.21+
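A minimal, self-contained sketch of the first approach. The file contents and the value column are made up, since b.dat itself is not shown; only the date format string comes from the answer above.

```python
import io

import pandas as pd

# Hypothetical stand-in for b.dat; the real file's columns are unknown.
raw = io.StringIO(
    "date,value\n"
    "01/05/13 09:30AM,1\n"
    "01/20/13 04:15PM,2\n"
    "02/01/13 11:00AM,3\n"
    "01/03/14 08:45AM,4\n"
)
b = pd.read_csv(raw)
b.index = pd.to_datetime(b["date"], format="%m/%d/%y %I:%M%p")

# Group on (year, month) so January 2013 and January 2014 stay separate.
year = b.index.year.rename("year")
month = b.index.month.rename("month")
totals = b.groupby([year, month])["value"].sum()
print(totals)
```

Grouping on the month alone would merge the same month from different years, which is why the year is included as a second key.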
Pandas groupby month and year
You can use either resample or Grouper (which resamples under the hood).
First make sure that the datetime column actually holds datetimes (hit it with pd.to_datetime). It's easier if it's a DatetimeIndex:
In [11]: df1
Out[11]:
abc xyz
Date
2013-06-01 100 200
2013-06-03 -20 50
2013-08-15 40 -5
2014-01-20 25 15
2014-02-21 60 80
In [12]: g = df1.groupby(pd.Grouper(freq="M")) # DataFrameGroupBy (grouped by Month)
In [13]: g.sum()
Out[13]:
abc xyz
Date
2013-06-30 80 250
2013-07-31 NaN NaN
2013-08-31 40 -5
2013-09-30 NaN NaN
2013-10-31 NaN NaN
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 25 15
2014-02-28 60 80
In [14]: df1.resample("M").sum() # the same
Out[14]:
abc xyz
Date
2013-06-30 80 250
2013-07-31 NaN NaN
2013-08-31 40 -5
2013-09-30 NaN NaN
2013-10-31 NaN NaN
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 25 15
2014-02-28 60 80
Note: Previously pd.Grouper(freq="M") was written as pd.TimeGrouper("M"). The latter has been deprecated since 0.21.
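The monthly sums can be reproduced with the sketch below, which rebuilds df1 from the values shown earlier. Two choices here are assumptions rather than part of the original answer: a pd.offsets.MonthEnd() object replaces the "M" string, because newer pandas releases prefer a different alias for month-end, and min_count=1 is passed so that empty months stay NaN instead of becoming 0.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.DatetimeIndex(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"],
        name="Date",
    ),
)

# An offset object is used instead of a string alias so the sketch does
# not depend on which alias a given pandas version accepts.
month_end = pd.offsets.MonthEnd()

# min_count=1 keeps months with no rows as NaN rather than 0.
via_grouper = df1.groupby(pd.Grouper(freq=month_end)).sum(min_count=1)
via_resample = df1.resample(month_end).sum(min_count=1)

print(via_grouper)
print(via_grouper.equals(via_resample))
```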
I had thought the following would work, but it doesn't (due to as_index not being respected? I'm not sure.). I'm including this for interest's sake.
If it's a column (it has to be a datetime64 column! As I say, hit it with to_datetime), you can use the PeriodIndex:
In [21]: df
Out[21]:
Date abc xyz
0 2013-06-01 100 200
1 2013-06-03 -20 50
2 2013-08-15 40 -5
3 2014-01-20 25 15
4 2014-02-21 60 80
In [22]: pd.DatetimeIndex(df.Date).to_period("M") # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M
In [23]: per = df.Date.dt.to_period("M") # new way to get the same
In [24]: g = df.groupby(per)
In [25]: g.sum() # dang not quite what we want (doesn't fill in the gaps)
Out[25]:
abc xyz
2013-06 80 250
2013-08 40 -5
2014-01 25 15
2014-02 60 80
To get the desired result we have to reindex...
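One possible way to do that reindexing, as a sketch rather than the original author's solution: build the full range of monthly periods with pd.period_range and reindex the grouped result against it. The frame below copies the values from In [21].

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
        ),
        "abc": [100, -20, 40, 25, 60],
        "xyz": [200, 50, -5, 15, 80],
    }
)

per = df["Date"].dt.to_period("M")
sums = df.groupby(per)[["abc", "xyz"]].sum()

# Build every month between the first and last observed period,
# then reindex so the missing months appear as NaN rows.
all_months = pd.period_range(sums.index.min(), sums.index.max(), freq="M")
filled = sums.reindex(all_months)
print(filled)
```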
Pandas groupby month and year (date as datetime64[ns]) and summarized by count
You can group by dt.year and dt.month_name(), both taken from the date column:
print (df.groupby([df['date'].dt.year.rename('year'),
df['date'].dt.month_name().rename('month')])
['rides'].sum().reset_index())
year month rides
0 2019 January 2596765
1 2020 March 880003
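Since the question title asks for a count while the answer sums rides, here is a small sketch that returns both. The data is invented, and using the numeric dt.month instead of the month name is a choice made here so that rows sort in calendar order.

```python
import pandas as pd

# Hypothetical ride data; only the column names follow the answer above.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2019-01-03", "2019-01-17", "2019-02-05", "2020-01-09"]
        ),
        "rides": [120, 80, 50, 200],
    }
)

year = df["date"].dt.year.rename("year")
month = df["date"].dt.month.rename("month")  # numeric, so rows sort by calendar

summary = (
    df.groupby([year, month])["rides"]
    .agg(total="sum", n_rows="count")
    .reset_index()
)
print(summary)
```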
How can I group by month from a date field using Python and Pandas?
Try this:
In [6]: df['date'] = pd.to_datetime(df['date'])
In [7]: df
Out[7]:
date Revenue
0 2017-06-02 100
1 2017-05-23 200
2 2017-05-20 300
3 2017-06-22 400
4 2017-06-21 500
In [59]: df.groupby(df['date'].dt.strftime('%B'))['Revenue'].sum().sort_values()
Out[59]:
date
May 500
June 1000
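Sorting by value happens to put May before June here, but that will not hold in general. A sketch of one alternative, assuming an English locale for the month names, is to reorder the result by the calendar's month sequence:

```python
import calendar

import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2017-06-02", "2017-05-23", "2017-05-20", "2017-06-22", "2017-06-21"]
        ),
        "Revenue": [100, 200, 300, 400, 500],
    }
)

by_month = df.groupby(df["date"].dt.month_name())["Revenue"].sum()

# groupby sorts its keys alphabetically ("June" before "May"), so reorder
# using the calendar's month sequence instead of relying on the values.
month_names = list(calendar.month_name)[1:]
calendar_order = [m for m in month_names if m in by_month.index]
by_month = by_month.reindex(calendar_order)
print(by_month)
```

Note that grouping on the month name alone merges the same month across different years.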
pandas dataframe - groupby dataframe by datetime (last 12 months) and agreeing two columns, the answer to be like that ↓
Assuming the date column's type is datetime, you can extract months to a different column:
df["month"] = df["date"].dt.month
Then group by the month column and find the averages:
df.groupby("month").agg(wealth_avg=("wealth", "mean"), state_money_avg=("state_money", "mean"))
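The answer does not show the "last 12 months" filter from the question, so the sketch below adds one under a stated assumption: the window is measured back from the latest date in the data. The sample values are invented.

```python
import pandas as pd

# Hypothetical data; the column names follow the answer above.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2021-01-15", "2022-03-10", "2022-03-20", "2022-04-05"]
        ),
        "wealth": [5.0, 10.0, 20.0, 30.0],
        "state_money": [1.0, 2.0, 4.0, 6.0],
    }
)

# Assumption: "last 12 months" counts back from the latest date present.
cutoff = df["date"].max() - pd.DateOffset(months=12)
recent = df[df["date"] > cutoff].copy()

recent["month"] = recent["date"].dt.month
averages = recent.groupby("month").agg(
    wealth_avg=("wealth", "mean"),
    state_money_avg=("state_money", "mean"),
)
print(averages)
```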
Convert year-month into Date while GroupBy
Your question wasn't totally clear, as it didn't include a workable example, but I've had a crack at it here with data I made up:
import pandas as pd
data = {'period':['202201','202201','202201','202201','202202','202202','202203'], 'actuals':[10,20,30,40,50,60,70]}
df = pd.DataFrame(data)
print("BEFORE:")
print(df)
This gives period as you described, but it's stored as object and not datetime:
BEFORE:
period actuals
0 202201 10
1 202201 20
2 202201 30
3 202201 40
4 202202 50
5 202202 60
6 202203 70
Here format='%Y%m' converts it to datetime (%Y%m means look for YYYYMM in the incoming string). Then .dt.strftime('%Y/%m') converts it back to an object (string) column, but in the date format you require.
df['period'] = pd.to_datetime(df['period'], format='%Y%m').dt.strftime('%Y/%m')
print("AFTER:")
groupedresults = df.groupby('period')['actuals'].sum()
print(groupedresults)
And here's your output. Change the date format of period to suit your needs:
AFTER:
period
2022/01 100
2022/02 110
2022/03 70
Name: actuals, dtype: int64
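If the grouped key should remain date-like rather than a string, one alternative (a sketch, not what the answer above does) is to convert the parsed dates to monthly periods with dt.to_period:

```python
import pandas as pd

data = {
    "period": ["202201", "202201", "202201", "202201", "202202", "202202", "202203"],
    "actuals": [10, 20, 30, 40, 50, 60, 70],
}
df = pd.DataFrame(data)

# Parse YYYYMM strings, then keep them as monthly periods instead of
# formatting them back into text.
df["period"] = pd.to_datetime(df["period"], format="%Y%m").dt.to_period("M")
grouped = df.groupby("period")["actuals"].sum()
print(grouped)
```

A period index sorts chronologically and can be turned back into timestamps later, which a formatted string cannot do without reparsing.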
How to groupby specifically datetime index in a multiindex column by month
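# Drop the leftover CSV index column and index the frame by arrival date
# (ArrDate must already be a datetime column for resample to work).
datadf1 = datadf.drop(columns='Unnamed: 0')
prac = datadf1.set_index('ArrDate')
prac_dates = prac.copy()
# Daily view: number of ships and total quantity per day.
prac = prac.resample('D').agg({'ShipName': 'count', 'ComoQty': 'sum'}).reset_index()
# Monthly view: total quantity per month, divided by 1000.
prac_dates = (prac_dates.resample('M').agg({'ComoQty': 'sum'}) / 1000).reset_index()
# Label each monthly row with its month name, then drop the raw date.
prac_dates['Month'] = pd.DatetimeIndex(prac_dates['ArrDate']).strftime('%B')
del prac_dates['ArrDate']
# Label each daily row with its month name and keep only the day of month.
prac['Month'] = pd.DatetimeIndex(prac['ArrDate']).strftime('%B')
prac['ArrDate'] = pd.DatetimeIndex(prac['ArrDate']).strftime('%d')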
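The snippet above depends on a datadf that is not shown. The sketch below runs the monthly part on invented arrivals so the shape of the result is visible; only the column names are taken from the snippet, and pd.offsets.MonthEnd() is used in place of the 'M' string as a version-independent choice.

```python
import pandas as pd

# Hypothetical arrivals; only the column names come from the snippet above.
datadf = pd.DataFrame(
    {
        "ArrDate": pd.to_datetime(
            ["2021-01-05", "2021-01-05", "2021-01-20", "2021-02-11"]
        ),
        "ShipName": ["A", "B", "C", "D"],
        "ComoQty": [1000, 2000, 3000, 4000],
    }
).set_index("ArrDate")

# One row per month: number of ships and total quantity.
monthly = (
    datadf.resample(pd.offsets.MonthEnd())
    .agg({"ShipName": "count", "ComoQty": "sum"})
    .reset_index()
)
monthly["ComoQty"] = monthly["ComoQty"] / 1000  # same scaling as above
monthly["Month"] = monthly["ArrDate"].dt.strftime("%B")
print(monthly[["Month", "ShipName", "ComoQty"]])
```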