Pandas groupby month and year
You can use either resample or Grouper (which resamples under the hood).
First make sure that the datetime column actually holds datetimes (convert it with pd.to_datetime). It's easier if it's a DatetimeIndex:
In [11]: df1
Out[11]:
abc xyz
Date
2013-06-01 100 200
2013-06-03 -20 50
2013-08-15 40 -5
2014-01-20 25 15
2014-02-21 60 80
In [12]: g = df1.groupby(pd.Grouper(freq="M")) # DataFrameGroupBy (grouped by Month)
In [13]: g.sum()
Out[13]:
abc xyz
Date
2013-06-30 80 250
2013-07-31 NaN NaN
2013-08-31 40 -5
2013-09-30 NaN NaN
2013-10-31 NaN NaN
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 25 15
2014-02-28 60 80
In [14]: df1.resample("M").sum() # the same
Out[14]:
abc xyz
Date
2013-06-30 80 250
2013-07-31 NaN NaN
2013-08-31 40 -5
2013-09-30 NaN NaN
2013-10-31 NaN NaN
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 25 15
2014-02-28 60 80
Note: pd.Grouper(freq="M") was previously written as pd.TimeGrouper("M"). The latter has been deprecated since 0.21.
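A self-contained sketch of the two equivalent calls above, rebuilding df1 from the values shown. Two things here are my assumptions rather than part of the original session: the "MS" (month start) alias is used because, as far as I know, recent pandas releases rename the month-end alias "M" to "ME", and min_count=1 is passed because since pandas 0.22 a plain sum() reports 0 rather than NaN for empty bins.

```python
import pandas as pd

# Rebuild df1 from the session above, with a DatetimeIndex named "Date".
df1 = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.DatetimeIndex(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"],
        name="Date",
    ),
)

# min_count=1 (assumption) keeps months without rows as NaN instead of 0.
by_grouper = df1.groupby(pd.Grouper(freq="MS")).sum(min_count=1)
by_resample = df1.resample("MS").sum(min_count=1)

print(by_grouper)
print(by_grouper.index.equals(by_resample.index))
```

Both results are labelled with the first day of each month here, so the index differs from the month-end labels in the output above.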
I had thought the following would work, but it doesn't (possibly because as_index is not respected; I'm not sure). I'm including it for interest's sake.
If it's a column (it has to be a datetime64 column; as I said, convert it with to_datetime), you can use a PeriodIndex:
In [21]: df
Out[21]:
Date abc xyz
0 2013-06-01 100 200
1 2013-06-03 -20 50
2 2013-08-15 40 -5
3 2014-01-20 25 15
4 2014-02-21 60 80
In [22]: pd.DatetimeIndex(df.Date).to_period("M") # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M
In [23]: per = df.Date.dt.to_period("M") # new way to get the same
In [24]: g = df.groupby(per)
In [25]: g.sum() # dang not quite what we want (doesn't fill in the gaps)
Out[25]:
abc xyz
2013-06 80 250
2013-08 40 -5
2014-01 25 15
2014-02 60 80
To get the desired result we have to reindex...
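One way the reindex step might look, as a sketch only: build the full monthly range with pd.period_range and reindex the grouped result onto it. The names monthly, full_range and filled are mine, not from the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
        ),
        "abc": [100, -20, 40, 25, 60],
        "xyz": [200, 50, -5, 15, 80],
    }
)

per = df["Date"].dt.to_period("M")

# Select the numeric columns so the datetime column is not summed.
monthly = df.groupby(per)[["abc", "xyz"]].sum()

# Every month between the first and last observed period.
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")

# Months with no rows appear as NaN after reindexing.
filled = monthly.reindex(full_range)
print(filled)
```

If zeros are preferred over NaN for the missing months, reindex accepts fill_value=0.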
Pandas groupby month and year (date as datetime64[ns]) and summarized by count
You can group by dt.year and dt.month_name() taken from the date column:
print (df.groupby([df['date'].dt.year.rename('year'),
df['date'].dt.month_name().rename('month')])
['rides'].sum().reset_index())
year month rides
0 2019 January 2596765
1 2020 March 880003
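A small runnable version of the same call, using made-up ride counts (the data is purely illustrative). It also shows one thing worth knowing: groupby sorts its keys, so month names come out in alphabetical rather than calendar order.

```python
import pandas as pd

# Hypothetical data; only the column names match the answer above.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2019-01-05", "2019-01-20", "2019-02-03", "2020-03-15"]
        ),
        "rides": [10, 20, 5, 7],
    }
)

out = (
    df.groupby(
        [
            df["date"].dt.year.rename("year"),
            df["date"].dt.month_name().rename("month"),
        ]
    )["rides"]
    .sum()
    .reset_index()
)
print(out)  # February is listed before January within 2019
```

Grouping by dt.month instead of dt.month_name() keeps calendar order, at the cost of numeric month labels.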
How to groupby Month and Year and then sum total in Pandas?
You were almost there. For it to work, you first need to call pandas' to_datetime() twice to derive the years and the months from 'Date', and pass 'Name' as an additional key in the groupby call:
totalSum = df.groupby([pd.to_datetime(df['Date']).dt.year,
                       pd.to_datetime(df['Date']).dt.month,
                       'Name']).agg({'Price': 'sum'})
totalSum
Out[17]:
Price
Date Date Name
2019 6 A 56
B 120
C 48
2020 3 A 12
B 94
6 B 52
C 87
7 A 10
B 37
C 39
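A variation on the call above, as a sketch with invented rows: the dates are parsed once and the two derived keys are renamed, so the index levels get distinct names instead of two levels both called Date. The level names Year and Month are my choice.

```python
import pandas as pd

# Hypothetical rows; 'Date' is stored as strings, as in the question.
df = pd.DataFrame(
    {
        "Date": ["2019-06-01", "2019-06-15", "2019-06-20", "2020-03-05", "2020-03-09"],
        "Name": ["A", "A", "B", "A", "B"],
        "Price": [50, 6, 120, 12, 94],
    }
)

dates = pd.to_datetime(df["Date"])  # parse once, reuse for both keys

total = df.groupby(
    [dates.dt.year.rename("Year"), dates.dt.month.rename("Month"), "Name"]
).agg({"Price": "sum"})
print(total)
```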
Pandas How to group by month and year using dt
I am just wondering how to group by both year and month using pandas.Series.dt.
You can pass Series.dt.year and Series.dt.month, each with rename, to groupby; new columns are not necessary:
print(df.groupby([df['Date'].dt.year.rename('y'), df['Date'].dt.month.rename('m')]).sum())
X Y
y m
1999 2 30 15
10 30 15
2000 7 60 30
8 40 20
2001 9 50 25
Other solutions:
If you use DataFrame.resample or Grouper, all missing months in between are added (which may or may not be what you want):
print(df.resample('MS', on='Date').sum())
print(df.groupby(pd.Grouper(freq='MS', key='Date')).sum())
Or convert the datetimes to month periods with Series.dt.to_period:
print(df.groupby(df['Date'].dt.to_period('M')).sum())
X Y
Date
1999-02 30 15
1999-10 30 15
2000-07 60 30
2000-08 40 20
2001-09 50 25
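To make the difference between these alternatives concrete, here is a sketch on invented data with the same X and Y columns. resample and Grouper return one row per calendar month in the range, while to_period returns only the months that actually occur.

```python
import pandas as pd

# Hypothetical data; the values are not the ones from the question.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["1999-02-10", "1999-10-05", "2000-07-01",
             "2000-07-20", "2000-08-15", "2001-09-30"]
        ),
        "X": [30, 30, 20, 40, 40, 50],
        "Y": [15, 15, 10, 20, 20, 25],
    }
)

by_resample = df.resample("MS", on="Date").sum()
by_grouper = df.groupby(pd.Grouper(freq="MS", key="Date")).sum()
by_period = df.groupby(df["Date"].dt.to_period("M"))[["X", "Y"]].sum()

# Row counts: every month from 1999-02 to 2001-09 versus observed months only.
print(len(by_resample), len(by_grouper), len(by_period))
```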
pandas dataframe groupby datetime month
Managed to do it:
b = pd.read_csv('b.dat')
b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')
b.groupby(by=[b.index.month, b.index.year])
Or
b.groupby(pd.Grouper(freq='M')) # update for v0.21+
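Since b.dat is not shown, the sketch below substitutes a small in-memory CSV with an assumed value column, parsed with the same format string. The two index-derived keys are renamed so the resulting levels have distinct names.

```python
import io

import pandas as pd

# Stand-in for b.dat; the 'value' column is an assumption.
raw = io.StringIO(
    "date,value\n"
    "01/15/13 09:30AM,1\n"
    "01/20/13 01:00PM,2\n"
    "02/01/13 11:45PM,3\n"
)

b = pd.read_csv(raw)
b.index = pd.to_datetime(b["date"], format="%m/%d/%y %I:%M%p")

# Group on year and month taken from the DatetimeIndex.
totals = b.groupby(
    [b.index.year.rename("year"), b.index.month.rename("month")]
)["value"].sum()
print(totals)
```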
How to group and count rows by month and year using Pandas?
To group on multiple criteria, pass a list of the columns or criteria:
df['birthdate'].groupby([df.birthdate.dt.year, df.birthdate.dt.month]).agg('count')
Example:
In [165]:
import datetime as dt
import pandas as pd
df = pd.DataFrame({'birthdate': pd.date_range(start=dt.datetime(2015, 12, 20), end=dt.datetime(2016, 3, 1))})
df.groupby([df['birthdate'].dt.year, df['birthdate'].dt.month]).agg({'count'})
Out[165]:
birthdate
count
birthdate birthdate
2015 12 12
2016 1 31
2 29
3 1
UPDATE
As of version 0.23.0 the above code no longer works, because multi-index level names must be unique. You now need to rename the levels for this to work:
In[107]:
df.groupby([df['birthdate'].dt.year.rename('year'), df['birthdate'].dt.month.rename('month')]).agg({'count'})
Out[107]:
birthdate
count
year month
2015 12 12
2016 1 31
2 29
3 1
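If only the row count per group is needed, GroupBy.size() is a possible alternative to agg. This is a sketch on the same date range, not part of the original answer; it returns a plain Series instead of a frame with a nested column header.

```python
import pandas as pd

# One row per day from 2015-12-20 through 2016-03-01.
df = pd.DataFrame({"birthdate": pd.date_range("2015-12-20", "2016-03-01")})

counts = df.groupby(
    [
        df["birthdate"].dt.year.rename("year"),
        df["birthdate"].dt.month.rename("month"),
    ]
).size()
print(counts)
```

Note that size() counts all rows in a group, whereas count excludes missing values.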
Python Pandas group by month and year
Use GroupBy.transform with Series.dt.year and Series.dt.month:
d = pd.to_datetime(dfx['TIMESTAMP'])  # parse once, reuse below
dfx['SUM'] = (dfx.groupby(['NAME',
                           d.dt.year.rename('year'),
                           d.dt.month.rename('month')])['VALUE']
                 .transform('sum'))
Or group by month period with Series.dt.to_period:
dfx['SUM'] = (dfx.groupby(['NAME', d.dt.to_period('M')])['VALUE']
                 .transform('sum'))
print (dfx)
NAME TIMESTAMP VALUE SUM
0 AAA 2019-01-01 10 30
1 AAA 2019-01-02 20 30
2 AAA 2019-02-01 30 70
3 AAA 2019-02-02 40 70
4 BBB 2019-01-01 50 110
5 BBB 2019-01-02 60 110
6 BBB 2019-02-01 70 150
7 BBB 2019-02-02 80 150
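For completeness, a runnable sketch that rebuilds dfx from the printed output and applies the to_period variant. Storing TIMESTAMP as strings and naming the key month are assumptions on my part.

```python
import pandas as pd

dfx = pd.DataFrame(
    {
        "NAME": ["AAA"] * 4 + ["BBB"] * 4,
        "TIMESTAMP": ["2019-01-01", "2019-01-02", "2019-02-01", "2019-02-02"] * 2,
        "VALUE": [10, 20, 30, 40, 50, 60, 70, 80],
    }
)

# Parse once; the key is renamed so it does not share a name with the column.
d = pd.to_datetime(dfx["TIMESTAMP"])
month = d.dt.to_period("M").rename("month")

# transform broadcasts each group's sum back onto the original rows.
dfx["SUM"] = dfx.groupby(["NAME", month])["VALUE"].transform("sum")
print(dfx)
```

Unlike sum(), transform('sum') keeps the original row count, which is why the result can be assigned straight back as a column.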