Calculating the Mean of Each Month by Year in Python


IIUC, you can use pd.Grouper. I took the liberty of adding a few rows to your dataframe (with different months) to show:

>>> df
ds y
1256 2000-01-03 1.8050
1257 2000-01-04 1.8405
1258 2000-01-05 1.8560
1259 2000-01-06 1.8400
1260 2000-01-07 1.8310
1261 2000-01-10 1.8190
1262 2000-01-11 1.8225
1263 2000-01-12 1.8350
1263 2000-02-12 1.8350
1263 2000-02-15 2.9450
5844 2018-04-09 3.3950
5845 2018-04-10 3.4146
5846 2018-04-11 3.3955
5847 2018-04-12 3.3902
5848 2018-04-13 3.4088
5849 2018-04-16 3.4282
5850 2018-04-17 3.4022
5851 2018-04-18 3.3844
5852 2018-04-19 3.4028
5853 2018-04-20 3.4121
5854 2018-04-23 3.4463
5855 2018-04-24 3.4685
5856 2018-04-25 3.5090
5857 2018-04-26 3.4992

# first cast ds to datetime
df['ds'] = pd.to_datetime(df['ds'])
# then group by month end and take the mean
# (pandas 2.2+ deprecates the 'M' alias in favour of 'ME'):
df.groupby(pd.Grouper(key='ds', freq='M')).mean().dropna()

y
ds
2000-01-31 1.831125
2000-02-29 2.390000
2018-04-30 3.425486

The resulting DataFrame holds the mean of y for each month, indexed by the last day of that month. The dropna() call removes the months that have no rows.
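A minimal, self-contained sketch of the same idea, using a handful of rows copied from the sample above. Passing pd.offsets.MonthEnd() instead of a string alias is my own choice here, not part of the original answer; it is meant to sidestep the 'M' versus 'ME' spelling difference between pandas versions.

```python
import pandas as pd

# A few rows copied from the sample dataframe above
df = pd.DataFrame({
    "ds": ["2000-01-03", "2000-01-04", "2000-02-12",
           "2000-02-15", "2018-04-09", "2018-04-10"],
    "y": [1.8050, 1.8405, 1.8350, 2.9450, 3.3950, 3.4146],
})
df["ds"] = pd.to_datetime(df["ds"])

# Group on month-end bins; an offset object is used instead of 'M'/'ME'
# (assumption: chosen only to keep the sketch version-independent)
monthly = (
    df.groupby(pd.Grouper(key="ds", freq=pd.offsets.MonthEnd()))["y"]
      .mean()
      .dropna()  # drop the empty months between 2000-02 and 2018-04
)
print(monthly)
```

Selecting ["y"] before mean() returns a Series; leave it out to get a DataFrame as in the answer above.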

Python - Aggregate by month and calculate average

Probably the simplest approach is to use resample. First, when you read in your data, parse the dates and set the date column as the index (ignore the StringIO part and header=0; I am reading your sample data from a multi-line string):

>>> df = pd.read_csv(StringIO(data), header=0, parse_dates=['Date'],
...                  index_col='Date')
>>> df

Sentiment
Date
2014-01-03 0.40
2014-01-04 -0.03
2014-01-09 0.00
2014-01-10 0.07
2014-01-12 0.00
2014-02-24 0.00
2014-02-25 0.00
2014-02-25 0.00
2014-02-26 0.00
2014-02-28 0.00
2014-03-01 0.10
2014-03-02 -0.50
2014-03-03 0.00
2014-03-08 -0.06
2014-03-11 -0.13
2014-03-22 0.00
2014-03-23 0.33
2014-03-23 0.30
2014-03-25 -0.14
2014-03-28 -0.25


>>> df.resample('M').mean()

Sentiment
2014-01-31 0.088
2014-02-28 0.000
2014-03-31 -0.035

And if you want a month counter, you can add it after your resample:

>>> agg = df.resample('M').mean()
>>> agg['cnt'] = range(len(agg))
>>> agg

Sentiment cnt
2014-01-31 0.088 0
2014-02-28 0.000 1
2014-03-31 -0.035 2

You can also do this with groupby and pd.Grouper, which replaces the older pd.TimeGrouper (deprecated and since removed from pandas): group by month, then call the mean convenience method that groupby provides.

>>> df.groupby(pd.Grouper(freq='M')).mean()

Sentiment
2014-01-31 0.088
2014-02-28 0.000
2014-03-31 -0.035
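To check that the two routes agree, here is a small sketch on a subset of the sample data. The variable names (month_end, by_resample, by_groupby) are mine, and the offset object stands in for the 'M' alias so the snippet does not depend on the pandas version.

```python
import pandas as pd

# Subset of the Sentiment sample above, with the dates as the index
dates = pd.to_datetime(["2014-01-03", "2014-01-04", "2014-02-24",
                        "2014-03-01", "2014-03-02"])
df = pd.DataFrame({"Sentiment": [0.40, -0.03, 0.00, 0.10, -0.50]}, index=dates)
df.index.name = "Date"

month_end = pd.offsets.MonthEnd()  # assumption: used instead of 'M'/'ME'

by_resample = df.resample(month_end).mean()
by_groupby = df.groupby(pd.Grouper(freq=month_end)).mean()

# Month counter added after aggregating, as in the answer
agg = by_resample.copy()
agg["cnt"] = range(len(agg))
print(agg)
```

Both calls bin the rows into the same month-end buckets, so which one to use is mostly a matter of taste; resample needs a datetime index (or an on= column), while Grouper can also take key= for a regular column.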

Pandas, how to calculate mean values of the past n years for every month

I could not tell what the columns and index of your dataframe are, so I will assume it looks like this (value must be numeric for the rolling mean to work):

df = pd.DataFrame({'year': [1999, 1999, 1999, 2000, 2000, 2000,
                            2001, 2001, 2001, 2002, 2002, 2002,
                            2003, 2003, 2003],
                   'Month': ['1', '2', '3', '1', '2', '3', '1', '2', '3',
                             '1', '2', '3', '1', '2', '3'],
                   'value': [6, 9, 7, 5, 7, 6, 4, 6, 8,
                             7, 9, 8, 5, 7, 7]})

giving:

    year Month  value
0   1999     1      6
1   1999     2      9
2   1999     3      7
3   2000     1      5
4   2000     2      7
5   2000     3      6
6   2001     1      4
7   2001     2      6
8   2001     3      8
9   2002     1      7
10  2002     2      9
11  2002     3      8
12  2003     1      5
13  2003     2      7
14  2003     3      7

You can group by month, use a rolling window of size 3 to compute the rolling mean over 3 consecutive years for each month, and then shift the result by one row within each month so that every row holds the mean of the 3 preceding years:

df['average_past_3_years'] = df.groupby('Month').rolling(3).agg(
{'value':'mean', 'year': 'max'}).reset_index(level=0).groupby(
'Month').transform('shift')['value']

This gives the expected result:

    year Month  value  average_past_3_years
0   1999     1      6                   NaN
1   1999     2      9                   NaN
2   1999     3      7                   NaN
3   2000     1      5                   NaN
4   2000     2      7                   NaN
5   2000     3      6                   NaN
6   2001     1      4                   NaN
7   2001     2      6                   NaN
8   2001     3      8                   NaN
9   2002     1      7              5.000000
10  2002     2      9              7.333333
11  2002     3      8              7.000000
12  2003     1      5              5.333333
13  2003     2      7              7.333333
14  2003     3      7              7.333333
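If the chained agg/reset_index/transform call is hard to follow, the same numbers can probably be reached more directly. The sketch below is an alternative formulation, not the original answer's code: it assumes numeric value and year columns and one row per (year, Month) pair, and applies rolling and shift inside a single transform.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [y for y in range(1999, 2004) for _ in range(3)],
    "Month": [1, 2, 3] * 5,
    "value": [6, 9, 7, 5, 7, 6, 4, 6, 8, 7, 9, 8, 5, 7, 7],
})

# For each month: mean over a 3-year window, then shift by one row so the
# current year is excluded and only the 3 preceding years contribute.
df["average_past_3_years"] = (
    df.sort_values("year")          # make sure years are in order per month
      .groupby("Month")["value"]
      .transform(lambda s: s.rolling(3).mean().shift(1))
)
print(df)
```

Because transform keeps the original index, the assignment lines up with the unsorted dataframe without any extra reset_index step.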

Get monthly average in pandas

We can convert your datetime column into a PeriodIndex on monthly frequency, then take the mean using GroupBy.mean:

df.groupby(pd.PeriodIndex(df['Date'], freq="M"))['Value'].mean()

Date
2006-01 14.6
2019-12 38.2
Freq: M, Name: Value, dtype: float64


df.groupby(pd.PeriodIndex(df['Date'], freq="M"))['Value'].mean().reset_index()

Date Value
0 2006-01 14.6
1 2019-12 38.2

One caveat of this approach is that missing months are not shown. If that's important, use set_index and resample.mean in the same way.
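To make the caveat concrete, here is a small sketch with made-up data (the Date and Value columns mirror the names used above; the numbers are invented and February is left out on purpose). The offset object in resample is my substitution for the month-end alias.

```python
import pandas as pd

# Hypothetical data: no observations in February 2006
df = pd.DataFrame({
    "Date": pd.to_datetime(["2006-01-05", "2006-01-20", "2006-03-10"]),
    "Value": [10.0, 20.0, 30.0],
})

# Grouping by period only returns months that actually occur in the data
by_period = df.groupby(pd.PeriodIndex(df["Date"], freq="M"))["Value"].mean()

# resample builds every month between the first and last date,
# so the missing month shows up as NaN
by_resample = df.set_index("Date")["Value"].resample(pd.offsets.MonthEnd()).mean()

print(by_period)
print(by_resample)
```

Note that the two results are labelled differently: the period version uses monthly periods such as 2006-01, while resample labels each bin with the month-end timestamp.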

How do I calculate mean value for each month in the dataset?

Try:

df.index = pd.to_datetime(df.index)
df.groupby([df.index.year, df.index.month]).mean()

RPT VAL ROS ... CLO BEL MAL
DATE DATE ...
1961 1 12.373333 9.333333 11.043333 ... 7.906667 8.833333 11.960
2 12.230000 12.020000 8.560000 ... 9.210000 15.290000 15.125
3 10.580000 6.630000 11.750000 ... 5.880000 5.460000 10.880
1962 3 13.330000 13.250000 11.420000 ... 10.340000 12.920000 11.830
6 13.210000 8.120000 9.960000 ... 7.500000 8.120000 13.170
1968 7 12.230000 12.020000 8.560000 ... 9.210000 15.290000 15.125
1976 8 11.955000 9.940000 11.585000 ... 8.110000 9.190000 11.355
1978 9 13.355000 11.205000 9.730000 ... 7.730000 11.040000 13.480
12 10.960000 9.750000 7.620000 ... 10.460000 16.620000 16.460
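Both index levels in the output above are called DATE, which makes later lookups awkward. A possible refinement, sketched on invented numbers (only the RPT column name is taken from the output above), is to name the levels explicitly:

```python
import pandas as pd

# Hypothetical miniature of the dataset: one column, string dates as index
df = pd.DataFrame(
    {"RPT": [10.0, 14.0, 12.0, 8.0]},
    index=pd.Index(["1961-01-01", "1961-01-02", "1961-02-01", "1962-01-01"],
                   name="DATE"),
)
df.index = pd.to_datetime(df.index)

monthly = df.groupby([df.index.year, df.index.month]).mean()
monthly.index.names = ["year", "month"]  # replace the duplicated DATE labels

print(monthly)
print(monthly.loc[(1961, 1), "RPT"])  # mean for January 1961
```

Grouping on year and month separately keeps January 1961 and January 1962 apart; grouping on df.index.month alone would pool them into a single long-term January mean.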

How to group and calculate monthly average in pandas dataframe

Convert the time column to datetimes first, then sum the values per name and month with Grouper, and finally take the mean of those monthly sums per name (the first index level):

data['time'] = pd.to_datetime(data['time'])

# 'M' is the month-end alias (pandas 2.2+ prefers 'ME'); lowercase 'm' is deprecated
df = (data.groupby(['name', pd.Grouper(freq='M', key='time')])['values'].sum()
          .groupby(level=0)
          .mean()
          .reset_index(name='Monthly Average'))
print(df)
name Monthly Average
0 A 25
1 B 30

For a solution based on monthly periods, replace Grouper with Series.dt.to_period:

data['time'] = pd.to_datetime(data['time'])

df = (data.groupby(['name', data['time'].dt.to_period('M')])['values']
          .sum()
          .groupby(level=0)
          .mean()
          .reset_index(name='Monthly Average'))
print(df)
name Monthly Average
0 A 25
1 B 30
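The original input is not shown, so here is a sketch with invented rows, picked so that the result matches the 25 and 30 printed above. It follows the to_period variant; the intermediate name monthly_totals is mine.

```python
import pandas as pd

# Hypothetical input: A has totals of 30 (Jan) and 20 (Feb), B has 25 and 35
data = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B"],
    "time": ["2020-01-05", "2020-01-20", "2020-02-10",
             "2020-01-07", "2020-02-09"],
    "values": [10, 20, 20, 25, 35],
})
data["time"] = pd.to_datetime(data["time"])

# Step 1: total per name and calendar month
monthly_totals = data.groupby(["name", data["time"].dt.to_period("M")])["values"].sum()

# Step 2: average those monthly totals per name
df = monthly_totals.groupby(level=0).mean().reset_index(name="Monthly Average")
print(df)
```

The two-step shape matters: calling mean directly on the raw rows would average individual observations, not monthly totals.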

How to calculate the Monthly Average over Multiple Years with multiple Latitude and Longitude - Pandas - Xarray

If I understand, you're after the long-term mean for each month. If so, you can use xarray with groupby() instead of resample() to calculate these climatologies.

climatology = Multidata.groupby("time.month").mean("time")

See the xarray documentation on calculating monthly anomalies.


