
# Calculating the Mean of Each Month by Year in Python

## Calculating the mean of each month by year in Python

IIUC, you can use `pd.Grouper`. I took the liberty of adding a few rows to your dataframe (with different months) to show:

```python
>>> df
              ds       y
1256  2000-01-03  1.8050
1257  2000-01-04  1.8405
1258  2000-01-05  1.8560
1259  2000-01-06  1.8400
1260  2000-01-07  1.8310
1261  2000-01-10  1.8190
1262  2000-01-11  1.8225
1263  2000-01-12  1.8350
1263  2000-02-12  1.8350
1263  2000-02-15  2.9450
5844  2018-04-09  3.3950
5845  2018-04-10  3.4146
5846  2018-04-11  3.3955
5847  2018-04-12  3.3902
5848  2018-04-13  3.4088
5849  2018-04-16  3.4282
5850  2018-04-17  3.4022
5851  2018-04-18  3.3844
5852  2018-04-19  3.4028
5853  2018-04-20  3.4121
5854  2018-04-23  3.4463
5855  2018-04-24  3.4685
5856  2018-04-25  3.5090
5857  2018-04-26  3.4992

# first cast ds to datetime
df['ds'] = pd.to_datetime(df['ds'])

# then group by month, and get the mean:
df.groupby(pd.Grouper(key='ds', freq='M')).mean().dropna()

                   y
ds
2000-01-31  1.831125
2000-02-29  2.390000
2018-04-30  3.425486
```

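The resulting DataFrame holds the mean of `y` for each month, indexed by the last day of that month. The `dropna()` call removes the months between February 2000 and April 2018 that have no rows. In pandas 2.2 and later the month-end alias is spelled `'ME'`, and `'M'` emits a deprecation warning.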
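The effect of `dropna()` is easier to see on a small frame. The following is a minimal sketch with made-up values (the data and the variable names `monthly` and `monthly_observed` are assumptions, not from the question). It uses the month-start alias `"MS"` so that it behaves the same across pandas versions, which means the labels are the first day of each month rather than the last.

```python
import pandas as pd

# Hypothetical sample data: a few values spread over three distant months.
df = pd.DataFrame({
    "ds": ["2000-01-03", "2000-01-04", "2000-02-12", "2000-02-15", "2018-04-09"],
    "y": [1.0, 3.0, 2.0, 4.0, 5.0],
})
df["ds"] = pd.to_datetime(df["ds"])

# Group by calendar month; "MS" labels each bin by the first day of the month.
monthly = df.groupby(pd.Grouper(key="ds", freq="MS"))["y"].mean()

# Every month between the first and last date gets a bin, so months
# without rows are NaN. dropna() keeps only the months that had data.
monthly_observed = monthly.dropna()
print(monthly_observed)
```

Without `dropna()`, `monthly` contains one entry per month from January 2000 to April 2018, almost all of them `NaN`.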

## Python - Aggregate by month and calculate average

Probably the simplest approach is to use the `resample` method. First, when you read in your data, make sure you parse the dates and set the date column as your index (ignore the `StringIO` part and `header=0`; I am reading in your sample data from a multi-line string). Note that `read_csv` rejects a boolean `header`, so the header row is given as `0`:

```python
>>> df = pd.read_csv(StringIO(data), header=0, parse_dates=['Date'],
...                  index_col='Date')
>>> df
            Sentiment
Date
2014-01-03       0.40
2014-01-04      -0.03
2014-01-09       0.00
2014-01-10       0.07
2014-01-12       0.00
2014-02-24       0.00
2014-02-25       0.00
2014-02-25       0.00
2014-02-26       0.00
2014-02-28       0.00
2014-03-01       0.10
2014-03-02      -0.50
2014-03-03       0.00
2014-03-08      -0.06
2014-03-11      -0.13
2014-03-22       0.00
2014-03-23       0.33
2014-03-23       0.30
2014-03-25      -0.14
2014-03-28      -0.25
>>> df.resample('M').mean()
            Sentiment
2014-01-31      0.088
2014-02-28      0.000
2014-03-31     -0.035
```

And if you want a month counter, you can add it after your `resample`:

```python
>>> agg = df.resample('M').mean()
>>> agg['cnt'] = range(len(agg))
>>> agg
            Sentiment  cnt
2014-01-31      0.088    0
2014-02-28      0.000    1
2014-03-31     -0.035    2
```

You can also do this with the `groupby` method and `pd.Grouper` (group by month, then call the `mean` convenience method that `groupby` provides). Older answers use `pd.TimeGrouper` here, which has since been removed from pandas in favor of `pd.Grouper`.

```python
>>> df.groupby(pd.Grouper(freq='M')).mean()
            Sentiment
2014-01-31      0.088
2014-02-28      0.000
2014-03-31     -0.035
```

## Pandas, how to calculate mean values of the past n years for every month

I could not tell from the question what the columns and index of your dataframe are, so I will assume it looks like this (with `value` stored as numbers, so that a mean can be computed):

```python
df = pd.DataFrame({'year': [1999.0, 1999.0, 1999.0, 2000.0, 2000.0, 2000.0,
                            2001.0, 2001.0, 2001.0, 2002.0, 2002.0, 2002.0,
                            2003.0, 2003.0, 2003.0],
                   'Month': ['1', '2', '3', '1', '2', '3', '1', '2', '3',
                             '1', '2', '3', '1', '2', '3'],
                   'value': [6, 9, 7, 5, 7, 6, 4, 6, 8,
                             7, 9, 8, 5, 7, 7]})
```

giving:

```
0   year Month value
1   1999     1     6
2   1999     2     9
3   1999     3     7
4   2000     1     5
5   2000     2     7
6   2000     3     6
7   2001     1     4
8   2001     2     6
9   2001     3     8
10  2002     1     7
11  2002     2     9
12  2002     3     8
13  2003     1     5
14  2003     2     7
15  2003     3     7
```

You can group by month and use a rolling window of size 3 to compute the rolling mean over 3 years for each month, then shift the result by one row within each month so that every year is paired with the mean of the 3 years before it:

```python
df['average_past_3_years'] = (
    df.groupby('Month')
      .rolling(3)
      .agg({'value': 'mean', 'year': 'max'})
      .reset_index(level=0)
      .groupby('Month')
      .transform('shift')['value']
)
```

This gives the expected result:

```
0   year Month value  average_past_3_years
1   1999     1     6                   NaN
2   1999     2     9                   NaN
3   1999     3     7                   NaN
4   2000     1     5                   NaN
5   2000     2     7                   NaN
6   2000     3     6                   NaN
7   2001     1     4                   NaN
8   2001     2     6                   NaN
9   2001     3     8                   NaN
10  2002     1     7              5.000000
11  2002     2     9              7.333333
12  2002     3     8              7.000000
13  2003     1     5              5.333333
14  2003     2     7              7.333333
15  2003     3     7              7.333333
```
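The same shift-then-average idea can be written a little more directly. This is only a sketch under assumptions: numeric `year`, `Month` and `value` columns, exactly one row per month and year, and no gaps between years. The shortened data set is made up for illustration.

```python
import pandas as pd

# Hypothetical, shortened version of the data: two months over four years.
df = pd.DataFrame({
    "year": [1999, 1999, 2000, 2000, 2001, 2001, 2002, 2002],
    "Month": [1, 2, 1, 2, 1, 2, 1, 2],
    "value": [6, 9, 5, 7, 4, 6, 7, 9],
})

# Order rows chronologically within each month so shift() steps back one year.
df = df.sort_values(["Month", "year"])

# Within each month: drop the current year with shift(1), then average
# the three preceding years with a rolling window of size 3.
df["average_past_3_years"] = (
    df.groupby("Month")["value"]
      .transform(lambda s: s.shift(1).rolling(3).mean())
)

# Restore the original row order.
df = df.sort_index()
print(df)
```

Because `transform` returns a result aligned to the original index, no `reset_index` step is needed. If some years are missing for a month, a row-based window no longer corresponds to calendar years, and the data would need reindexing first.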

## Get monthly average in pandas

We can convert your datetime column into a `PeriodIndex` on monthly frequency, then take the mean using `GroupBy.mean`:

```python
df.groupby(pd.PeriodIndex(df['Date'], freq="M"))['Value'].mean()

Date
2006-01    14.6
2019-12    38.2
Freq: M, Name: Value, dtype: float64
```

```python
df.groupby(pd.PeriodIndex(df['Date'], freq="M"))['Value'].mean().reset_index()

      Date  Value
0  2006-01   14.6
1  2019-12   38.2
```

One caveat of this approach is that months with no data do not appear in the result. If you need them, call `set_index` on the date column and use `resample(...).mean()` instead, which returns a row for every month in the range.
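A small comparison makes the caveat concrete. The data below is made up, with February deliberately missing; `by_period` and `by_resample` are illustrative names. The sketch uses `Series.dt.to_period` for the period grouping and the month-start alias `"MS"` for resampling.

```python
import pandas as pd

# Hypothetical data with a gap: nothing was recorded in February 2006.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2006-01-10", "2006-01-20", "2006-03-05"]),
    "Value": [10.0, 20.0, 30.0],
})

# Grouping by monthly period only returns months that actually have rows.
by_period = df.groupby(df["Date"].dt.to_period("M"))["Value"].mean()

# Resampling emits every month between the first and last date;
# months without rows get NaN.
by_resample = df.set_index("Date")["Value"].resample("MS").mean()

print(by_period)
print(by_resample)
```

Here `by_period` has two entries, while `by_resample` has three, with `NaN` for February.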

## How do I calculate mean value for each month in the dataset?

Try:

```python
df.index = pd.to_datetime(df.index)
df.groupby([df.index.year, df.index.month]).mean()

                 RPT        VAL        ROS  ...        CLO        BEL     MAL
DATE DATE                                   ...
1961 1     12.373333   9.333333  11.043333  ...   7.906667   8.833333  11.960
     2     12.230000  12.020000   8.560000  ...   9.210000  15.290000  15.125
     3     10.580000   6.630000  11.750000  ...   5.880000   5.460000  10.880
1962 3     13.330000  13.250000  11.420000  ...  10.340000  12.920000  11.830
     6     13.210000   8.120000   9.960000  ...   7.500000   8.120000  13.170
1968 7     12.230000  12.020000   8.560000  ...   9.210000  15.290000  15.125
1976 8     11.955000   9.940000  11.585000  ...   8.110000   9.190000  11.355
1978 9     13.355000  11.205000   9.730000  ...   7.730000  11.040000  13.480
     12    10.960000   9.750000   7.620000  ...  10.460000  16.620000  16.460
```
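In the output above both index levels are called `DATE`, which makes the result awkward to query. One option, sketched here on made-up data with a single column, is to rename the two grouping keys before grouping. The level names `year` and `month` are my own choice.

```python
import pandas as pd

# Hypothetical readings indexed by date.
idx = pd.to_datetime(["1961-01-01", "1961-01-02", "1961-02-01", "1962-01-01"])
df = pd.DataFrame({"RPT": [10.0, 14.0, 8.0, 6.0]}, index=idx)

# Give the two grouping keys distinct names so the resulting
# MultiIndex levels can be told apart.
year = df.index.year.rename("year")
month = df.index.month.rename("month")
monthly_by_year = df.groupby([year, month]).mean()

print(monthly_by_year)

# Look up a single (year, month) pair in the MultiIndex.
print(monthly_by_year.loc[(1961, 1), "RPT"])
```

Grouping by year and month separately keeps January 1961 and January 1962 apart, unlike grouping by month alone.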

## How to group and calculate monthly average in pandas dataframe

First convert the `time` column to datetimes, then `sum` the values per `name` and month using `Grouper`, and finally take the `mean` over the first index level (`name`):

```python
data['time'] = pd.to_datetime(data['time'])
df = (data.groupby(['name', pd.Grouper(freq='M', key='time')])['values'].sum()
          .groupby(level=0)
          .mean()
          .reset_index(name='Monthly Average'))
print (df)
  name  Monthly Average
0    A               25
1    B               30
```

Alternatively, to group by monthly periods instead of month-end timestamps, replace `Grouper` with `Series.dt.to_period`:

```python
data['time'] = pd.to_datetime(data['time'])
df = (data.groupby(['name', data['time'].dt.to_period('M')])['values']
          .sum()
          .groupby(level=0)
          .mean()
          .reset_index(name='Monthly Average'))
print (df)
  name  Monthly Average
0    A               25
1    B               30
```
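The question's input is not shown, so here is a self-contained sketch of the two-step aggregation on made-up data. The values are chosen so that the averages come out as 25 and 30; they are an assumption, not the original data.

```python
import pandas as pd

# Hypothetical input: several entries per name and month.
data = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B"],
    "time": ["2020-01-05", "2020-01-20", "2020-02-10", "2020-01-15", "2020-02-15"],
    "values": [10, 20, 20, 40, 20],
})
data["time"] = pd.to_datetime(data["time"])

# Step 1: total per name and calendar month.
monthly_totals = (
    data.groupby(["name", data["time"].dt.to_period("M")])["values"].sum()
)

# Step 2: average those monthly totals per name (first index level).
monthly_average = monthly_totals.groupby(level=0).mean()

print(monthly_totals)
print(monthly_average)
```

Summing first matters: for `A` the monthly totals are 30 and 20, so the monthly average is 25, whereas a plain mean over `A`'s three rows would be about 16.7.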

## How to calculate the Monthly Average over Multiple Years with multiple Latitude and Longitude - Pandas - Xarray

If I understand, you're after the long-term mean for each month. If so, you can use xarray with `groupby()` instead of `resample()` to calculate these climatologies.

``climatology = Multidata.groupby("time.month").mean("time")``

See the xarray documentation on calculating monthly anomalies.
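To illustrate what a climatology and the resulting anomalies look like without needing a gridded dataset, here is a rough pandas analogue of the xarray call. It uses a made-up single-location series instead of latitude and longitude dimensions, and the names `temps`, `climatology` and `anomalies` are illustrative.

```python
import pandas as pd

# Hypothetical series: one January and one July reading in each of two years.
idx = pd.to_datetime(["2000-01-15", "2000-07-15", "2001-01-15", "2001-07-15"])
temps = pd.Series([2.0, 20.0, 4.0, 22.0], index=idx, name="temp")

# Long-term mean for each calendar month, pooled across all years.
climatology = temps.groupby(temps.index.month).mean()

# Anomaly: each reading minus the long-term mean of its calendar month.
anomalies = temps - temps.groupby(temps.index.month).transform("mean")

print(climatology)
print(anomalies)
```

Grouping by month number alone pools all Januaries together, which is the difference from `resample`, where each month of each year gets its own bin.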