Calculating the mean of each month by year in Python
IIUC, you can use pd.Grouper. I took the liberty of adding a few rows to your dataframe (with different months) to show:
>>> df
ds y
1256 2000-01-03 1.8050
1257 2000-01-04 1.8405
1258 2000-01-05 1.8560
1259 2000-01-06 1.8400
1260 2000-01-07 1.8310
1261 2000-01-10 1.8190
1262 2000-01-11 1.8225
1263 2000-01-12 1.8350
1263 2000-02-12 1.8350
1263 2000-02-15 2.9450
5844 2018-04-09 3.3950
5845 2018-04-10 3.4146
5846 2018-04-11 3.3955
5847 2018-04-12 3.3902
5848 2018-04-13 3.4088
5849 2018-04-16 3.4282
5850 2018-04-17 3.4022
5851 2018-04-18 3.3844
5852 2018-04-19 3.4028
5853 2018-04-20 3.4121
5854 2018-04-23 3.4463
5855 2018-04-24 3.4685
5856 2018-04-25 3.5090
5857 2018-04-26 3.4992
# first cast ds to datetime
df['ds'] = pd.to_datetime(df['ds'])
# then group by month, and get the mean:
df.groupby(pd.Grouper(key='ds', freq='M')).mean().dropna()
y
ds
2000-01-31 1.831125
2000-02-29 2.390000
2018-04-30 3.425486
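The resulting DataFrame shows the mean value of y for each month, indexed by the last day of that month. The dropna() call removes the months that have no rows at all, which Grouper would otherwise report as NaN.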
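To see why the dropna() step matters, here is a minimal sketch on made-up data (the dates and values below are not from the question). It passes a MonthEnd offset object rather than the 'M' string, on the assumption that you may be running a newer pandas release where that alias is spelled 'ME'.

```python
import pandas as pd

# Made-up sample: two January rows, one February row, one April row (no March).
df = pd.DataFrame({
    "ds": ["2000-01-03", "2000-01-04", "2000-02-15", "2000-04-10"],
    "y": [1.0, 3.0, 5.0, 7.0],
})
df["ds"] = pd.to_datetime(df["ds"])

# An offset object is used here instead of the 'M' / 'ME' string alias
# so the sketch does not depend on the pandas version.
month_end = pd.offsets.MonthEnd()
monthly = df.groupby(pd.Grouper(key="ds", freq=month_end)).mean()

print(monthly)           # contains a NaN row for the empty month (March 2000)
print(monthly.dropna())  # only the months that actually have data
```

Without dropna() the empty month stays in the result, which can be useful if you want a regular monthly index.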
Python - Aggregate by month and calculate average
Probably the simplest approach is to use the resample method. First, when you read in your data, make sure you parse the dates and set the date column as your index (ignore the StringIO part and the header=0 argument; I am reading in your sample data from a multi-line string):
>>> df = pd.read_csv(StringIO(data), header=0, parse_dates=['Date'],
...                  index_col='Date')
>>> df
Sentiment
Date
2014-01-03 0.40
2014-01-04 -0.03
2014-01-09 0.00
2014-01-10 0.07
2014-01-12 0.00
2014-02-24 0.00
2014-02-25 0.00
2014-02-25 0.00
2014-02-26 0.00
2014-02-28 0.00
2014-03-01 0.10
2014-03-02 -0.50
2014-03-03 0.00
2014-03-08 -0.06
2014-03-11 -0.13
2014-03-22 0.00
2014-03-23 0.33
2014-03-23 0.30
2014-03-25 -0.14
2014-03-28 -0.25
>>> df.resample('M').mean()
Sentiment
2014-01-31 0.088
2014-02-28 0.000
2014-03-31 -0.035
And if you want a month counter, you can add it after your resample (the older how='mean' argument has been removed from pandas, so call .mean() on the resampler instead):
>>> agg = df.resample('M').mean()
>>> agg['cnt'] = range(len(agg))
>>> agg
Sentiment cnt
2014-01-31 0.088 0
2014-02-28 0.000 1
2014-03-31 -0.035 2
You can also do this with the groupby method and pd.Grouper (called pd.TimeGrouper in old pandas versions, which has since been removed): group by month and then call the mean convenience method that is available with groupby.
>>> df.groupby(pd.Grouper(freq='M')).mean()
Sentiment
2014-01-31 0.088
2014-02-28 0.000
2014-03-31 -0.035
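For reference, the two routes above can be checked against each other in a self-contained sketch. It uses only a subset of the sample rows, and it passes a MonthEnd offset object instead of the 'M' string, on the assumption that the alias spelling may differ in your pandas version.

```python
from io import StringIO

import pandas as pd

# A subset of the sample data, read from a multi-line string.
data = """Date,Sentiment
2014-01-03,0.40
2014-01-04,-0.03
2014-01-09,0.00
2014-01-10,0.07
2014-01-12,0.00
2014-02-24,0.00
2014-02-28,0.00
2014-03-01,0.10
2014-03-02,-0.50
"""

# header=0 means "the first line holds the column names".
df = pd.read_csv(StringIO(data), header=0, parse_dates=["Date"], index_col="Date")

month_end = pd.offsets.MonthEnd()
by_resample = df.resample(month_end).mean()
by_grouper = df.groupby(pd.Grouper(freq=month_end)).mean()

# Month counter, added to a copy so the two results stay comparable.
agg = by_resample.copy()
agg["cnt"] = range(len(agg))
print(agg)
```

Both by_resample and by_grouper should hold the same monthly means, so which one you use is mostly a matter of taste.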
Pandas, how to calculate mean values of the past n years for every month
I could not tell what the columns and index of your dataframe are, so I will assume it looks like this:
# 'value' must be numeric (not strings) for the rolling mean below to work
df = pd.DataFrame({'year': [1999.0, 1999.0, 1999.0, 2000.0, 2000.0, 2000.0,
                            2001.0, 2001.0, 2001.0, 2002.0, 2002.0, 2002.0,
                            2003.0, 2003.0, 2003.0],
                   'Month': ['1', '2', '3', '1', '2', '3', '1', '2', '3',
                             '1', '2', '3', '1', '2', '3'],
                   'value': [6, 9, 7, 5, 7, 6, 4, 6, 8,
                             7, 9, 8, 5, 7, 7]})
giving:
0 year Month value
1 1999 1 6
2 1999 2 9
3 1999 3 7
4 2000 1 5
5 2000 2 7
6 2000 3 6
7 2001 1 4
8 2001 2 6
9 2001 3 8
10 2002 1 7
11 2002 2 9
12 2002 3 8
13 2003 1 5
14 2003 2 7
15 2003 3 7
You can group by month and use a rolling window of size 3 to compute the rolling mean of the last 3 years per month, then shift the result by one row within each month so that each year only sees the years before it:
df['average_past_3_years'] = df.groupby('Month').rolling(3).agg(
{'value':'mean', 'year': 'max'}).reset_index(level=0).groupby(
'Month').transform('shift')['value']
This gives the expected result:
0 year Month value average_past_3_years
1 1999 1 6 NaN
2 1999 2 9 NaN
3 1999 3 7 NaN
4 2000 1 5 NaN
5 2000 2 7 NaN
6 2000 3 6 NaN
7 2001 1 4 NaN
8 2001 2 6 NaN
9 2001 3 8 NaN
10 2002 1 7 5.000000
11 2002 2 9 7.333333
12 2002 3 8 7.000000
13 2003 1 5 5.333333
14 2003 2 7 7.333333
15 2003 3 7 7.333333
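A shorter way to express the same idea, offered as an alternative sketch and not as the method used above, is to shift and roll inside a single transform. It assumes that value is numeric and that the rows are already sorted by year within each month.

```python
import pandas as pd

years = [1999, 2000, 2001, 2002, 2003]
df = pd.DataFrame({
    "year": [y for y in years for _ in range(3)],
    "Month": ["1", "2", "3"] * len(years),
    "value": [6, 9, 7, 5, 7, 6, 4, 6, 8, 7, 9, 8, 5, 7, 7],
})

# Within each month: shift by one year so the current year is excluded,
# then average the three preceding years.
df["average_past_3_years"] = (
    df.groupby("Month")["value"]
      .transform(lambda s: s.shift(1).rolling(3).mean())
)
print(df)
```

The first three years of every month come out as NaN because fewer than three earlier years exist for them.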
Get monthly average in pandas
We can convert your datetime column into a PeriodIndex with monthly frequency, then take the mean using GroupBy.mean:
df.groupby(pd.PeriodIndex(df['Date'], freq="M"))['Value'].mean()
Date
2006-01 14.6
2019-12 38.2
Freq: M, Name: Value, dtype: float64
df.groupby(pd.PeriodIndex(df['Date'], freq="M"))['Value'].mean().reset_index()
Date Value
0 2006-01 14.6
1 2019-12 38.2
One caveat of this approach is that missing months are not shown. If that's important, use set_index followed by resample(...).mean() in the same way.
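The difference between the two can be seen on a small made-up frame (the dates and values here are illustrative only). The resample call is given a MonthEnd offset object instead of a string alias, on the assumption that the alias spelling may vary with your pandas version.

```python
import pandas as pd

# Made-up data: two rows in January 2006, one in April 2006, nothing in between.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2006-01-05", "2006-01-20", "2006-04-02"]),
    "Value": [10.0, 20.0, 30.0],
})

# PeriodIndex grouping: only months that occur in the data appear.
by_period = df.groupby(pd.PeriodIndex(df["Date"], freq="M"))["Value"].mean()

# set_index + resample: every month in the range appears, empty ones as NaN.
by_resample = (
    df.set_index("Date")["Value"]
      .resample(pd.offsets.MonthEnd())
      .mean()
)

print(by_period)
print(by_resample)
```

Here by_period has two entries, while by_resample also lists February and March with NaN.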
How do I calculate mean value for each month in the dataset?
Try:
df.index = pd.to_datetime(df.index)
df.groupby([df.index.year, df.index.month]).mean()
RPT VAL ROS ... CLO BEL MAL
DATE DATE ...
1961 1 12.373333 9.333333 11.043333 ... 7.906667 8.833333 11.960
2 12.230000 12.020000 8.560000 ... 9.210000 15.290000 15.125
3 10.580000 6.630000 11.750000 ... 5.880000 5.460000 10.880
1962 3 13.330000 13.250000 11.420000 ... 10.340000 12.920000 11.830
6 13.210000 8.120000 9.960000 ... 7.500000 8.120000 13.170
1968 7 12.230000 12.020000 8.560000 ... 9.210000 15.290000 15.125
1976 8 11.955000 9.940000 11.585000 ... 8.110000 9.190000 11.355
1978 9 13.355000 11.205000 9.730000 ... 7.730000 11.040000 13.480
12 10.960000 9.750000 7.620000 ... 10.460000 16.620000 16.460
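Note that grouping by both year and month gives one mean per calendar month of each year. If you want a single mean per calendar month across all years instead, group by the month alone. A minimal sketch on made-up values (only the RPT column name is borrowed from the output above):

```python
import pandas as pd

# Made-up values for a single column.
df = pd.DataFrame(
    {"RPT": [10.0, 14.0, 12.0, 20.0]},
    index=pd.to_datetime(["1961-01-01", "1961-01-02", "1961-02-01", "1962-01-01"]),
)
df.index.name = "DATE"

# One row per (year, month) pair.
per_year_month = df.groupby([df.index.year, df.index.month]).mean()
per_year_month.index.names = ["year", "month"]  # clearer than DATE / DATE

# One row per calendar month, pooled over all years.
per_calendar_month = df.groupby(df.index.month).mean()

print(per_year_month)
print(per_calendar_month)
```

Renaming the index levels is optional, but it avoids the two identically named DATE levels.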
How to group and calculate monthly average in pandas dataframe
Convert the values to datetimes first, then aggregate the sum per name and month using Grouper, and finally take the mean per first-level name:
data['time'] = pd.to_datetime(data['time'])
# sum per name and month, then average the monthly sums per name
df = (data.groupby(['name', pd.Grouper(freq='M', key='time')])['values'].sum()
          .groupby(level=0)
          .mean()
          .reset_index(name='Monthly Average'))
print(df)
name Monthly Average
0 A 25
1 B 30
A solution with monthly periods is to replace Grouper with Series.dt.to_period:
data['time'] = pd.to_datetime(data['time'])
df = (data.groupby(['name', data['time'].dt.to_period('M')])['values']
          .sum()
          .groupby(level=0)
          .mean()
          .reset_index(name='Monthly Average'))
print(df)
name Monthly Average
0 A 25
1 B 30
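Since the input data is not shown, here is a self-contained sketch of the period-based variant on a made-up frame. The rows below are assumptions, chosen only so that the two steps (sum per month, then mean of those sums) are easy to follow by hand.

```python
import pandas as pd

# Made-up input: A has two rows in January and one in February, B one row per month.
data = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B"],
    "time": ["2019-01-05", "2019-01-20", "2019-02-10", "2019-01-07", "2019-02-11"],
    "values": [10, 10, 30, 20, 40],
})
data["time"] = pd.to_datetime(data["time"])

# Step 1: total per name and month (A: 20 and 30, B: 20 and 40).
monthly_sum = data.groupby(["name", data["time"].dt.to_period("M")])["values"].sum()

# Step 2: average of the monthly totals per name.
df = monthly_sum.groupby(level=0).mean().reset_index(name="Monthly Average")
print(df)
```

Summing first matters: averaging the raw rows directly would weight months with more rows more heavily.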
How to calculate the Monthly Average over Multiple Years with multiple Latitude and Longitude - Pandas - Xarray
If I understand correctly, you're after the long-term mean for each month. If so, you can use xarray with groupby() instead of resample() to calculate these climatologies.
climatology = Multidata.groupby("time.month").mean("time")
See the xarray documentation on calculating monthly anomalies.