Rolling Mean on pandas on a specific column
To assign a column, you can create a rolling object based on your Series:
df['new_col'] = data['column'].rolling(5).mean()
The answer posted by ac2001 is not the most performant way of doing this. It calculates a rolling mean on every column in the DataFrame and only then selects the "pop" column to assign to "ma". The first of the following two methods is much more efficient:
%timeit df['ma'] = data['pop'].rolling(5).mean()
%timeit df['ma_2'] = data.rolling(5).mean()['pop']
1000 loops, best of 3: 497 µs per loop
100 loops, best of 3: 2.6 ms per loop
I would not recommend using the second method unless you need to store computed rolling means on all other columns.
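As a rough check, the sketch below confirms that both approaches give the same values for the selected column, so the only difference is the extra work done on the other columns. The DataFrame contents and the extra column names ("x", "y", "z") are made up for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: "pop" is the column of interest, the rest is filler.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((1000, 4)), columns=["pop", "x", "y", "z"])

# Rolling mean on the single Series only.
ma_series = data["pop"].rolling(5).mean()

# Rolling mean on every column, then select one.
ma_frame = data.rolling(5).mean()["pop"]

# Both routes produce the same values for "pop".
pd.testing.assert_series_equal(ma_series, ma_frame)

# Time each route; absolute numbers are machine dependent.
t_series = timeit.timeit(lambda: data["pop"].rolling(5).mean(), number=20)
t_frame = timeit.timeit(lambda: data.rolling(5).mean()["pop"], number=20)
print(f"Series only: {t_series:.4f}s, whole frame: {t_frame:.4f}s")
```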
pandas rolling on specific column
The correct syntax is:
df['sum2'] = df.rolling(window="3d", min_periods=2, on='dt')['Col1'].sum()
print(df)
# Output:
Col1 Col2 Col3 dt sum2
0 10 13 17 2020-01-01 NaN
1 20 23 27 2020-01-02 30.0
2 15 18 22 2020-01-03 45.0
3 30 33 37 2020-01-04 65.0
4 45 48 52 2020-01-05 90.0
Your error was to extract the column Col1 first, so the column dt no longer exists when rolling is called.
>>> df['Col1'] # the column 'dt' does not exist anymore.
0 10
1 20
2 15
3 30
4 45
Name: Col1, dtype: int64
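For reference, here is a small sketch that rebuilds the Col1 and dt columns from the output above and computes the same time-based rolling sum in two ways: with on='dt' on the DataFrame, and by moving dt into the index before selecting the column. The second variant is an alternative not taken from the original answer, and "3D" is used here as the uppercase spelling of the "3d" window.

```python
import pandas as pd

# Data reconstructed from the printed output above (Col2 and Col3 omitted).
df = pd.DataFrame({
    "Col1": [10, 20, 15, 30, 45],
    "dt": pd.date_range("2020-01-01", periods=5, freq="D"),
})

# Roll on the DataFrame so that 'dt' is still available, then pick Col1.
via_on = df.rolling(window="3D", min_periods=2, on="dt")["Col1"].sum()

# Alternative: make 'dt' the index, so the Series carries the dates itself.
via_index = df.set_index("dt")["Col1"].rolling("3D", min_periods=2).sum()

df["sum2"] = via_on
print(df)
```

The first variant keeps the original integer index, so the result can be assigned straight back to the DataFrame. The second returns a Series indexed by date.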
Column-wise rolling mean in pandas
The rolling method takes an axis parameter, which you can set to 1:
import pandas as pd
df = pd.DataFrame({'id': range(3),
'Date_1': range(3, 6),
'Date_2': range(4, 7),
'Date_3': range(5, 8),
'Date_4': range(6, 9),
'Date_5': range(11, 14)})
df = df.set_index('id')
df.rolling(3, axis=1).mean()
Date_1 Date_2 Date_3 Date_4 Date_5
id
0 NaN NaN 4.0 5.0 7.333333
1 NaN NaN 5.0 6.0 8.333333
2 NaN NaN 6.0 7.0 9.333333
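Note that the axis keyword of rolling is deprecated as of pandas 2.1, where the documented replacement is to transpose first. The sketch below applies that approach to the same data: transpose, roll down the rows, and transpose back. It assumes all columns are numeric, since transposing mixed dtypes would produce object columns.

```python
import pandas as pd

df = pd.DataFrame({'id': range(3),
                   'Date_1': range(3, 6),
                   'Date_2': range(4, 7),
                   'Date_3': range(5, 8),
                   'Date_4': range(6, 9),
                   'Date_5': range(11, 14)})
df = df.set_index('id')

# Transpose so each original row becomes a column, roll, then transpose back.
row_wise = df.T.rolling(3).mean().T
print(row_wise)
```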
Pandas rolling mean with variable window based on a different column
One option is to loop through the data frame, and assign a new column equal to the rolling_mean for each row.
import numpy as np

df['rolling_mean'] = np.nan
for ind in range(len(df)):
    window = int(df['B'].iloc[ind])  # window size for this row comes from column B
    df.loc[df.index[ind], 'rolling_mean'] = df['A'].rolling(window).mean().iloc[ind]
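The loop recomputes a full rolling mean for every row, which gets slow on large frames. One possible alternative, sketched here under the assumption that column A contains no NaN values and column B holds positive integer window sizes, is to use prefix sums so each row's mean comes from two lookups. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: A holds the values, B the per-row window size.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0], "B": [1, 2, 3, 2, 6]})

a = df["A"].to_numpy(dtype=float)
w = df["B"].to_numpy(dtype=int)

# Prefix sums with a leading zero, so csum[j] is the sum of a[:j].
csum = np.concatenate(([0.0], np.cumsum(a)))
end = np.arange(1, len(a) + 1)  # exclusive end of each row's window
start = end - w                 # inclusive start of each row's window

# Rows without enough history get NaN, matching rolling(window).mean().
valid = start >= 0
result = np.full(len(a), np.nan)
result[valid] = (csum[end[valid]] - csum[start[valid]]) / w[valid]

df["rolling_mean"] = result
print(df)
```

The last row asks for a window of 6 with only 5 rows available, so it stays NaN, just as the loop version would leave it.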
Python how to create a rolling mean with additional conditions
Use Series.where:
df['rolling_mean'] = df['3-day-min'].rolling(3).mean().where(lambda x: x.le(df['3-day-min']), df['3-day-min'])
Or:
df['rolling_mean'] = df['3-day-min'].rolling(3).mean().mask(lambda x: x.gt(df['3-day-min']), df['3-day-min'])
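A small sketch on made-up values shows what the two variants produce. Both cap the rolling mean at the original value, but they differ on the leading rows where the rolling mean is still NaN: comparisons with NaN are False, so where falls back to the original value there, while mask leaves the NaN in place.

```python
import pandas as pd

# Hypothetical values for the '3-day-min' column.
df = pd.DataFrame({"3-day-min": [5.0, 3.0, 4.0, 1.0, 6.0]})
col = df["3-day-min"]
rolled = col.rolling(3).mean()

# Keep the rolling mean where it is <= the original, otherwise use the original.
capped_where = rolled.where(rolled.le(col), col)

# Replace the rolling mean with the original where it is > the original.
capped_mask = rolled.mask(rolled.gt(col), col)

print(pd.DataFrame({"where": capped_where, "mask": capped_mask}))
```

Which behaviour is wanted for the first window - 1 rows depends on the use case, so it is worth checking those rows explicitly.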
Pandas: DataFrame Rolling Average on a Row
Use DataFrame.rolling with axis=1 and mean:
print (df)
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7 8 9
df1 = df.rolling(3, axis=1).mean()
print (df1)
0 1 2 3 4 5 6 7 8
0 NaN NaN 2.0 3.0 4.0 5.0 6.0 7.0 8.0
If you need to join the result to the original, pass both to concat:
df = pd.concat([df, df1], ignore_index=True)
print (df)
0 1 2 3 4 5 6 7 8
0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
1 NaN NaN 2.0 3.0 4.0 5.0 6.0 7.0 8.0
Calculate rolling average for all columns pandas
If you want NaNs to count as 0 in your means, you can do:
df.fillna(0,inplace=True)
df.rolling(3).mean()
This will give you:
a b c
2019-01-31 NaN NaN NaN
2019-02-28 NaN NaN NaN
2019-03-31 3.086667 3.650000 2.460000
2019-04-30 3.228333 2.433333 1.826667
2019-05-31 2.191667 2.525000 1.910000
2019-06-30 2.495000 2.276667 2.736667
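To make the effect of the fill explicit, here is a sketch on a tiny made-up column. It contrasts filling NaNs with 0 against leaving them in and relaxing min_periods, which averages only the values actually present in each window. The min_periods variant is an alternative not mentioned in the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame with one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0]})

# NaN treated as 0: the missing value pulls the mean down.
as_zero = df.fillna(0).rolling(3).mean()

# NaN skipped: each window averages whatever values are present.
skipped = df.rolling(3, min_periods=1).mean()

# No handling at all: any window touching the NaN yields NaN.
untouched = df.rolling(3).mean()

print(pd.concat([as_zero, skipped, untouched], axis=1,
                keys=["as_zero", "skipped", "untouched"]))
```

Using df.fillna(0) without inplace=True, as here, leaves the original DataFrame unchanged.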
Pandas: Rolling mean using only the last update based on another column
It's a bit tricky. Because rolling.apply works on a Series only and you need both "Warehose" and "Value" to perform the computation, you need to access the complete DataFrame from inside the function (through a "global" variable, which is not very clean IMO):
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df2 = df.set_index('Date')
def agg(s):
    return (df2.loc[s.index]
               .drop_duplicates(subset='Warehose', keep='last')
               ['Value'].mean()
            )

df['Rolling_Mean'] = (df.sort_values(by='Date')
                        .rolling('30d', on='Date')
                        ['Value']
                        .apply(agg, raw=False)
                      )
output:
Date Warehose Value Rolling_Mean
0 1998-01-10 London 10 10.0
1 1998-01-13 London 13 13.0
2 1998-01-15 New York 37 25.0
3 1998-02-12 London 21 29.0
4 1998-02-20 New York 39 30.0
5 1998-02-21 New York 17 19.0
How to create a rolling mean column in pandas for different subset elements?
This should do what you want:
df['2-DAY AVG'] = df.groupby('PLAYER')['SCORE'].transform(lambda x: x.rolling(2).mean())
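Here is a minimal sketch on hypothetical PLAYER/SCORE data, showing two ways to get a per-player rolling mean that lines up with the original row index: transform, and groupby(...).rolling(...) with the group level dropped from the resulting index. Index alignment matters because recent pandas versions may add the group key to the index of an apply result, which would then not assign back cleanly.

```python
import pandas as pd

# Hypothetical data with interleaved players.
df = pd.DataFrame({
    "PLAYER": ["A", "A", "B", "A", "B"],
    "SCORE": [10, 20, 5, 30, 15],
})

# transform returns a Series aligned with df's original index.
df["2-DAY AVG"] = (
    df.groupby("PLAYER")["SCORE"].transform(lambda x: x.rolling(2).mean())
)

# groupby().rolling() returns a (PLAYER, row) MultiIndex; drop the group
# level so the result aligns on the original row labels.
alt = (
    df.groupby("PLAYER")["SCORE"]
      .rolling(2)
      .mean()
      .reset_index(level=0, drop=True)
)
df["2-DAY AVG ALT"] = alt

print(df)
```

Each player's first row is NaN because a window of 2 needs two scores from that player.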