Rolling Mean on Pandas on a Specific Column

To assign a column, you can create a rolling object based on your Series:

df['new_col'] = df['column'].rolling(5).mean()

The answer posted by ac2001 is not the most performant way of doing this. It calculates a rolling mean on every column in the dataframe and then assigns the "ma" column from the "pop" column of the result. The first of the following two methods is much more efficient:

%timeit df['ma'] = df['pop'].rolling(5).mean()
%timeit df['ma_2'] = df.rolling(5).mean()['pop']

1000 loops, best of 3: 497 µs per loop
100 loops, best of 3: 2.6 ms per loop

I would not recommend using the second method unless you need to store computed rolling means on all other columns.
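As a quick check that both forms return the same values for the selected column, here is a minimal sketch. The frame contents and the extra column names ("a", "b", "c") are made up for illustration; only the "pop" column name comes from the answer above.

```python
import numpy as np
import pandas as pd

# Illustrative data: one column of interest plus three columns we do not need.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 4)), columns=["pop", "a", "b", "c"])

# Roll over the single column only.
ma_single = df["pop"].rolling(5).mean()

# Roll over every column, then throw away all but one.
ma_all = df.rolling(5).mean()["pop"]

# Raises an AssertionError if the two results differ.
pd.testing.assert_series_equal(ma_single, ma_all)
```

The second form does the same work for every column before discarding most of it, which is where the extra time goes.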

pandas rolling on specific column

The correct syntax is:

df['sum2'] = df.rolling(window="3d", min_periods=2, on='dt')['Col1'].sum()
print(df)

# Output:
   Col1  Col2  Col3         dt  sum2
0    10    13    17 2020-01-01   NaN
1    20    23    27 2020-01-02  30.0
2    15    18    22 2020-01-03  45.0
3    30    33    37 2020-01-04  65.0
4    45    48    52 2020-01-05  90.0

Your error was extracting the column Col1 first, so the column dt no longer exists by the time rolling is called:

>>> df['Col1']  # the column 'dt' does not exist anymore.
0 10
1 20
2 15
3 30
4 45
Name: Col1, dtype: int64
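If you would rather select the column first, one possible alternative is to move the dates into the index so the Series still carries them. This is a sketch that rebuilds only Col1 and dt from the output above; it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [10, 20, 15, 30, 45],
    "dt": pd.date_range("2020-01-01", periods=5, freq="D"),
})

# With dt as the index, the selected Series keeps the dates,
# so a time-based window can be applied to it directly.
by_index = df.set_index("dt")["Col1"].rolling("3D", min_periods=2).sum()

# by_index is indexed by date; assign by position to the original frame.
df["sum2"] = by_index.to_numpy()
print(df)
```

Assigning with to_numpy() relies on the rows being in the same order in both objects, which holds here because set_index does not reorder them.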

Column-wise rolling mean in pandas

The rolling method takes an axis parameter, which you can set to 1 (note that this keyword is deprecated as of pandas 2.1):

import pandas as pd
df = pd.DataFrame({'id': range(3),
                   'Date_1': range(3, 6),
                   'Date_2': range(4, 7),
                   'Date_3': range(5, 8),
                   'Date_4': range(6, 9),
                   'Date_5': range(11, 14)})

df = df.set_index('id')
df.rolling(3, axis=1).mean()
    Date_1  Date_2  Date_3  Date_4    Date_5
id
0      NaN     NaN     4.0     5.0  7.333333
1      NaN     NaN     5.0     6.0  8.333333
2      NaN     NaN     6.0     7.0  9.333333
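On pandas versions where axis=1 emits a deprecation warning or is no longer accepted, the same result can be obtained by transposing before and after the rolling call. A minimal sketch using the same frame:

```python
import pandas as pd

df = pd.DataFrame({"id": range(3),
                   "Date_1": range(3, 6),
                   "Date_2": range(4, 7),
                   "Date_3": range(5, 8),
                   "Date_4": range(6, 9),
                   "Date_5": range(11, 14)}).set_index("id")

# Transpose so the Date_* columns become rows, roll down the rows,
# then transpose back to the original orientation.
row_means = df.T.rolling(3).mean().T
print(row_means)
```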

Pandas rolling mean with variable window based on a different column

One option is to loop through the data frame and assign each row the rolling mean computed with that row's own window size:

import numpy as np

df['rolling_mean'] = np.nan
for ind in range(len(df)):
    window = int(df['B'].iloc[ind])  # window size for this row, taken from column B
    df.loc[df.index[ind], 'rolling_mean'] = df['A'].rolling(window).mean().iloc[ind]
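The loop recomputes a full rolling mean for every row. A possible vectorized alternative, swapping in cumulative sums, is sketched below. It assumes column A contains no NaN values and column B holds positive integer window sizes; the sample data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                   "B": [1, 2, 3, 2, 4, 7]})

a = df["A"].to_numpy(dtype=float)
w = df["B"].to_numpy(dtype=int)

# csum[k] holds the sum of the first k values of A.
csum = np.concatenate(([0.0], np.cumsum(a)))

end = np.arange(1, len(a) + 1)   # one past the current row
start = end - w                  # first row inside each window
valid = start >= 0               # window fits entirely inside the data

# Rows whose window reaches before the first row stay NaN,
# matching rolling(w).mean() with its default min_periods.
out = np.full(len(a), np.nan)
out[valid] = (csum[end[valid]] - csum[start[valid]]) / w[valid]
df["rolling_mean"] = out
print(df)
```

Differences of cumulative sums can lose a little floating-point precision on long or wide-ranging data, so the loop remains the safer reference when exact agreement matters.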

Python how to create a rolling mean with additional conditions

Use Series.where:

df['rolling_mean'] = df['3-day-min'].rolling(3).mean().where(lambda x: x.le(df['3-day-min']), df['3-day-min'])

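Or use Series.mask. The two differ only on rows where the rolling mean is NaN: where replaces them with the original value, while mask leaves them as NaN: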

df['rolling_mean'] = df['3-day-min'].rolling(3).mean().mask(lambda x: x.gt(df['3-day-min']), df['3-day-min'])
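A small check of both forms on made-up data shows how the first rows, where the 3-row window is not yet full, are treated. The values in '3-day-min' are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"3-day-min": [5, 3, 4, 10, 1]})
rolled = df["3-day-min"].rolling(3).mean()

# where keeps the rolling mean only where the condition is True.
# NaN <= value is False, so the incomplete-window rows get the original value.
with_where = rolled.where(lambda x: x.le(df["3-day-min"]), df["3-day-min"])

# mask replaces the rolling mean only where the condition is True.
# NaN > value is False, so the incomplete-window rows stay NaN.
with_mask = rolled.mask(lambda x: x.gt(df["3-day-min"]), df["3-day-min"])

print(pd.DataFrame({"where": with_where, "mask": with_mask}))
```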

Pandas: DataFrame Rolling Average on a Row

Use DataFrame.rolling with axis=1 and mean:

print (df)
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7 8 9

df1 = df.rolling(3, axis=1).mean()
print (df1)
0 1 2 3 4 5 6 7 8
0 NaN NaN 2.0 3.0 4.0 5.0 6.0 7.0 8.0

If you need to join the result to the original, pass both to concat:

df = pd.concat([df, df1], ignore_index=True)
print (df)
0 1 2 3 4 5 6 7 8
0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
1 NaN NaN 2.0 3.0 4.0 5.0 6.0 7.0 8.0

Calculate rolling average for all columns pandas

If you want NaNs to count as 0 in your means, you can do:

df.fillna(0, inplace=True)
df.rolling(3).mean()

This will give you:

                   a         b         c
2019-01-31       NaN       NaN       NaN
2019-02-28       NaN       NaN       NaN
2019-03-31  3.086667  3.650000  2.460000
2019-04-30  3.228333  2.433333  1.826667
2019-05-31  2.191667  2.525000  1.910000
2019-06-30  2.495000  2.276667  2.736667
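If counting NaNs as 0 is not what you want, a possible alternative is to let rolling skip them by lowering min_periods. The sketch below compares the two choices on made-up data, and also avoids inplace=True so the original frame is left untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0]})

# NaN counted as 0: every full window divides by 3.
zero_filled = df.fillna(0).rolling(3).mean()

# NaN skipped: each window averages only the values that are present,
# and min_periods=1 allows windows with a single valid value.
skip_nan = df.rolling(3, min_periods=1).mean()

print(pd.concat([zero_filled.add_suffix("_zero"),
                 skip_nan.add_suffix("_skip")], axis=1))
```

Which of the two is appropriate depends on whether a missing value really means zero in your data.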

Pandas: Rolling mean using only the last update based on another column

It's a bit tricky. Because rolling.apply works on a single Series and you need both "Warehose" and "Value" to perform the computation, the function has to look up the complete dataframe (through a "global" variable, which is not super clean IMO):

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df2 = df.set_index('Date')

def agg(s):
    return (df2.loc[s.index]
               .drop_duplicates(subset='Warehose', keep='last')
               ['Value'].mean()
            )

df['Rolling_Mean'] = (df.sort_values(by='Date')
                        .rolling('30d', on='Date')
                        ['Value']
                        .apply(agg, raw=False)
                      )

output:

        Date  Warehose  Value  Rolling_Mean
0 1998-01-10    London     10          10.0
1 1998-01-13    London     13          13.0
2 1998-01-15  New York     37          25.0
3 1998-02-12    London     21          29.0
4 1998-02-20  New York     39          30.0
5 1998-02-21  New York     17          19.0
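To avoid the global lookup, the same windows can be built with an explicit loop. This is a sketch, not the answer's method: the helper name last_update_mean is hypothetical, it assumes unique dates, and it is quadratic in the number of rows, so it only suits small frames.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["1998-01-10", "1998-01-13", "1998-01-15",
                            "1998-02-12", "1998-02-20", "1998-02-21"]),
    "Warehose": ["London", "London", "New York",
                 "London", "New York", "New York"],
    "Value": [10, 13, 37, 21, 39, 17],
})

def last_update_mean(frame, window="30D"):
    """Mean of each warehouse's latest value within the trailing window."""
    frame = frame.sort_values("Date")
    delta = pd.Timedelta(window)
    means = []
    for date in frame["Date"]:
        # Same bounds as a time-based rolling window: (date - window, date].
        in_window = frame[(frame["Date"] > date - delta) & (frame["Date"] <= date)]
        # Keep only the most recent row per warehouse.
        latest = in_window.drop_duplicates(subset="Warehose", keep="last")
        means.append(latest["Value"].mean())
    return pd.Series(means, index=frame.index)

df["Rolling_Mean"] = last_update_mean(df)
print(df)
```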

How to create a rolling mean column in pandas for different subset elements?

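This should do what you want (transform keeps the result aligned with the original index, so it can be assigned back directly):

df['2-DAY AVG'] = df.groupby('PLAYER')['SCORE'].transform(lambda x: x.rolling(2).mean())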
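A runnable sketch of a per-group rolling mean on made-up data follows. It shows two ways of getting a result aligned with the original rows: transform, and the groupby rolling object with the group level dropped from its index. The PLAYER and SCORE values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"PLAYER": ["A", "A", "B", "A", "B"],
                   "SCORE": [10, 20, 30, 40, 50]})

# transform returns a Series indexed like the original frame.
via_transform = df.groupby("PLAYER")["SCORE"].transform(
    lambda x: x.rolling(2).mean())

# groupby(...).rolling(...) returns a (PLAYER, row) MultiIndex;
# dropping the PLAYER level leaves the original row labels.
via_rolling = (df.groupby("PLAYER")["SCORE"]
                 .rolling(2).mean()
                 .reset_index(level=0, drop=True))

# Assignment aligns on the index, so the row order of via_rolling does not matter.
df["2-DAY AVG"] = via_rolling
print(df)
```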

