Rolling Mean on Pandas on a Specific Column


To assign a new column, create a rolling object from that single column (a Series) and take its mean:

df['new_col'] = df['column'].rolling(5).mean()
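A minimal runnable sketch of the single-column pattern, using a hypothetical DataFrame whose column is literally named 'column'. With the default min_periods (equal to the window size), the first four rows of a 5-row window are NaN; passing min_periods=1 averages whatever values are available so far instead.

```python
import pandas as pd

# Hypothetical data; 'column' stands in for your real column name
df = pd.DataFrame({'column': [1, 2, 3, 4, 5, 6, 7]})

# Default min_periods == window size, so rows 0-3 have no full window and become NaN
df['new_col'] = df['column'].rolling(5).mean()

# min_periods=1 averages however many values are available so far
df['new_col_partial'] = df['column'].rolling(5, min_periods=1).mean()

print(df)
```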

The answer posted by ac2001 is not the most performant way of doing this. It calculates a rolling mean on every column of the DataFrame and then selects the "pop" column to assign "ma". Computing the rolling mean on the single column, as in the first line below, is much more efficient:

%timeit df['ma'] = data['pop'].rolling(5).mean()
%timeit df['ma_2'] = data.rolling(5).mean()['pop']

1000 loops, best of 3: 497 µs per loop
100 loops, best of 3: 2.6 ms per loop

I would not recommend the second method unless you also need the rolling means of all the other columns.
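To check the difference on your own data, a sketch with timeit. The DataFrame shape and every column name except "pop" are hypothetical, and timings will vary with pandas version and hardware. Both methods produce the same values; only the amount of work differs.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: one 'pop' column plus 19 columns we do not need
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((10_000, 20)),
                    columns=[f'c{i}' for i in range(19)] + ['pop'])
df = pd.DataFrame(index=data.index)


def single_column():
    # Rolling mean computed only on the Series that is needed
    return data['pop'].rolling(5).mean()


def all_columns():
    # Rolling mean computed on every column, then one column is selected
    return data.rolling(5).mean()['pop']


df['ma'] = single_column()
df['ma_2'] = all_columns()

for fn in (single_column, all_columns):
    per_call = timeit.timeit(fn, number=100) / 100
    print(f'{fn.__name__}: {per_call * 1e3:.3f} ms per call')
```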

pandas rolling on specific column

The correct syntax is:

df['sum2'] = df.rolling(window="3d", min_periods=2, on='dt')['Col1'].sum()
print(df)

# Output:
   Col1  Col2  Col3         dt  sum2
0    10    13    17 2020-01-01   NaN
1    20    23    27 2020-01-02  30.0
2    15    18    22 2020-01-03  45.0
3    30    33    37 2020-01-04  65.0
4    45    48    52 2020-01-05  90.0

The error is that you select the column Col1 first, so the column dt no longer exists when rolling is applied:

>>> df['Col1']  # the column 'dt' does not exist anymore.
0    10
1    20
2    15
3    30
4    45
Name: Col1, dtype: int64
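Another way around the problem, sketched with the example frame rebuilt from the output above: move dt into the index first, so the selected Series carries the timestamps into rolling itself. The column name sum3 is only for illustration.

```python
import pandas as pd

# Example frame rebuilt from the output above
df = pd.DataFrame({
    'Col1': [10, 20, 15, 30, 45],
    'Col2': [13, 23, 18, 33, 48],
    'Col3': [17, 27, 22, 37, 52],
    'dt': pd.date_range('2020-01-01', periods=5, freq='D'),
})

# Put the timestamps on the index so the selected Series keeps them for the time-based window
by_date = df.set_index('dt')['Col1'].rolling('3D', min_periods=2).sum()

# by_date is indexed by dt; assign by position back onto df's RangeIndex
df['sum3'] = by_date.to_numpy()
print(df)
```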

Column-wise rolling mean in pandas

The rolling method takes an axis parameter, which you can set to 1:

import pandas as pd
df = pd.DataFrame({'id': range(3), 
                   'Date_1': range(3, 6), 
                   'Date_2': range(4, 7), 
                   'Date_3': range(5, 8),
                   'Date_4': range(6, 9),
                   'Date_5': range(11, 14)})

df = df.set_index('id')
df.rolling(3, axis=1).mean()

    Date_1  Date_2  Date_3  Date_4    Date_5
id                                          
0      NaN     NaN     4.0     5.0  7.333333
1      NaN     NaN     5.0     6.0  8.333333
2      NaN     NaN     6.0     7.0  9.333333
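As of pandas 2.1 the axis argument of rolling is deprecated and emits a FutureWarning, so in newer versions a safer sketch is to transpose, roll down the columns, and transpose back. This rebuilds the example frame above and should give the same table.

```python
import pandas as pd

df = pd.DataFrame({'id': range(3),
                   'Date_1': range(3, 6),
                   'Date_2': range(4, 7),
                   'Date_3': range(5, 8),
                   'Date_4': range(6, 9),
                   'Date_5': range(11, 14)}).set_index('id')

# Transpose so the Date_* columns become rows, roll down them, then transpose back
row_means = df.T.rolling(3).mean().T
print(row_means)
```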

Pandas rolling mean with variable window based on a different column

One option is to loop through the DataFrame and, for each row, compute the rolling mean of column A using the window size stored in column B of that row:

import numpy as np

df['rolling_mean'] = np.nan
for ind in range(len(df)):
    window = int(df['B'].iloc[ind])  # window size stored in this row's B
    df.loc[df.index[ind], 'rolling_mean'] = df['A'].rolling(window).mean().iloc[ind]
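The loop recomputes a full rolling mean for every row, which is O(n²). A vectorized sketch using cumulative sums (a different technique from the loop, not the answer's method), assuming B holds positive integer window sizes and A has no missing values. The function name variable_window_mean and the data are hypothetical.

```python
import numpy as np
import pandas as pd


def variable_window_mean(values, windows):
    """Trailing mean of values[i - w + 1 .. i] with w = windows[i].

    Assumes every window is a positive integer and values contain no NaN.
    Rows without enough history (w > i + 1) get NaN, like rolling(w).mean().
    """
    a = np.asarray(values, dtype=float)
    w = np.asarray(windows, dtype=int)
    # Prefix sums with a leading 0: csum[j] == a[0] + ... + a[j - 1]
    csum = np.concatenate(([0.0], np.cumsum(a)))
    idx = np.arange(len(a))
    start = idx - w + 1  # first position inside each row's window
    out = np.full(len(a), np.nan)
    ok = start >= 0
    out[ok] = (csum[idx[ok] + 1] - csum[start[ok]]) / w[ok]
    return out


# Hypothetical data: A holds the values, B holds each row's window size
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': [1, 2, 2, 3, 3, 4]})
df['rolling_mean'] = variable_window_mean(df['A'], df['B'])
print(df)
```

On very long float series the prefix sums can accumulate rounding error, so for those the loop (or a compensated sum) may be preferable.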

Python how to create a rolling mean with additional conditions

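Use Series.where to keep the rolling mean where it does not exceed the 3-day-min value, and take the 3-day-min value everywhere else: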

df['rolling_mean'] = df['3-day-min'].rolling(3).mean().where(lambda x: x.le(df['3-day-min']), df['3-day-min'])

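Or use Series.mask. The two are not identical on the leading rows where the rolling mean is NaN: any comparison with NaN is False, so where replaces those NaNs with the 3-day-min value while mask leaves them as NaN: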

df['rolling_mean'] = df['3-day-min'].rolling(3).mean().mask(lambda x: x.gt(df['3-day-min']), df['3-day-min'])
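A sketch with hypothetical values for the 3-day-min column, running both forms side by side so their results can be compared row by row:

```python
import pandas as pd

# Hypothetical data for the '3-day-min' column
df = pd.DataFrame({'3-day-min': [5.0, 3.0, 4.0, 1.0, 6.0]})
rm = df['3-day-min'].rolling(3).mean()  # NaN, NaN, 4.0, 2.667, 3.667

# where: keep rm where rm <= value, otherwise take the value (NaN <= x is False, so NaNs are replaced)
df['via_where'] = rm.where(lambda x: x.le(df['3-day-min']), df['3-day-min'])

# mask: replace rm where rm > value (NaN > x is False, so NaNs are kept)
df['via_mask'] = rm.mask(lambda x: x.gt(df['3-day-min']), df['3-day-min'])
print(df)
```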

Pandas: DataFrame Rolling Average on a Row

Use DataFrame.rolling with axis=1 and mean:

print (df)
   0  1  2  3  4  5  6  7  8
0  1  2  3  4  5  6  7  8  9

df1 = df.rolling(3, axis=1).mean()
print (df1)
    0   1    2    3    4    5    6    7    8
0 NaN NaN  2.0  3.0  4.0  5.0  6.0  7.0  8.0

If you need to join the result to the original, pass both to concat:

df = pd.concat([df, df1], ignore_index=True)
print (df)
     0    1    2    3    4    5    6    7    8
0  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0
1  NaN  NaN  2.0  3.0  4.0  5.0  6.0  7.0  8.0
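For a single row of plain numbers, a NumPy alternative (a different tool from the pandas method above; it needs NumPy 1.20 or later) is sliding_window_view, which exposes each length-3 window as a view. A sketch reproducing the example row:

```python
import numpy as np

row = np.arange(1, 10, dtype=float)  # the single row 1..9 from the example

# Each window is a length-3 view into the row; averaging along axis 1 gives one mean per window
means = np.lib.stride_tricks.sliding_window_view(row, 3).mean(axis=1)

# Pad the first window - 1 positions with NaN to line up with pandas' output
aligned = np.concatenate((np.full(2, np.nan), means))
print(aligned)
```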

Calculate rolling average for all columns pandas

If you want NaNs to count as 0 in your means, you can do:

df.fillna(0,inplace=True)
df.rolling(3).mean()

This will give you:

                   a         b         c
2019-01-31       NaN       NaN       NaN
2019-02-28       NaN       NaN       NaN
2019-03-31  3.086667  3.650000  2.460000
2019-04-30  3.228333  2.433333  1.826667
2019-05-31  2.191667  2.525000  1.910000
2019-06-30  2.495000  2.276667  2.736667
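Treating NaN as 0 is one choice; the other common one is to skip missing values. A small hypothetical column shows the difference. Calling fillna without inplace=True returns a new frame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'a': [1.0, np.nan, 3.0, 4.0]})

# NaN counted as 0: every full window has 3 values, and the missing one contributes 0
as_zero = frame.fillna(0).rolling(3).mean()

# NaN skipped: min_periods=1 averages only the non-missing values in each window
skipped = frame.rolling(3, min_periods=1).mean()

print(pd.concat({'as_zero': as_zero['a'], 'skipped': skipped['a']}, axis=1))
```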

Pandas: Rolling mean using only the last update based on another column

It's a bit tricky. Since rolling.apply works on one Series (column) at a time and you need both "Warehose" and "Value" for the computation, you have to look up the complete DataFrame inside the function (through a "global" variable, which is not super clean IMO):

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df2 = df.set_index('Date')

def agg(s):
    return (df2.loc[s.index]
               .drop_duplicates(subset='Warehose', keep='last')
               ['Value'].mean()
           )

df['Rolling_Mean'] = (df.sort_values(by='Date')
                        .rolling('30d', on='Date')
                        ['Value']
                        .apply(agg, raw=False)
                      )

output:

        Date  Warehose  Value  Rolling_Mean
0 1998-01-10    London     10          10.0
1 1998-01-13    London     13          13.0
2 1998-01-15  New York     37          25.0
3 1998-02-12    London     21          29.0
4 1998-02-20  New York     39          30.0
5 1998-02-21  New York     17          19.0

How to create a rolling mean column in pandas for different subset elements?

This should do what you want. transform returns a result on the original index, so it can be assigned straight back to df:

df['2-DAY AVG'] = df.groupby('PLAYER').SCORE.transform(lambda x: x.rolling(2).mean())
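A self-contained sketch with hypothetical player data. It shows two ways to keep each player's rolling mean aligned with the original rows: groupby(...).transform, and groupby(...).rolling(...) followed by dropping the group level from the resulting index. The second column name is only for illustration.

```python
import pandas as pd

# Hypothetical scores for two players, interleaved by row
df = pd.DataFrame({'PLAYER': ['A', 'B', 'A', 'B', 'A'],
                   'SCORE': [10, 20, 30, 40, 50]})

# transform returns a Series on df's own index, one rolling mean per player
df['2-DAY AVG'] = df.groupby('PLAYER')['SCORE'].transform(lambda x: x.rolling(2).mean())

# groupby(...).rolling(...) gives a (PLAYER, original index) MultiIndex;
# dropping the PLAYER level lets assignment align on the original row labels
per_player = df.groupby('PLAYER')['SCORE'].rolling(2).mean()
df['2-DAY AVG (rolling)'] = per_player.reset_index(level=0, drop=True)
print(df)
```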
