Calculate the Mean of Every 13 Rows in Data Frame

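Here's a solution using aggregate() and rep(). The example uses n = 5 on a 12-row data frame to show the non-divisible case; set n to 13 for groups of 13 rows.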

df <- data.frame(a=1:12, b=13:24 );
df;
##     a  b
## 1   1 13
## 2   2 14
## 3   3 15
## 4   4 16
## 5   5 17
## 6   6 18
## 7   7 19
## 8   8 20
## 9   9 21
## 10 10 22
## 11 11 23
## 12 12 24
n <- 5;
aggregate(df, list(rep(1:(nrow(df) %/% n + 1), each = n, len = nrow(df))), mean)[-1];
##      a    b
## 1  3.0 15.0
## 2  8.0 20.0
## 3 11.5 23.5

The key part of this solution, which handles the case where nrow(df) is not divisible by n, is the len argument of rep() (the full argument name is length.out; len works through partial matching). It truncates the group vector to exactly nrow(df) elements, so the last group simply contains fewer rows.
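For readers working in pandas rather than R, the same grouping idea can be sketched as follows. This is a minimal sketch, not part of the original answer: it assumes the rows are taken in their current order and builds the group labels by integer-dividing the row position by n, so a shorter last group falls out naturally.

```python
import numpy as np
import pandas as pd

# Same toy data as the R example above.
df = pd.DataFrame({"a": range(1, 13), "b": range(13, 25)})
n = 5

# Row positions 0-4 map to group 0, 5-9 to group 1, 10-11 to group 2.
groups = np.arange(len(df)) // n

means = df.groupby(groups).mean()
print(means)
```

The result has three rows, matching the R output: (3.0, 15.0), (8.0, 20.0) and (11.5, 23.5).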

How to average column values every n rows in pandas

IIUC, use DataFrame.melt to reshape to long format, then take the mean for each site with GroupBy.mean:

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.melt('site').groupby('site')['value'].mean()

Or:

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.set_index('site').stack().groupby(level=0).mean()
#df_tmp.set_index('site').stack().mean(level=0) # .mean(level=0) deprecated

Output

site
1    3.333333
2    7.333333
Name: value, dtype: float64
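The question's input frame is not shown here, so the following is a small self-contained sketch with made-up data (the columns x and y and all values are assumptions for illustration only). It checks that the melt and stack variants produce the same per-site mean over all value columns.

```python
import pandas as pd

# Hypothetical input: a 'site' column plus numeric measurement columns.
df_tmp = pd.DataFrame({
    "site": [1, 1, 2, 2],
    "x": [1, 2, 5, 6],
    "y": [3, 4, 7, 8],
})

# Variant 1: reshape to long format, then average 'value' per site.
via_melt = df_tmp.melt("site").groupby("site")["value"].mean()

# Variant 2: move 'site' into the index, stack the columns, average per index level.
via_stack = df_tmp.set_index("site").stack().groupby(level=0).mean()

print(via_melt)
print(via_stack)
```

With this data both variants give 2.5 for site 1 and 6.5 for site 2.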

How to calculate mean of specific rows in python dataframe?

You should avoid iterating over the rows of a dataframe as much as possible, because it is very inefficient.

groupby is the way to go when you want to apply the same processing to several groups of rows identified by their values in one or more columns. Here, what you want is:

df.groupby('TagName')['Sample_value'].mean().reset_index()

It gives, as expected:

     TagName  Sample_value
0      Steam  1.081447e+06
1  Utilities  3.536931e+05

Details on the magic words:

groupby: identifies the column(s) used to group the rows (rows with the same values end up in the same group)
['Sample_value']: restricts the groupby object to the column of interest
mean(): computes the mean per group
reset_index(): by default the grouping columns go into the index, which is fine for the mean operation; reset_index turns them back into normal columns
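To try the chain end to end, here is a minimal sketch on invented data (the values below are placeholders, not the question's data; only the column names TagName and Sample_value come from the answer).

```python
import pandas as pd

# Hypothetical sample data using the column names from the answer.
df = pd.DataFrame({
    "TagName": ["Steam", "Utilities", "Steam", "Utilities"],
    "Sample_value": [10.0, 1.0, 30.0, 3.0],
})

# Group by tag, average the value column, and restore TagName as a regular column.
result = df.groupby("TagName")["Sample_value"].mean().reset_index()
print(result)
```

Passing as_index=False to groupby is an alternative to the trailing reset_index() call.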

For a column in pandas dataframe, calculate mean of column values in previous 4th, 8th and 12th row from the present row?

.shift() is your missing part. We can use it to access previous rows from the existing row in a Pandas dataframe.

Let's use .groupby(), .apply() and .shift() as follows:

df['New column'] = df.groupby((df['Row number'] - 1) // 13)['Existing column'].apply(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)

Here, rows are partitioned into groups of 13 rows by the group number (df['Row number'] - 1) // 13.

Then, within each group, we use .apply() on Existing column and use .shift() to get the previous 4th, 8th and 12th entries within the group. A row that does not have all three earlier entries inside its group gets NaN, so only the 13th row of each group receives a value.

Test Run

import numpy as np
import pandas as pd

data = {'Row number': np.arange(1, 40), 'Existing column': np.arange(11, 50)}
df = pd.DataFrame(data)

print(df)

    Row number  Existing column
0            1               11
1            2               12
2            3               13
3            4               14
4            5               15
5            6               16
6            7               17
7            8               18
8            9               19
9           10               20
10          11               21
11          12               22
12          13               23
13          14               24
14          15               25
15          16               26
16          17               27
17          18               28
18          19               29
19          20               30
20          21               31
21          22               32
22          23               33
23          24               34
24          25               35
25          26               36
26          27               37
27          28               38
28          29               39
29          30               40
30          31               41
31          32               42
32          33               43
33          34               44
34          35               45
35          36               46
36          37               47
37          38               48
38          39               49

df['New column'] = df.groupby((df['Row number'] - 1) // 13)['Existing column'].apply(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)

print(df)

    Row number  Existing column  New column
0            1               11         NaN
1            2               12         NaN
2            3               13         NaN
3            4               14         NaN
4            5               15         NaN
5            6               16         NaN
6            7               17         NaN
7            8               18         NaN
8            9               19         NaN
9           10               20         NaN
10          11               21         NaN
11          12               22         NaN
12          13               23        15.0
13          14               24         NaN
14          15               25         NaN
15          16               26         NaN
16          17               27         NaN
17          18               28         NaN
18          19               29         NaN
19          20               30         NaN
20          21               31         NaN
21          22               32         NaN
22          23               33         NaN
23          24               34         NaN
24          25               35         NaN
25          26               36        28.0
26          27               37         NaN
27          28               38         NaN
28          29               39         NaN
29          30               40         NaN
30          31               41         NaN
31          32               42         NaN
32          33               43         NaN
33          34               44         NaN
34          35               45         NaN
35          36               46         NaN
36          37               47         NaN
37          38               48         NaN
38          39               49        41.0
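Depending on the pandas version, groupby(...).apply(...) may return a result whose index also carries the group key, which can make the column assignment fail. A variant using .transform(), which returns a result aligned with the original index, can be sketched as follows. This is a swapped-in alternative under that assumption, not the original answer's code; the name group_id is introduced here for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Row number": np.arange(1, 40),
    "Existing column": np.arange(11, 50),
})

# Rows 1-13 -> group 0, rows 14-26 -> group 1, rows 27-39 -> group 2.
group_id = (df["Row number"] - 1) // 13

# Within each group, average the values 4, 8 and 12 rows back.
# Rows lacking any of those three entries inside their group become NaN.
df["New column"] = (
    df.groupby(group_id)["Existing column"]
      .transform(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)
)

print(df[df["New column"].notna()])
```

Only the last row of each 13-row group has a value, matching the test run above (15.0, 28.0 and 41.0).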

Find the mean of every 3 rows

You probably need something like this:

library(dplyr)
df %>%
  group_by(group = gl(n()/3, 3)) %>%
  summarise_at(-1, mean, na.rm = TRUE)

#  group Station1 Station2 Station3 Station4
#  <fct>    <dbl>    <dbl>    <dbl>    <dbl>
#1  1         30     46.7     32.3     25.7
#2  2         26     45.7     30.3     19.3

Compute row average in pandas

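You can assign the result to a new column. You need the mean along each row, so use axis=1. Because Region is not numeric, pass numeric_only=True so that recent pandas versions skip it instead of raising an error.

df['mean'] = df.mean(axis=1, numeric_only=True)
print(df)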
       Y1961      Y1962      Y1963      Y1964      Y1965 Region       mean
0  82.567307  83.104757  83.183700  83.030338  82.831958     US  82.943612
1   2.699372   2.610110   2.587919   2.696451   2.846247     US   2.688020
2  14.131355  13.690028  13.599516  13.649176  13.649046     US  13.743824
3   0.048589   0.046982   0.046583   0.046225   0.051750     US   0.048026
4   0.553377   0.548123   0.582282   0.577811   0.620999     US   0.576518
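If you prefer to state explicitly which columns enter the average, one option is to select the year columns by name pattern first. The sketch below uses the first two rows of the table above; the regular expression is an assumption about the column naming (Y followed by four digits).

```python
import pandas as pd

df = pd.DataFrame({
    "Y1961": [82.567307, 2.699372],
    "Y1962": [83.104757, 2.610110],
    "Y1963": [83.183700, 2.587919],
    "Y1964": [83.030338, 2.696451],
    "Y1965": [82.831958, 2.846247],
    "Region": ["US", "US"],
})

# Keep only columns named like Y1961, Y1962, ... (assumed naming scheme).
year_cols = df.filter(regex=r"^Y\d{4}$").columns

# Average across those columns for each row.
df["mean"] = df[year_cols].mean(axis=1)
print(df)
```

This also keeps the new mean column out of the calculation if the line is run a second time.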
