Calculate the Mean of Every 13 Rows in Data Frame

Calculate the mean of every 13 rows in data frame

Here's a solution using aggregate() and rep().

df <- data.frame(a=1:12, b=13:24 );
df;
## a b
## 1 1 13
## 2 2 14
## 3 3 15
## 4 4 16
## 5 5 17
## 6 6 18
## 7 7 19
## 8 8 20
## 9 9 21
## 10 10 22
## 11 11 23
## 12 12 24
n <- 5;
aggregate(df, list(rep(1:(nrow(df) %/% n + 1), each = n, len = nrow(df))), mean)[-1];
## a b
## 1 3.0 15.0
## 2 8.0 20.0
## 3 11.5 23.5

The important part of this solution that handles the issue of non-divisibility of nrow(df) by n is specifying the len parameter (actually the full parameter name is length.out) of rep(), which automatically caps the group vector to the appropriate length.

How to average column values every n rows in pandas

IIUC, DataFrame.melt + mean for each site with GroupBy.mean

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.melt('site').groupby('site')['value'].mean()

Or:

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.set_index('site').stack().groupby(level=0).mean()
#df_tmp.set_index('site').stack().mean(level=0) # .mean(level=0) deprecated

Output

site
1 3.333333
2 7.333333
Name: value, dtype: float64

How to calculate mean of specific rows in python dataframe?

You should avoid as much as possible to iterate rows in a dataframe, because it is very unefficient...

groupby is the way to go when you want to apply the same processing to various groups of rows identified by their values in one or more columns. Here what you want is (*):

df.groupby('TagName')['Sample_value'].mean().reset_index()

it gives as expected:

     TagName  Sample_value
0 Steam 1.081447e+06
1 Utilities 3.536931e+05

Details on the magic words:

  • groupby: identifies the column(s) used to group the rows (same values)
  • ['Sample_values']: restrict the groupby object to the column of interest
  • mean(): computes the mean per group
  • reset_index(): by default the grouping columns go into the index, which is fine for the mean operation. reset_index make them back normal columns

For a column in pandas dataframe, calculate mean of column values in previous 4th, 8th and 12th row from the present row?

.shift() is your missing part. We can use it to access previous rows from the existing row in a Pandas dataframe.

Let's use .groupby(), .apply() and .shift() as follows:

df['New column'] = df.groupby((df['Row number'] - 1) // 13)['Existing column'].apply(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)

Here, rows are partitioned into groups of 13 rows by grouping them under different group numbers set by (df['Row number'] - 1) // 13

Then within each group, we use .apply() on the column Existing column and use .shift() to get the previous 4th, 8th and 12th entries within the group.

Test Run

data = {'Row number' : np.arange(1, 40), 'Existing column': np.arange(11, 50) }
df = pd.DataFrame(data)

print(df)

Row number Existing column
0 1 11
1 2 12
2 3 13
3 4 14
4 5 15
5 6 16
6 7 17
7 8 18
8 9 19
9 10 20
10 11 21
11 12 22
12 13 23
13 14 24
14 15 25
15 16 26
16 17 27
17 18 28
18 19 29
19 20 30
20 21 31
21 22 32
22 23 33
23 24 34
24 25 35
25 26 36
26 27 37
27 28 38
28 29 39
29 30 40
30 31 41
31 32 42
32 33 43
33 34 44
34 35 45
35 36 46
36 37 47
37 38 48
38 39 49

df['New column'] = df.groupby((df['Row number'] - 1) // 13)['Existing column'].apply(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)

print(df)

Row number Existing column New column
0 1 11 NaN
1 2 12 NaN
2 3 13 NaN
3 4 14 NaN
4 5 15 NaN
5 6 16 NaN
6 7 17 NaN
7 8 18 NaN
8 9 19 NaN
9 10 20 NaN
10 11 21 NaN
11 12 22 NaN
12 13 23 15.0
13 14 24 NaN
14 15 25 NaN
15 16 26 NaN
16 17 27 NaN
17 18 28 NaN
18 19 29 NaN
19 20 30 NaN
20 21 31 NaN
21 22 32 NaN
22 23 33 NaN
23 24 34 NaN
24 25 35 NaN
25 26 36 28.0
26 27 37 NaN
27 28 38 NaN
28 29 39 NaN
29 30 40 NaN
30 31 41 NaN
31 32 42 NaN
32 33 43 NaN
33 34 44 NaN
34 35 45 NaN
35 36 46 NaN
36 37 47 NaN
37 38 48 NaN
38 39 49 41.0

Find the mean of every 3 rows

Probably you need something like that

library(dplyr)
df %>%
group_by(group = gl(n()/3, 3)) %>%
summarise_at(-1, mean, na.rm = TRUE)

# group Station1 Station2 Station3 Station4
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 1 30 46.7 32.3 25.7
#2 2 26 45.7 30.3 19.3

Compute row average in pandas

You can specify a new column. You also need to compute the mean along the rows, so use axis=1.

df['mean'] = df.mean(axis=1)
>>> df
Y1961 Y1962 Y1963 Y1964 Y1965 Region mean
0 82.567307 83.104757 83.183700 83.030338 82.831958 US 82.943612
1 2.699372 2.610110 2.587919 2.696451 2.846247 US 2.688020
2 14.131355 13.690028 13.599516 13.649176 13.649046 US 13.743824
3 0.048589 0.046982 0.046583 0.046225 0.051750 US 0.048026
4 0.553377 0.548123 0.582282 0.577811 0.620999 US 0.576518


Related Topics



Leave a reply



Submit