Calculate the mean of every 13 rows in data frame
Here's a solution using aggregate() and rep().
df <- data.frame(a=1:12, b=13:24 );
df;
## a b
## 1 1 13
## 2 2 14
## 3 3 15
## 4 4 16
## 5 5 17
## 6 6 18
## 7 7 19
## 8 8 20
## 9 9 21
## 10 10 22
## 11 11 23
## 12 12 24
n <- 5;
aggregate(df, list(rep(1:(nrow(df) %/% n + 1), each = n, len = nrow(df))), mean)[-1];
## a b
## 1 3.0 15.0
## 2 8.0 20.0
## 3 11.5 23.5
The important part of this solution, which handles the case where nrow(df) is not divisible by n, is specifying the len parameter of rep() (the full parameter name is length.out), which automatically caps the group vector to the appropriate length.
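For readers working in pandas rather than R, the same idea can be sketched as below. This is only a sketch and not part of the original answer: it assumes that integer-dividing the row position by n is an acceptable way to build the group vector, which means a short last group needs no special handling. The data mirrors the R example.

```python
import numpy as np
import pandas as pd

# Same data as the R example: 12 rows, two columns.
df = pd.DataFrame({"a": range(1, 13), "b": range(13, 25)})
n = 5

# Row positions 0..11 integer-divided by n give the labels 0,0,0,0,0,1,1,1,1,1,2,2.
groups = np.arange(len(df)) // n

# Mean of each block of n rows; the last block holds only the 2 leftover rows.
result = df.groupby(groups).mean()
print(result)
```

Because the labels are derived from the row positions, the group vector always has exactly len(df) entries, which plays the role of length.out in the R version.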
How to average column values every n rows in pandas
IIUC, use DataFrame.melt, then take the mean for each site with GroupBy.mean:
# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.melt('site').groupby('site')['value'].mean()
Or:
# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.set_index('site').stack().groupby(level=0).mean()
#df_tmp.set_index('site').stack().mean(level=0) # .mean(level=0) deprecated
Output
site
1 3.333333
2 7.333333
Name: value, dtype: float64
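The question's data is not shown here, so the following is a minimal sketch on a made-up frame (the column names x, y, z and all values are hypothetical). It shows what melt produces and that both variants give the same per-site mean.

```python
import pandas as pd

# Hypothetical input: one row per site, three measurement columns.
df_tmp = pd.DataFrame({"site": [1, 2], "x": [1, 4], "y": [2, 6], "z": [3, 8]})

# melt turns the wide frame into long format: one (site, variable, value) row
# per measurement, so a plain groupby on site can average all of them.
long_form = df_tmp.melt("site")
via_melt = long_form.groupby("site")["value"].mean()

# Stacking the value columns under a site index gives the same numbers.
via_stack = df_tmp.set_index("site").stack().groupby(level=0).mean()

print(long_form)
print(via_melt)
```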
How to calculate mean of specific rows in python dataframe?
Avoid iterating over the rows of a dataframe as much as possible, because it is very inefficient.
groupby is the way to go when you want to apply the same processing to various groups of rows identified by their values in one or more columns. Here, what you want is (*):
df.groupby('TagName')['Sample_value'].mean().reset_index()
It gives, as expected:
TagName Sample_value
0 Steam 1.081447e+06
1 Utilities 3.536931e+05
Details on the magic words:
groupby: identifies the column(s) used to group the rows (rows with the same values end up in the same group)
['Sample_value']: restricts the groupby object to the column of interest
mean(): computes the mean per group
reset_index(): by default the grouping columns go into the index, which is fine for the mean operation; reset_index turns them back into normal columns
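To see what reset_index() changes, here is a minimal sketch on made-up data. The values are hypothetical; only the column names TagName and Sample_value come from the question.

```python
import pandas as pd

# Hypothetical data; only the column names follow the question.
df = pd.DataFrame({
    "TagName": ["Steam", "Utilities", "Steam", "Utilities"],
    "Sample_value": [10.0, 1.0, 30.0, 3.0],
})

# Without reset_index(): a Series whose index holds the group labels.
per_group = df.groupby("TagName")["Sample_value"].mean()

# With reset_index(): TagName is an ordinary column again.
as_frame = per_group.reset_index()

print(per_group)
print(as_frame)
```

No explicit loop over rows is needed; the grouping and the averaging both happen inside pandas.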
For a column in pandas dataframe, calculate mean of column values in previous 4th, 8th and 12th row from the present row?
.shift() is your missing part. We can use it to access previous rows from the existing row in a Pandas dataframe.
Let's use .groupby(), .apply() and .shift() as follows:
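# group_keys=False keeps the original row index so the result aligns with df (pandas 2.0+ would otherwise prepend the group key)
df['New column'] = df.groupby((df['Row number'] - 1) // 13, group_keys=False)['Existing column'].apply(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)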
Here, rows are partitioned into groups of 13 rows by grouping them under different group numbers set by (df['Row number'] - 1) // 13.
Then within each group, we use .apply() on the column Existing column and use .shift() to get the previous 4th, 8th and 12th entries within the group.
Test Run
import numpy as np
import pandas as pd

data = {'Row number' : np.arange(1, 40), 'Existing column': np.arange(11, 50) }
df = pd.DataFrame(data)
print(df)
Row number Existing column
0 1 11
1 2 12
2 3 13
3 4 14
4 5 15
5 6 16
6 7 17
7 8 18
8 9 19
9 10 20
10 11 21
11 12 22
12 13 23
13 14 24
14 15 25
15 16 26
16 17 27
17 18 28
18 19 29
19 20 30
20 21 31
21 22 32
22 23 33
23 24 34
24 25 35
25 26 36
26 27 37
27 28 38
28 29 39
29 30 40
30 31 41
31 32 42
32 33 43
33 34 44
34 35 45
35 36 46
36 37 47
37 38 48
38 39 49
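# group_keys=False keeps the original row index so the result aligns with df (pandas 2.0+ would otherwise prepend the group key)
df['New column'] = df.groupby((df['Row number'] - 1) // 13, group_keys=False)['Existing column'].apply(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)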
print(df)
Row number Existing column New column
0 1 11 NaN
1 2 12 NaN
2 3 13 NaN
3 4 14 NaN
4 5 15 NaN
5 6 16 NaN
6 7 17 NaN
7 8 18 NaN
8 9 19 NaN
9 10 20 NaN
10 11 21 NaN
11 12 22 NaN
12 13 23 15.0
13 14 24 NaN
14 15 25 NaN
15 16 26 NaN
16 17 27 NaN
17 18 28 NaN
18 19 29 NaN
19 20 30 NaN
20 21 31 NaN
21 22 32 NaN
22 23 33 NaN
23 24 34 NaN
24 25 35 NaN
25 26 36 28.0
26 27 37 NaN
27 28 38 NaN
28 29 39 NaN
29 30 40 NaN
30 31 41 NaN
31 32 42 NaN
32 33 43 NaN
33 34 44 NaN
34 35 45 NaN
35 36 46 NaN
36 37 47 NaN
37 38 48 NaN
38 39 49 41.0
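As a possible alternative, the same column can be built without .apply() by calling .shift() on the groupby object itself. This is a sketch of a different route to the same result, not the method used in the answer above; it rebuilds the test data so it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Row number": np.arange(1, 40),
    "Existing column": np.arange(11, 50),
})

# Group label per row: 0 for rows 1-13, 1 for rows 14-26, 2 for rows 27-39.
g = df.groupby((df["Row number"] - 1) // 13)["Existing column"]

# GroupBy.shift shifts within each group and keeps the original row index,
# so the three shifted Series line up with df and can be added directly.
df["New column"] = (g.shift(4) + g.shift(8) + g.shift(12)) / 3

# Only the 13th row of each group has all three earlier entries available.
print(df["New column"].dropna())
```

Only the last row of each 13-row group gets a value, because shift(12) needs twelve earlier rows inside the same group.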
Find the mean of every 3 rows
You probably need something like this:
library(dplyr)
df %>%
group_by(group = gl(n()/3, 3)) %>%
summarise_at(-1, mean, na.rm = TRUE)
# group Station1 Station2 Station3 Station4
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 1 30 46.7 32.3 25.7
#2 2 26 45.7 30.3 19.3
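A rough pandas counterpart is sketched below, for comparison only. It assumes the first column is a non-numeric label that should be left out (the role of -1 in summarise_at), and the column name Day and all values are hypothetical. pandas skips NaN values in mean() by default, which corresponds to na.rm = TRUE.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a label column followed by numeric station columns.
df = pd.DataFrame({
    "Day": ["d1", "d2", "d3", "d4", "d5", "d6"],
    "Station1": [30.0, 28.0, 32.0, 25.0, np.nan, 27.0],
    "Station2": [45.0, 47.0, 49.0, 44.0, 46.0, 48.0],
})

# Block label for every run of 3 rows: 0, 0, 0, 1, 1, 1.
group = np.arange(len(df)) // 3

# Drop the first column, then average each block; NaN entries are skipped,
# so the second Station1 block averages only 25.0 and 27.0.
result = df.iloc[:, 1:].groupby(group).mean()
print(result)
```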
Compute row average in pandas
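You can assign the result to a new column. The mean has to be computed along each row, so use axis=1. Passing numeric_only=True skips the text column Region, which recent pandas versions no longer drop silently.
df['mean'] = df.mean(axis=1, numeric_only=True)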
>>> df
Y1961 Y1962 Y1963 Y1964 Y1965 Region mean
0 82.567307 83.104757 83.183700 83.030338 82.831958 US 82.943612
1 2.699372 2.610110 2.587919 2.696451 2.846247 US 2.688020
2 14.131355 13.690028 13.599516 13.649176 13.649046 US 13.743824
3 0.048589 0.046982 0.046583 0.046225 0.051750 US 0.048026
4 0.553377 0.548123 0.582282 0.577811 0.620999 US 0.576518
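Here is a small self-contained sketch of the same call. The values are made up; only the layout (year columns plus a text Region column) follows the output above.

```python
import pandas as pd

# Hypothetical values; the layout follows the frame shown above.
df = pd.DataFrame({
    "Y1961": [1.0, 10.0],
    "Y1962": [2.0, 20.0],
    "Y1963": [3.0, 30.0],
    "Region": ["US", "US"],
})

# axis=1 averages across the columns of each row;
# numeric_only=True leaves the text column Region out of the calculation.
df["mean"] = df.mean(axis=1, numeric_only=True)
print(df)
```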