Pandas Get Column Average/Mean

If you only want the mean of the weight column, select the column (which is a Series) and call .mean():

In [479]: df
Out[479]:
         ID  birthyear    weight
0    619040       1962  0.123123
1    600161       1963  0.981742
2  25602033       1963  1.312312
3    624870       1987  0.942120

In [480]: df["weight"].mean()
Out[480]: 0.83982437500000007
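
As a minimal, self-contained sketch, the frame below is rebuilt from the values displayed above. Those values are rounded in the display, so the result can differ from Out[480] in the last digits. Selecting a list of columns instead of one returns a Series with one mean per column.

```python
import pandas as pd

# Values copied from the (rounded) display above
df = pd.DataFrame({
    "ID": [619040, 600161, 25602033, 624870],
    "birthyear": [1962, 1963, 1963, 1987],
    "weight": [0.123123, 0.981742, 1.312312, 0.942120],
})

# Mean of a single column: a plain float
weight_mean = df["weight"].mean()

# Mean of several columns at once: a Series indexed by column name
several = df[["birthyear", "weight"]].mean()

print(weight_mean)
print(several)
```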

How to average column values every n rows in pandas

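If I understand correctly, you can reshape with DataFrame.melt and then take the mean for each site with GroupBy.mean:

# Uncomment if the values are not numeric yet (needed to get the correct result)
# df_tmp = df_tmp.astype(int)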
df_tmp.melt('site').groupby('site')['value'].mean()

Or:

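# Uncomment if the values are not numeric yet (needed to get the correct result)
# df_tmp = df_tmp.astype(int)
df_tmp.set_index('site').stack().groupby(level=0).mean()
# Older pandas also accepted .stack().mean(level=0); the level argument
# was deprecated in pandas 1.3 and removed in 2.0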

Output

site
1 3.333333
2 7.333333
Name: value, dtype: float64
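
Since df_tmp itself is not shown, here is a small sketch with a hypothetical stand-in (the column names a and b and all values are made up) to check that both variants give the same per-site means:

```python
import pandas as pd

# Hypothetical stand-in for df_tmp: a 'site' column plus value columns
df_tmp = pd.DataFrame({
    "site": [1, 1, 2, 2],
    "a": [1, 2, 5, 6],
    "b": [3, 4, 7, 8],
})

# Long format: one row per (site, variable, value), then average per site
via_melt = df_tmp.melt("site").groupby("site")["value"].mean()

# Same numbers by stacking the value columns under a 'site' index
via_stack = df_tmp.set_index("site").stack().groupby(level=0).mean()

print(via_melt)
print(via_stack)
```

Both approaches pool every value column of a site into one group, so the mean is taken over all values of that site, not per column.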

compute column average based on conditions pandas

You can use .groupby() and .mean(), then rename the resulting column with .rename(), as follows:

df2 = df.groupby(['names', 'subject'], as_index=False)['value'].mean().rename({'value': 'average'}, axis=1)

Result:

print(df2)

  names subject    average
0     A       X  10.000000
1     A       Y  15.666667
2     B       P  12.250000
3     B       Q  10.000000
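
The input df is not shown, so the sketch below uses made-up data with the same column names. It also shows named aggregation as one possible alternative that sets the output column name directly, without a separate .rename() step:

```python
import pandas as pd

# Hypothetical input; the original df is not shown in the answer
df = pd.DataFrame({
    "names":   ["A", "A", "A", "B", "B"],
    "subject": ["X", "X", "Y", "P", "P"],
    "value":   [8, 12, 15, 10, 14],
})

# groupby + mean + rename, as in the answer above
df2 = (
    df.groupby(["names", "subject"], as_index=False)["value"]
      .mean()
      .rename(columns={"value": "average"})
)

# Alternative: named aggregation picks the output column name in one step
df3 = df.groupby(["names", "subject"], as_index=False).agg(average=("value", "mean"))

print(df2)
print(df3)
```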

Calculating mean of column based on the occurrence of a number in another column Pandas dataframe Python

Filter the rows with a boolean mask on s1, then take the mean of s2 for the rows that remain:

df.loc[df['s1'] == 5, 's2'].mean()
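
A quick sketch with made-up values for s1 and s2 shows what the mask does, and what comes back when no row matches:

```python
import pandas as pd

# Hypothetical data; only the column names come from the question
df = pd.DataFrame({
    "s1": [5, 3, 5, 7],
    "s2": [10.0, 20.0, 30.0, 40.0],
})

mask = df["s1"] == 5                  # True for rows 0 and 2
mean_s2 = df.loc[mask, "s2"].mean()   # averages 10.0 and 30.0

# If no row matches, the selection is empty and the mean is NaN
empty_mean = df.loc[df["s1"] == 99, "s2"].mean()

print(mean_s2, empty_mean)
```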

pandas get column average for rows with a certain value?

Use pandas.core.groupby.GroupBy.mean:

df.groupby("city")["timeDiff"].mean()
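
As a sketch on hypothetical data (the city names and values are made up), the per-city means can be looked up for one city, or written back to every row with transform if each row needs its group's average next to it:

```python
import pandas as pd

# Hypothetical data; only the column names come from the question
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome", "Rome"],
    "timeDiff": [4.0, 6.0, 1.0, 2.0, 3.0],
})

# One mean per city, indexed by city
per_city = df.groupby("city")["timeDiff"].mean()

# Mean for a single city
oslo_mean = per_city.loc["Oslo"]

# Broadcast each city's mean back onto its rows
df["cityMean"] = df.groupby("city")["timeDiff"].transform("mean")

print(per_city)
print(df)
```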

how to get the average of dataframe column values

Simply using df.mean() does the right thing with respect to NaNs: they are skipped by default (skipna=True). For a mean across the columns of each row, pass axis=1:

>>> df
                 A      B
DATE
2013-05-01  473077  71333
2013-05-02   35131  62441
2013-05-03     727  27381
2013-05-04     481   1206
2013-05-05     226   1733
2013-05-06     NaN   4064
2013-05-07     NaN  41151
2013-05-08     NaN   8144
2013-05-09     NaN     23
2013-05-10     NaN     10
>>> df.mean(axis=1)
DATE
2013-05-01 272205.0
2013-05-02 48786.0
2013-05-03 14054.0
2013-05-04 843.5
2013-05-05 979.5
2013-05-06 4064.0
2013-05-07 41151.0
2013-05-08 8144.0
2013-05-09 23.0
2013-05-10 10.0
dtype: float64

You can use df[["A", "B"]].mean(axis=1) if there are other columns to ignore.
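
To make the NaN handling concrete, here is a small sketch using three of the rows shown above. It contrasts the default skipna=True with skipna=False, which returns NaN for any row that has a missing value:

```python
import numpy as np
import pandas as pd

# Three rows taken from the frame shown above
df = pd.DataFrame(
    {"A": [473077, 35131, np.nan], "B": [71333, 62441, 4064]},
    index=pd.to_datetime(["2013-05-01", "2013-05-02", "2013-05-06"]),
)
df.index.name = "DATE"

# Row-wise mean; NaN entries are ignored by default
row_mean = df[["A", "B"]].mean(axis=1)

# Row-wise mean that propagates NaN instead of skipping it
strict = df[["A", "B"]].mean(axis=1, skipna=False)

# Column-wise mean (the default axis); column A averages its two present values
col_mean = df.mean()

print(row_mean)
print(strict)
print(col_mean)
```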

How to calculate mean of specific rows in python dataframe?

Avoid iterating over the rows of a DataFrame whenever possible, because it is very inefficient.

groupby is the way to go when you want to apply the same processing to several groups of rows identified by their values in one or more columns. Here, what you want is:

df.groupby('TagName')['Sample_value'].mean().reset_index()

As expected, it gives:

     TagName  Sample_value
0      Steam  1.081447e+06
1  Utilities  3.536931e+05

Details on the magic words:

  • groupby: identifies the column(s) used to group the rows (rows with the same values end up in the same group)
  • ['Sample_value']: restricts the groupby object to the column of interest
  • mean(): computes the mean per group
  • reset_index(): by default the grouping columns go into the index, which is fine for the mean operation; reset_index() turns them back into normal columns
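
The effect of reset_index() is easier to see on a tiny example. The data below is made up (only the column names come from the question), and as_index=False is shown as an equivalent way to keep TagName as a regular column:

```python
import pandas as pd

# Hypothetical data with the question's column names
df = pd.DataFrame({
    "TagName": ["Steam", "Steam", "Utilities", "Utilities"],
    "Sample_value": [10.0, 20.0, 1.0, 3.0],
})

# Without reset_index: a Series whose index is TagName
grouped = df.groupby("TagName")["Sample_value"].mean()

# With reset_index: a DataFrame with TagName as an ordinary column
flat = grouped.reset_index()

# Equivalent shortcut: keep the grouping column out of the index from the start
same = df.groupby("TagName", as_index=False)["Sample_value"].mean()

print(grouped)
print(flat)
```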

Compute row average in pandas

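You can assign the result to a new column. You also need to compute the mean along the rows, so use axis=1. Because Region holds strings, pass numeric_only=True so that only the numeric columns are averaged (recent pandas versions raise an error otherwise):

df['mean'] = df.mean(axis=1, numeric_only=True)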
>>> df
       Y1961      Y1962      Y1963      Y1964      Y1965 Region       mean
0  82.567307  83.104757  83.183700  83.030338  82.831958     US  82.943612
1   2.699372   2.610110   2.587919   2.696451   2.846247     US   2.688020
2  14.131355  13.690028  13.599516  13.649176  13.649046     US  13.743824
3   0.048589   0.046982   0.046583   0.046225   0.051750     US   0.048026
4   0.553377   0.548123   0.582282   0.577811   0.620999     US   0.576518
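
One thing to watch for: once the mean column exists, running the same line again would include mean itself in the average. A possible way around this, sketched below on two rows and two of the year columns from above, is to select the year columns explicitly. The regex assumes the year columns are all named Y followed by four digits.

```python
import pandas as pd

# Two rows and two year columns taken from the frame above
df = pd.DataFrame({
    "Y1961": [82.567307, 2.699372],
    "Y1962": [83.104757, 2.610110],
    "Region": ["US", "US"],
})

# Assumption: year columns follow the pattern Y#### (e.g. Y1961)
year_cols = df.filter(regex=r"^Y\d{4}$").columns

# Row-wise mean over the year columns only
df["mean"] = df[year_cols].mean(axis=1)

# Re-running the assignment gives the same result, because 'mean' is not in year_cols
df["mean"] = df[year_cols].mean(axis=1)

print(df)
```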

For a column in pandas dataframe, calculate mean of column values in previous 4th, 8th and 12th row from the present row?

The missing piece is .shift(), which gives access to previous rows relative to the current row in a pandas DataFrame.

Let's use .groupby(), .apply() and .shift() as follows (group_keys=False keeps the original row index on the result, so it can be assigned straight back as a column in current pandas versions):

df['New column'] = df.groupby((df['Row number'] - 1) // 13, group_keys=False)['Existing column'].apply(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)

Here, the rows are partitioned into groups of 13 consecutive rows; the group number of each row is given by (df['Row number'] - 1) // 13.

Then, within each group, we use .apply() on the column Existing column and use .shift() to get the 4th, 8th and 12th previous entries within the group. Only the 13th row of each group has all three previous entries, so every other row gets NaN.

Test Run

import numpy as np
import pandas as pd

data = {'Row number' : np.arange(1, 40), 'Existing column': np.arange(11, 50) }
df = pd.DataFrame(data)

print(df)

Row number Existing column
0 1 11
1 2 12
2 3 13
3 4 14
4 5 15
5 6 16
6 7 17
7 8 18
8 9 19
9 10 20
10 11 21
11 12 22
12 13 23
13 14 24
14 15 25
15 16 26
16 17 27
17 18 28
18 19 29
19 20 30
20 21 31
21 22 32
22 23 33
23 24 34
24 25 35
25 26 36
26 27 37
27 28 38
28 29 39
29 30 40
30 31 41
31 32 42
32 33 43
33 34 44
34 35 45
35 36 46
36 37 47
37 38 48
38 39 49

df['New column'] = df.groupby((df['Row number'] - 1) // 13, group_keys=False)['Existing column'].apply(lambda x: (x.shift(4) + x.shift(8) + x.shift(12)) / 3)

print(df)

Row number Existing column New column
0 1 11 NaN
1 2 12 NaN
2 3 13 NaN
3 4 14 NaN
4 5 15 NaN
5 6 16 NaN
6 7 17 NaN
7 8 18 NaN
8 9 19 NaN
9 10 20 NaN
10 11 21 NaN
11 12 22 NaN
12 13 23 15.0
13 14 24 NaN
14 15 25 NaN
15 16 26 NaN
16 17 27 NaN
17 18 28 NaN
18 19 29 NaN
19 20 30 NaN
20 21 31 NaN
21 22 32 NaN
22 23 33 NaN
23 24 34 NaN
24 25 35 NaN
25 26 36 28.0
26 27 37 NaN
27 28 38 NaN
28 29 39 NaN
29 30 40 NaN
30 31 41 NaN
31 32 42 NaN
32 33 43 NaN
33 34 44 NaN
34 35 45 NaN
35 36 46 NaN
36 37 47 NaN
37 38 48 NaN
38 39 49 41.0
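
As a possible alternative to the lambda, the grouped column has its own .shift() method that shifts within each group and returns a result aligned with the original rows. The sketch below reuses the test data from above; the variable names block and g are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Row number": np.arange(1, 40),
    "Existing column": np.arange(11, 50),
})

# Group number for each row: rows 1-13 -> 0, rows 14-26 -> 1, rows 27-39 -> 2
block = (df["Row number"] - 1) // 13
g = df.groupby(block)["Existing column"]

# Each shift stays inside its group, so values never leak across group boundaries
df["New column"] = (g.shift(4) + g.shift(8) + g.shift(12)) / 3

# Only the 13th row of each group has all three previous entries
print(df.dropna())
```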

