Compute Row Average in Pandas

Compute row average in pandas

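Assign the result to a new column. To average across each row rather than down each column, pass axis=1. Because the frame also contains a text column (Region), restrict the mean to numeric columns.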

>>> df['mean'] = df.mean(axis=1, numeric_only=True)
>>> df
       Y1961      Y1962      Y1963      Y1964      Y1965 Region       mean
0  82.567307  83.104757  83.183700  83.030338  82.831958     US  82.943612
1   2.699372   2.610110   2.587919   2.696451   2.846247     US   2.688020
2  14.131355  13.690028  13.599516  13.649176  13.649046     US  13.743824
3   0.048589   0.046982   0.046583   0.046225   0.051750     US   0.048026
4   0.553377   0.548123   0.582282   0.577811   0.620999     US   0.576518
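
As a self-contained illustration of the same idea, here is a minimal sketch with made-up values (the two year columns and their numbers are hypothetical, not the data above). It assumes a pandas version that supports the numeric_only argument of DataFrame.mean.

```python
import pandas as pd

# Hypothetical data: two numeric year columns plus a text column.
df = pd.DataFrame({
    "Y1961": [1.0, 4.0],
    "Y1962": [3.0, 8.0],
    "Region": ["US", "US"],
})

# axis=1 averages across each row; numeric_only=True skips "Region".
df["mean"] = df.mean(axis=1, numeric_only=True)
print(df)
```

An alternative is to name the columns explicitly, for example df[["Y1961", "Y1962"]].mean(axis=1), which avoids averaging in any numeric column that should not be part of the result.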

How to calculate mean of specific rows in python dataframe?

Avoid iterating over the rows of a DataFrame wherever possible, because it is very inefficient.

groupby is the way to go when you want to apply the same processing to several groups of rows identified by their values in one or more columns. Here, what you want is:

df.groupby('TagName')['Sample_value'].mean().reset_index()

This gives, as expected:

     TagName  Sample_value
0      Steam  1.081447e+06
1  Utilities  3.536931e+05

Details on the magic words:

  • groupby: identifies the column(s) used to group the rows (rows with the same values end up in the same group)
  • ['Sample_value']: restricts the groupby object to the column of interest
  • mean(): computes the mean per group
  • reset_index(): by default the grouping columns go into the index, which is fine for the mean operation. reset_index turns them back into normal columns
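
The steps above can be sketched on a tiny frame. The values here are invented for illustration; only the column names TagName and Sample_value come from the answer.

```python
import pandas as pd

# Hypothetical sample data using the column names from the answer.
df = pd.DataFrame({
    "TagName": ["Steam", "Utilities", "Steam", "Utilities"],
    "Sample_value": [10.0, 1.0, 20.0, 3.0],
})

# After mean(), the grouping column is the index of the result.
by_index = df.groupby("TagName")["Sample_value"].mean()

# reset_index() turns the grouping column back into a normal column.
by_column = by_index.reset_index()
print(by_column)
```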

Calculate row-wise average pandas python

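Try a rolling mean with a window of 2, computed within each ID group. Assigning with .values is positional, so this assumes df is already sorted by ID: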

>>> df['avg [g/L]'] = df.groupby('ID')['concentration[g/L]'].rolling(2).mean().values
>>> df
         ID  Time[h]  concentration[g/L]  avg [g/L]
15127  V527   23.425                59.9        NaN
20361  V527   27.570                73.4      66.65
21880  V527   29.281                75.4      74.40
33133  V560   27.677                75.9        NaN
35077  V560   30.183                75.7      75.80
37117  V560   31.847                74.6      75.15
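
If the rows are not sorted by ID, positional assignment can put values on the wrong rows. One possible variant, sketched here with a few hypothetical interleaved rows, drops the group level from the result so that pandas aligns on the original index instead.

```python
import pandas as pd

# Hypothetical rows, deliberately interleaved by ID.
df = pd.DataFrame({
    "ID": ["V527", "V560", "V527", "V560"],
    "concentration[g/L]": [59.9, 75.9, 73.4, 75.7],
})

# The result is indexed by (ID, original row label).
rolled = df.groupby("ID")["concentration[g/L]"].rolling(2).mean()

# Drop the ID level so the assignment aligns on the original row labels.
df["avg [g/L]"] = rolled.reset_index(level=0, drop=True)
print(df)
```

The first row of each group is NaN because a window of 2 has no previous value to average with.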

How to average column values every n rows in pandas

If I understand correctly, use DataFrame.melt and then take the mean for each site with GroupBy.mean:

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.melt('site').groupby('site')['value'].mean()

Or:

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.set_index('site').stack().groupby(level=0).mean()
# df_tmp.set_index('site').stack().mean(level=0)  # .mean(level=0) was deprecated in pandas 1.3 and removed in 2.0

Output

site
1 3.333333
2 7.333333
Name: value, dtype: float64
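
To see that the two approaches agree, here is a minimal sketch. The frame df_tmp is hypothetical (the original data is not shown); its measurement columns a, b and c are invented names.

```python
import pandas as pd

# Hypothetical wide frame: one row per site, several measurement columns.
df_tmp = pd.DataFrame({
    "site": [1, 2],
    "a": [3, 7],
    "b": [3, 7],
    "c": [4, 8],
})

# Long format: one row per (site, variable) pair, then average per site.
via_melt = df_tmp.melt("site").groupby("site")["value"].mean()

# Same result by stacking the columns under a site index.
via_stack = df_tmp.set_index("site").stack().groupby(level=0).mean()

print(via_melt)
```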

Python Pandas How to calculate the average of every other row in a column

We can get the mean of the rows at even positions (0, 2, 4, ...) this way:

>>> df.iloc[::2].mean() 
Pressure 153.111111
dtype: float64

Inside the brackets, the slice syntax is start:stop:step. Here start and stop are left empty, so they take their defaults, and step is 2.

So for evens you'd start at 0, go to end, increment by 2.

And we can do the following for the rows at odd positions:

>>> df.iloc[1::2].mean()
Pressure 356.294118
dtype: float64

For odds, we start at 1, go to end, increment by 2.
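
A small sketch with made-up pressure values makes the slicing concrete. Only the column name Pressure comes from the answer.

```python
import pandas as pd

# Hypothetical values; positions are 0, 1, 2, 3, 4.
df = pd.DataFrame({"Pressure": [10, 20, 30, 40, 80]})

even_mean = df.iloc[::2].mean()   # positions 0, 2, 4 -> 10, 30, 80
odd_mean = df.iloc[1::2].mean()   # positions 1, 3    -> 20, 40

print(even_mean)
print(odd_mean)
```

Note that iloc slices by position, not by index label, so this works the same way whatever the index looks like.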

Calculate mean for selected rows for selected columns in pandas data frame

To select rows of your dataframe by position, you can use iloc. You can then select the columns you want using square brackets.

For example:

import pandas as pd
df = pd.DataFrame(data=[[1, 2, 3]] * 5, index=range(3, 8), columns=['a', 'b', 'c'])

gives the following dataframe:

   a  b  c
3  1  2  3
4  1  2  3
5  1  2  3
6  1  2  3
7  1  2  3

To select only the third and fifth rows (positions 2 and 4, since iloc is zero-based), you can do:

df.iloc[[2,4]]

which returns:

   a  b  c
5  1  2  3
7  1  2  3

If you then want to select only columns b and c, use the following command:

df[['b', 'c']].iloc[[2,4]]

which yields:

   b  c
5  2  3
7  2  3

To get the mean of this subset of your dataframe, use the DataFrame.mean method. Specify axis=0 for the mean of each column, or axis=1 for the mean of each row.

Thus:

df[['b', 'c']].iloc[[2,4]].mean(axis=0)

returns:

b    2.0
c    3.0
dtype: float64

As we should expect from the input dataframe.

For your code you can then do:

df[column_list].iloc[row_index_list].mean(axis=0)

EDIT after comment:
New question in comment:
I have to store these means in another df/matrix. I have L1, L2, L3, L4...LX lists which tells me the index whose mean I need for columns C[1, 2, 3]. For ex: L1 = [0, 2, 3] , means I need mean of rows 0,2,3 and store it in 1st row of a new df/matrix. Then L2 = [1,4] for which again I will calculate mean and store it in 2nd row of the new df/matrix. Similarly till LX, I want the new df to have X rows and len(C) columns. Columns for L1..LX will remain same. Could you help me with this?

Answer:

If I understand correctly, the following code should do the trick (same df as above; as columns I took 'a' and 'b').

First you loop over all the lists of rows, collecting each set of means as a pd.Series. Then you concatenate the resulting list of Series along axis=1 and take the transpose to get it in the right format.

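# L is the list of row-position lists from the question, e.g. L = [L1, L2, ..., LX]
dfs = []
for l in L:
    # Mean of columns 'a' and 'b' over the selected rows (a Series)
    dfs.append(df[['a', 'b']].iloc[l].mean(axis=0))

# One column per list; transpose so that each list becomes a row
mean_matrix = pd.concat(dfs, axis=1).T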
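
For a runnable version of the loop, here is a minimal sketch. The frame and the two row lists are hypothetical stand-ins: the lists reuse the L1 = [0, 2, 3] and L2 = [1, 4] examples from the comment, and the data values are invented.

```python
import pandas as pd

# Hypothetical frame with five rows so that positions 0..4 exist.
df = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]],
    columns=["a", "b", "c"],
)

# Row-position lists taken from the example in the comment.
L = [[0, 2, 3], [1, 4]]

# One Series of column means per list of rows.
means = [df[["a", "b"]].iloc[rows].mean(axis=0) for rows in L]

# Each Series becomes a column; transposing gives one row per list.
mean_matrix = pd.concat(means, axis=1).T
print(mean_matrix)
```

The result has len(L) rows and one column per selected column, which matches the shape asked for in the comment.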

How to calculate the average of a column where the row meets a certain condition in Pandas

Simply use groupby + agg:

agg = df.groupby('number')['time'].agg(['count', 'mean']).reset_index()

Output:

>>> agg
   number  count  mean
0       1      5  37.4
1       2      3  26.0
2       4      4  30.5
3       6      2  53.0
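
If only one condition is of interest, a boolean mask may be enough. The sketch below uses invented values with the same column names (number, time) and shows both the mask and the groupby summary side by side.

```python
import pandas as pd

# Hypothetical data using the column names from the answer.
df = pd.DataFrame({
    "number": [1, 1, 2, 2, 2],
    "time": [30, 40, 20, 25, 30],
})

# Mean of "time" only for rows where number == 1.
mean_for_1 = df.loc[df["number"] == 1, "time"].mean()

# Count and mean for every value of "number" at once.
summary = df.groupby("number")["time"].agg(["count", "mean"]).reset_index()

print(mean_for_1)
print(summary)
```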

Compute mean value of rows that has the same column value in Pandas

This?

import pandas as pd

df = pd.read_excel('test.xlsx')
df1 = df.groupby(['category']).mean()
print(df)
print(df1)

output:

    C   D category
0  71  44    cat_C
1   5  88    cat_C
2   8  78    cat_C
3  31  27    cat_C
4  42  48    cat_B
5  18  18    cat_B
6  84  23    cat_A
7  94  23    cat_A

              C      D
category
cat_A     89.00  23.00
cat_B     30.00  33.00
cat_C     28.75  59.25
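
Since test.xlsx is not available to the reader, the same frame can be built inline from the printed output. This is a sketch that replaces read_excel with a literal DataFrame; everything else follows the answer.

```python
import pandas as pd

# Same values as the printed frame, constructed inline instead of read from Excel.
df = pd.DataFrame({
    "C": [71, 5, 8, 31, 42, 18, 84, 94],
    "D": [44, 88, 78, 27, 48, 18, 23, 23],
    "category": ["cat_C"] * 4 + ["cat_B"] * 2 + ["cat_A"] * 2,
})

# Mean of every remaining column within each category.
df1 = df.groupby("category").mean()
print(df1)
```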

Calculate mean of row using only certain columns in pandas

Use agg on the score columns only, instead of df['std'] = df.std(axis=1, ddof=0) on the whole frame:

df[['mean', 'std']] = df.filter(like='score').agg((np.mean, np.std), axis=1)

# In 2 steps
df['mean'] = df.filter(like='score').agg(np.mean, axis=1)
df['std'] = df.filter(like='score').agg(lambda x: np.std(x, ddof=0), axis=1)

Note: I use np.std instead of df.std because ddof is 0 by default in NumPy (population standard deviation), whereas pandas defaults to ddof=1 (sample standard deviation).
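
As an alternative sketch that stays within pandas, the same selection can be combined with the built-in mean and std methods, passing ddof=0 explicitly. The frame and its score_1 and score_2 columns are hypothetical.

```python
import pandas as pd

# Hypothetical frame: one text column and two score columns.
df = pd.DataFrame({
    "name": ["x", "y"],
    "score_1": [1.0, 2.0],
    "score_2": [3.0, 2.0],
})

# Keep only the columns whose name contains "score".
scores = df.filter(like="score")

df["mean"] = scores.mean(axis=1)
# ddof=0 gives the population standard deviation (pandas defaults to ddof=1).
df["std"] = scores.std(axis=1, ddof=0)
print(df)
```

Selecting the columns once, before adding the new ones, keeps the later std call from accidentally including the freshly added mean column.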


