Compute Row Average in Pandas

Compute row average in pandas

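Assign the result to a new column, and compute the mean along each row with axis=1. Because the frame also contains a text column (Region), pass numeric_only=True so that only the numeric columns are averaged; pandas 2.0 and later raise a TypeError otherwise.

df['mean'] = df.mean(axis=1, numeric_only=True)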
>>> df
       Y1961      Y1962      Y1963      Y1964      Y1965 Region       mean
0  82.567307  83.104757  83.183700  83.030338  82.831958     US  82.943612
1   2.699372   2.610110   2.587919   2.696451   2.846247     US   2.688020
2  14.131355  13.690028  13.599516  13.649176  13.649046     US  13.743824
3   0.048589   0.046982   0.046583   0.046225   0.051750     US   0.048026
4   0.553377   0.548123   0.582282   0.577811   0.620999     US   0.576518
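
As a self-contained illustration of the row-wise mean, here is a minimal sketch. The frame below is hypothetical (made-up values, not the data shown above); only the Y1961/Y1962/Region column names are borrowed from it.

```python
import pandas as pd

# Hypothetical data: two numeric year columns plus one text column.
df = pd.DataFrame({
    "Y1961": [1.0, 2.0, 3.0],
    "Y1962": [3.0, 4.0, 5.0],
    "Region": ["US", "US", "US"],
})

# axis=1 averages across the columns of each row;
# numeric_only=True leaves the text column out of the calculation.
df["mean"] = df.mean(axis=1, numeric_only=True)
print(df)
```

If only some of the numeric columns should count, selecting them first (for example with df[['Y1961', 'Y1962']]) before calling mean is an equally reasonable choice.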

How to calculate mean of specific rows in python dataframe?

You should avoid iterating over the rows of a dataframe as much as possible, because it is very inefficient.

groupby is the way to go when you want to apply the same processing to several groups of rows identified by their values in one or more columns. Here what you want is:

df.groupby('TagName')['Sample_value'].mean().reset_index()

As expected, it gives:

     TagName  Sample_value
0      Steam  1.081447e+06
1  Utilities  3.536931e+05

Details on the magic words:

groupby: identifies the column(s) whose values define the groups of rows
['Sample_value']: restricts the groupby object to the column of interest
mean(): computes the mean per group
reset_index(): by default the grouping columns go into the index, which is fine for the mean operation; reset_index turns them back into normal columns
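
The chain described above can be tried on a small frame. This is a sketch with hypothetical sample values; only the TagName and Sample_value column names come from the answer.

```python
import pandas as pd

# Hypothetical measurements for two tags.
df = pd.DataFrame({
    "TagName": ["Steam", "Steam", "Utilities", "Utilities"],
    "Sample_value": [10.0, 20.0, 1.0, 3.0],
})

# Mean per tag: a Series whose index is TagName.
per_tag = df.groupby("TagName")["Sample_value"].mean()

# reset_index moves TagName from the index back into a regular column.
result = per_tag.reset_index()
print(result)
```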

Calculate row-wise average pandas python

Try a rolling mean with a window of 2, computed within each ID group:

>>> df['avg [g/L]'] = df.groupby('ID')['concentration[g/L]'].rolling(2).mean().values
>>> df
         ID  Time[h]  concentration[g/L]  avg [g/L]
15127  V527   23.425                59.9        NaN
20361  V527   27.570                73.4      66.65
21880  V527   29.281                75.4      74.40
33133  V560   27.677                75.9        NaN
35077  V560   30.183                75.7      75.80
37117  V560   31.847                74.6      75.15
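
Assigning .values relies on the rows already being ordered by ID, because groupby returns its result group by group. A variant that aligns on the original index instead is sketched below; it reuses the ID and concentration values from the output above, but with a default integer index rather than the original one.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["V527", "V527", "V527", "V560", "V560", "V560"],
    "concentration[g/L]": [59.9, 73.4, 75.4, 75.9, 75.7, 74.6],
})

# The grouped rolling mean is indexed by (ID, original row label).
rolled = df.groupby("ID")["concentration[g/L]"].rolling(2).mean()

# Dropping the ID level leaves the original row labels,
# so the assignment aligns row by row whatever the row order.
df["avg [g/L]"] = rolled.reset_index(level=0, drop=True)
print(df)
```

The first row of each group is NaN because a window of 2 has no earlier value to average with.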

How to average column values every n rows in pandas

If I understand correctly, you can reshape with DataFrame.melt and then take the mean for each site with GroupBy.mean:

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.melt('site').groupby('site')['value'].mean()

Or:

# df_tmp = df_tmp.astype(int) # get correct result
df_tmp.set_index('site').stack().groupby(level=0).mean()
#df_tmp.set_index('site').stack().mean(level=0) # .mean(level=0) deprecated

Output

site
1    3.333333
2    7.333333
Name: value, dtype: float64
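
Since df_tmp itself is not shown, here is a sketch on a hypothetical frame (the x, y, z columns and their values are invented for illustration) that runs both variants and shows that they agree.

```python
import pandas as pd

# Hypothetical wide table: one row per site, several value columns.
df_tmp = pd.DataFrame({
    "site": [1, 2],
    "x": [1, 4],
    "y": [2, 5],
    "z": [3, 6],
})

# Variant 1: melt to long format (site, variable, value), then average per site.
via_melt = df_tmp.melt("site").groupby("site")["value"].mean()

# Variant 2: move site into the index, stack the columns, average per index level.
via_stack = df_tmp.set_index("site").stack().groupby(level=0).mean()

print(via_melt)
print(via_stack)
```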

Python Pandas How to calculate the average of every other row in a column

We can get the mean of the even rows (positions 0, 2, 4, ...) this way:

>>> df.iloc[::2].mean() 
Pressure    153.111111
dtype: float64

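Inside the brackets, the slice syntax is start:stop:step. Leaving start and stop empty keeps their defaults (the first row and the end of the frame), and a step of 2 takes every second row.

So for the even rows you start at 0, go to the end, and increment by 2.

And we can do the following for the odd rows: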

>>> df.iloc[1::2].mean()
Pressure    356.294118
dtype: float64

For odds, we start at 1, go to end, increment by 2.
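
Both slices can be checked on a small frame. The Pressure values below are hypothetical and chosen so the two means are easy to verify by hand.

```python
import pandas as pd

# Hypothetical readings; positions 0, 2, 4 are "even", positions 1, 3 are "odd".
df = pd.DataFrame({"Pressure": [10, 100, 20, 200, 30]})

even_mean = df.iloc[::2].mean()   # rows at positions 0, 2, 4
odd_mean = df.iloc[1::2].mean()   # rows at positions 1, 3

print(even_mean)
print(odd_mean)
```

Note that iloc works on positions, not index labels, so the result does not depend on how the index is labelled.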

Calculate mean for selected rows for selected columns in pandas data frame

To select the rows of your dataframe you can use iloc, you can then select the columns you want using square brackets.

For example:

df = pd.DataFrame(data=[[1,2,3]]*5, index=range(3, 8), columns = ['a','b','c'])

gives the following dataframe:

   a  b  c
3  1  2  3
4  1  2  3
5  1  2  3
6  1  2  3
7  1  2  3

To select only the third and fifth rows (positions 2 and 4) you can do:

df.iloc[[2,4]]

which returns:

   a  b  c
5  1  2  3
7  1  2  3

if you then want to select only columns b and c you use the following command:

df[['b', 'c']].iloc[[2,4]]

which yields:

   b  c
5  2  3
7  2  3

To then get the mean of this subset of your dataframe you can use the df.mean function. If you want the means of the columns you can specify axis=0, if you want the means of the rows you can specify axis=1

thus:

df[['b', 'c']].iloc[[2,4]].mean(axis=0)

returns:

b    2.0
c    3.0
dtype: float64

As we should expect from the input dataframe.

For your code you can then do:

df[column_list].iloc[row_index_list].mean(axis=0)

EDIT after comment:
New question in comment:
I have to store these means in another df/matrix. I have L1, L2, L3, L4...LX lists which tells me the index whose mean I need for columns C[1, 2, 3]. For ex: L1 = [0, 2, 3] , means I need mean of rows 0,2,3 and store it in 1st row of a new df/matrix. Then L2 = [1,4] for which again I will calculate mean and store it in 2nd row of the new df/matrix. Similarly till LX, I want the new df to have X rows and len(C) columns. Columns for L1..LX will remain same. Could you help me with this?

Answer:

If I understand correctly, the following code should do the trick (same df as above; as columns I took 'a' and 'b').

First you loop over all the lists of rows, collecting the means as pd.Series objects. Then you concatenate the resulting list of series along axis=1 and take the transpose to get the right format.

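# L is the list of row-position lists, i.e. [L1, L2, ..., LX]
means = []
for rows in L:
    # mean of columns 'a' and 'b' over the selected row positions (a Series)
    means.append(df[['a', 'b']].iloc[rows].mean(axis=0))

# one column per list after concat, so transpose to get one row per list
mean_matrix = pd.concat(means, axis=1).T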
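
To make the loop concrete, here is a runnable sketch. The frame and the two position lists are hypothetical stand-ins for the asker's data; L mirrors the L1 = [0, 2, 3] and L2 = [1, 4] example from the comment.

```python
import pandas as pd

# Hypothetical data, chosen so the means are easy to check by hand.
df = pd.DataFrame({
    "a": [0, 10, 20, 40, 50],
    "b": [1, 2, 3, 5, 4],
})

# Row positions to average, one inner list per output row.
L = [[0, 2, 3], [1, 4]]

# One Series of column means per list of positions.
means = [df[["a", "b"]].iloc[rows].mean(axis=0) for rows in L]

# concat along axis=1 puts each Series in its own column; transpose for one row per list.
mean_matrix = pd.concat(means, axis=1).T
print(mean_matrix)
```

The result has len(L) rows and one column per selected column, which is the shape asked for in the comment.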

How to calculate the average of a column where the row meets a certain condition in Pandas

Simply use groupby + agg:

agg = df.groupby('number')['time'].agg(['count', 'mean']).reset_index()

Output:

>>> agg
   number  count  mean
0       1      5  37.4
1       2      3  26.0
2       4      4  30.5
3       6      2  53.0
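
When only one condition matters, a boolean mask is a lighter alternative to grouping every value. This is a different technique from the groupby above, sketched on hypothetical data that reuses the number and time column names.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "number": [1, 1, 2, 2, 2],
    "time": [10, 20, 30, 40, 50],
})

# Keep only the rows where number == 2, then average their time values.
mask = df["number"] == 2
mean_for_two = df.loc[mask, "time"].mean()
print(mean_for_two)
```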

Compute mean value of rows that has the same column value in Pandas

Group by the category column and take the mean of each group:

import pandas as pd

df = pd.read_excel('test.xlsx')
df1 = df.groupby(['category']).mean()
print(df)
print(df1)

output:

    C   D category
0  71  44    cat_C
1   5  88    cat_C
2   8  78    cat_C
3  31  27    cat_C
4  42  48    cat_B
5  18  18    cat_B
6  84  23    cat_A
7  94  23    cat_A

              C      D
category
cat_A     89.00  23.00
cat_B     30.00  33.00
cat_C     28.75  59.25

Calculate mean of row using only certain columns in pandas

Use agg instead of df['std'] = df.std(axis=1, ddof=0)

df[['mean', 'std']] = df.filter(like='score').agg((np.mean, np.std), axis=1)

# In 2 steps
df['mean'] = df.filter(like='score').agg(np.mean, axis=1)
df['std'] = df.filter(like='score').agg(lambda x: np.std(x, ddof=0), axis=1)

Note: I use np.std rather than df.std because ddof defaults to 0 in NumPy, whereas it defaults to 1 in pandas.
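
The effect of ddof is easy to see on a tiny frame. The sketch below uses hypothetical score_1 and score_2 columns, and it calls the pandas mean and std methods on the filtered columns rather than agg, which is a swapped-in but equivalent way to restrict the calculation to certain columns.

```python
import pandas as pd

# Hypothetical data: one label column and two score columns.
df = pd.DataFrame({
    "name": ["a", "b"],
    "score_1": [1.0, 2.0],
    "score_2": [3.0, 2.0],
})

# Keep only the columns whose name contains "score".
scores = df.filter(like="score")

df["mean"] = scores.mean(axis=1)
df["std"] = scores.std(axis=1, ddof=0)           # population standard deviation
sample_std = scores.std(axis=1)                  # pandas default, ddof=1
print(df)
print(sample_std)
```

Computing scores once before adding the new columns also avoids a pitfall: filter(like='score') would not pick up 'mean' or 'std' here, but a broader pattern could.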
