Pandas - Calculate Average of Columns With Condition Based on Values in Other Columns

compute column average based on conditions pandas

You can use .groupby() and .mean(), followed by a column rename with .rename(), as follows:

df2 = df.groupby(['names', 'subject'], as_index=False)['value'].mean().rename({'value': 'average'}, axis=1)

Result:

print(df2)

  names subject    average
0     A       X  10.000000
1     A       Y  15.666667
2     B       P  12.250000
3     B       Q  10.000000
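If you want to run this end to end, here is a minimal sketch; the df below is hypothetical sample data (the original df isn't shown in the question), chosen so the averages match the result above:

import pandas as pd

# Hypothetical sample data; the original df is not shown in the question
df = pd.DataFrame({
    'names':   ['A', 'A', 'A', 'A', 'B', 'B'],
    'subject': ['X', 'Y', 'Y', 'Y', 'P', 'Q'],
    'value':   [10, 14, 16, 17, 12.25, 10],
})

# Group by the condition columns, average 'value', and rename the result column
df2 = (df.groupby(['names', 'subject'], as_index=False)['value']
         .mean()
         .rename({'value': 'average'}, axis=1))
print(df2)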

How to calculate the average of a column where the row meets a certain condition in Pandas

Simply use groupby + agg:

agg = df.groupby('number')['time'].agg(['count', 'mean']).reset_index()

Output:

>>> agg
   number  count  mean
0       1      5  37.4
1       2      3  26.0
2       4      4  30.5
3       6      2  53.0
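For a self-contained test, a hypothetical df with 'number' and 'time' columns could look like this (the original data isn't shown, so the figures below won't match the table above):

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'number': [1, 1, 2, 2, 4],
    'time':   [30.0, 45.0, 20.0, 32.0, 28.0],
})

# Count rows and average 'time' per 'number'
agg = df.groupby('number')['time'].agg(['count', 'mean']).reset_index()
print(agg)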

Average certain columns based on values in other columns

Something very relevant (regarding support for int column names): https://github.com/theislab/anndata/issues/31

Due to this bug/issue, I converted the column names to type string:

import pandas as pd

test_df = pd.DataFrame({'1': [1600, 1600, 1600, 1700, 1800],
                        '2': [1500, 2000, 1400, 1500, 2000],
                        '3': [2000, 2000, 2000, 2000, 2000],
                        '51': [65, 80, 75, 80, 75],
                        '52': [63, 82, 85, 85, 75],
                        '53': [83, 80, 75, 76, 78]})

Created a new dataframe, new_df, to meet our requirements:

new_df = test_df[['1', '2', '3']].where(test_df[['1','2','3']]<1700).notnull()

new_df now looks like this

       1      2      3
0   True   True  False
1   True  False  False
2   True   True  False
3  False   True  False
4  False  False  False

Then simply rename the columns and apply the mask with 'where':

new_df = new_df.rename(columns={"1": "51", "2":"52", "3":"53"})
test_df['mean_value'] = test_df[['51', '52', '53']].where(new_df).mean(axis=1)

This should give you the desired output -

      1     2     3  51  52  53  mean_value
0  1600  1500  2000  65  63  83        64.0
1  1600  2000  2000  80  82  80        80.0
2  1600  1400  2000  75  85  75        80.0
3  1700  1500  2000  80  85  76        85.0
4  1800  2000  2000  75  75  78         NaN
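A small variation that avoids renaming the mask columns is to pass the boolean mask as a plain array, since DataFrame.where also accepts an array-like of the same shape. A sketch on the same test_df:

# Build the mask from the threshold columns and drop its labels, so the
# column names don't have to line up with the value columns
mask = (test_df[['1', '2', '3']] < 1700).to_numpy()
test_df['mean_value'] = test_df[['51', '52', '53']].where(mask).mean(axis=1)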

Calculate the average of sections of a column with condition met to create new dataframe

Keywords: groupby, shift, mean


Code:

df_result = df.groupby((df['B'].shift(1, fill_value=0) != df['B']).cumsum()).mean()
df_result = df_result[df_result['B'] != 0]
df_result
     A    B
1  2.0  1.0
3  3.0  1.0

As you might have noticed, you first need to determine the blocks of consecutive rows having the same value.
One way to do so is by shifting B one row and then comparing it with itself.

df['B_shifted'] = df['B'].shift(1, fill_value=0)   # fill_value=0 keeps the int dtype and replaces the NaN with 0

df['A']                       = [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5]
df['B']                       = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
df['B_shifted']               = [0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
(df['B_shifted'] != df['B'])  = [F, T, F, F, T, F, T, F, F, T, F]

Each True marks the start of a new block of equal B values, so taking the cumulative sum gives every block its own label.

Now we can use the groupby pandas method as follows:

df_grouped=df.groupby((df['B_shifted'] != df['B']).cumsum())

Now, if we loop over the DataFrameGroupBy object df_grouped,
we'll see the following tuples:

(0,    A  B  B_shifted
 0  2  0          0)
(1,    A  B  B_shifted
 1  3  1          0
 2  1  1          1
 3  2  1          1)
(2,    A  B  B_shifted
 4  4  0          1
 5  1  0          0)
(3,    A  B  B_shifted
 6  5  1          0
 7  3  1          1
 8  1  1          1)
(4,     A  B  B_shifted
 9   7  0          1
 10  5  0          0)

We can now simply calculate the mean and filter out the zero values, as follows:

df_result=df_grouped.mean()
df_result=df_result[df_result['B']!=0][['A','B']]
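Putting the pieces together as a runnable sketch, using the A and B lists from the walkthrough above:

import pandas as pd

# Data from the walkthrough above
df = pd.DataFrame({'A': [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5],
                   'B': [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]})

# Label consecutive blocks of equal B values, then average each block
blocks = (df['B'].shift(1, fill_value=0) != df['B']).cumsum()
df_result = df.groupby(blocks).mean()
df_result = df_result[df_result['B'] != 0]
print(df_result)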


Pandas - Calculate average of columns with condition based on values in other columns

If your column numbers cover the same range for both '_a' and '_c', you can simply loop through them:

import numpy as np

r = range(1, 4)
for i in r:
    df.loc[df["{}_a".format(i)] != 1, "{}_c".format(i)] = np.nan

df['NEW'] = df[['{}_c'.format(i) for i in r]].mean(axis=1)
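An equivalent vectorized sketch, using hypothetical sample data (columns 1_a..3_a as flags and 1_c..3_c as values); unlike the loop above, it also leaves the original '_c' columns untouched:

import pandas as pd

# Hypothetical sample data: 1_a..3_a hold the condition flags, 1_c..3_c the values
df = pd.DataFrame({'1_a': [1, 0, 1], '2_a': [1, 1, 0], '3_a': [0, 1, 1],
                   '1_c': [10, 20, 30], '2_c': [40, 50, 60], '3_c': [70, 80, 90]})

r = range(1, 4)
a_cols = ['{}_a'.format(i) for i in r]
c_cols = ['{}_c'.format(i) for i in r]

# Keep each '_c' value only where the matching '_a' flag equals 1, then average row-wise
mask = (df[a_cols] == 1).to_numpy()
df['NEW'] = df[c_cols].where(mask).mean(axis=1)
print(df)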

How to average certain values of a column based on other columns condition in pandas

You probably want groupby and transform (though I'm not sure in your desired output why type B for 01/02/2010 is 13.5, I think it should be 18.5, i.e. the average of 17 and 20):

df['Value2'] = df.groupby(['Type','Date']).Value.transform('mean')
>>> df
   Index        Date Type  Value  Value2
0      0  01/01/2010    A     10    11.0
1      1  01/01/2010    B     15    20.0
2      2  01/01/2010    B     25    20.0
3      3  01/01/2010    A     12    11.0
4      4  01/02/2010    A      9     8.5
5      5  01/02/2010    B     17    18.5
6      6  01/02/2010    B     20    18.5
7      7  01/02/2010    A      8     8.5
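To reproduce this, the df can be rebuilt from the table above:

import pandas as pd

# Data taken from the table above
df = pd.DataFrame({
    'Date':  ['01/01/2010'] * 4 + ['01/02/2010'] * 4,
    'Type':  ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'A'],
    'Value': [10, 15, 25, 12, 9, 17, 20, 8],
})

# Broadcast the per-(Type, Date) mean back onto every row
df['Value2'] = df.groupby(['Type', 'Date']).Value.transform('mean')
print(df)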

Pandas calculate mean using another column as condition

You can extract the time from the datetime column and group by time only. If a time slot has fewer than 3 observations, its mean becomes NaN:

t = pd.date_range("2022-01-01", "2022-01-02", freq="30T").time

grp = df.groupby(df["observation_time"].dt.time)
result = (
    grp["temperature"].mean()       # Calculate the mean temperature for each 30-min period
    .mask(grp.size() < 3, np.nan)   # If the period has fewer than 3 observations, make it NaN
    .reindex(t)                     # Make sure we have all periods of a day
    .reset_index()
)
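A minimal runnable sketch, assuming observation_time is already a datetime column and using hypothetical readings (three at 00:00, only two at 00:30, so the second slot ends up NaN):

import numpy as np
import pandas as pd

# Hypothetical observations
df = pd.DataFrame({
    "observation_time": pd.to_datetime([
        "2022-01-01 00:00", "2022-01-02 00:00", "2022-01-03 00:00",
        "2022-01-01 00:30", "2022-01-02 00:30",
    ]),
    "temperature": [10.0, 12.0, 14.0, 8.0, 9.0],
})

t = pd.date_range("2022-01-01", "2022-01-02", freq="30T").time

grp = df.groupby(df["observation_time"].dt.time)
result = (
    grp["temperature"].mean()       # mean temperature per 30-minute slot
    .mask(grp.size() < 3, np.nan)   # slots with fewer than 3 observations become NaN
    .reindex(t)                     # include every 30-minute slot of the day
    .reset_index()
)
print(result)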

Python column mean based on other column conditions

You can try this:

df[df['count']>0]['pay'].mean()
#10.515
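A quick sketch with hypothetical count/pay data:

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'count': [0, 2, 5, 0, 1],
                   'pay':   [9.0, 10.0, 11.0, 12.0, 10.5]})

# Mean of 'pay' over rows where 'count' is positive
print(df[df['count'] > 0]['pay'].mean())   # 10.5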

Group by columns under conditions to calculate average

Use DataFrame.pivot_table with a helper column new (a copy of ColB), then flatten the MultiIndex and join the output to a new DataFrame created by aggregating Counter with sum:

df1 = (df.assign(new=df['ColB'])
         .pivot_table(index=['ColA', 'ColB'],
                      columns='new',
                      values=['interval', 'duration'],
                      fill_value=0,
                      aggfunc='mean'))
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')

df = (df.groupby(['ColA', 'ColB'])['Counter']
        .sum()
        .to_frame(name='SumCounter')
        .join(df1)
        .reset_index())
print(df)
  ColA ColB  SumCounter  durationSD  durationUD  intervalSD  intervalUD
0    A   SD           3         2.5         0.0         3.5           0
1    A   UD          10         0.0         2.0         0.0           1
2    B   SD          32         2.0         0.0         3.5           0
3    B   UD           4         0.0         1.5         0.0           2
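For experimentation, a hypothetical df with the column layout this answer expects might look like the sketch below (the original data isn't shown, so running the code above on it gives the same column structure but different numbers):

import pandas as pd

# Hypothetical sample data with the expected columns
df = pd.DataFrame({
    'ColA':     ['A', 'A', 'A', 'B', 'B'],
    'ColB':     ['SD', 'SD', 'UD', 'SD', 'UD'],
    'Counter':  [1, 2, 10, 32, 4],
    'interval': [3.0, 4.0, 1.0, 3.5, 2.0],
    'duration': [2.0, 3.0, 2.0, 2.0, 1.5],
})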

How to calculate average of all rows by excluding a few columns based on a condition

Here's one way using melt. The idea is: filter the relevant part of the DataFrame and melt it; then filter out the customers whose answers to a question are all 0 (building that filter with groupby.transform('all')); then, for the remaining customers, find the mean response using groupby.mean. Finally, assign these means back to an out DataFrame:

out = df.loc[:, ~df.columns.str.endswith('_0')].copy()
df1 = out.melt('Customer_id')
df1 = df1.join(df1.pop('variable').str.split('_', expand=True))
percentages = (df1.loc[~df1['value'].eq(0).groupby([df1['Customer_id'], df1[0]]).transform('all')]
                  .groupby([0, 1])['value'].mean() * 100)
percentages.index = percentages.index.map('_'.join)
out.loc[len(out)] = percentages
out.loc[len(out)-1, 'Customer_id'] = 'Average'

Output:

  Customer_id       q1_a       q1_b       q2_a       q2_b       q2_c
0       asdsd   0.000000   0.000000   0.000000   0.000000   1.000000
1       aasww   1.000000   0.000000   0.000000   1.000000   0.000000
2       aaswe   0.000000   1.000000   0.000000   0.000000   0.000000
3       aaswt   0.000000   1.000000   1.000000   0.000000   0.000000
4     Average  33.333333  66.666667  33.333333  33.333333  33.333333
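To reproduce this, an input df can be sketched from the output table; the q1_0 and q2_0 columns are hypothetical placeholders (they only need to exist so the '_0' filter has something to drop, and they don't affect the averages):

import pandas as pd

# Values taken from the output above; the *_0 columns are hypothetical placeholders
df = pd.DataFrame({
    'Customer_id': ['asdsd', 'aasww', 'aaswe', 'aaswt'],
    'q1_0': [1, 0, 0, 0], 'q1_a': [0, 1, 0, 0], 'q1_b': [0, 0, 1, 1],
    'q2_0': [0, 0, 1, 0], 'q2_a': [0, 0, 0, 1], 'q2_b': [0, 1, 0, 0], 'q2_c': [1, 0, 0, 0],
})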

