Pandas - Calculate Average of Columns With Condition Based on Values in Other Columns

compute column average based on conditions pandas

You can use .groupby() and .mean(), followed by a column rename with .rename(), as follows:

df2 = df.groupby(['names', 'subject'], as_index=False)['value'].mean().rename({'value': 'average'}, axis=1)

Result:

print(df2)

  names subject    average
0     A       X  10.000000
1     A       Y  15.666667
2     B       P  12.250000
3     B       Q  10.000000
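If you want to run this end to end, here is a minimal sketch; the df below is hypothetical sample data (the original df isn't shown in the question), chosen so the averages match the result above:

import pandas as pd

# Hypothetical sample data; the original df is not shown in the question
df = pd.DataFrame({
    'names':   ['A', 'A', 'A', 'A', 'B', 'B'],
    'subject': ['X', 'Y', 'Y', 'Y', 'P', 'Q'],
    'value':   [10, 14, 16, 17, 12.25, 10],
})

# Group by the condition columns, average 'value', and rename the result column
df2 = (df.groupby(['names', 'subject'], as_index=False)['value']
         .mean()
         .rename({'value': 'average'}, axis=1))
print(df2)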

How to calculate the average of a column where the row meets a certain condition in Pandas

Simply use groupby + agg:

agg = df.groupby('number')['time'].agg(['count', 'mean']).reset_index()

Output:

>>> agg
   number  count  mean
0       1      5  37.4
1       2      3  26.0
2       4      4  30.5
3       6      2  53.0
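For a self-contained test, a hypothetical df with 'number' and 'time' columns could look like this (the original data isn't shown, so the figures below won't match the table above):

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'number': [1, 1, 2, 2, 4],
    'time':   [30.0, 45.0, 20.0, 32.0, 28.0],
})

# Count rows and average 'time' per 'number'
agg = df.groupby('number')['time'].agg(['count', 'mean']).reset_index()
print(agg)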

Average certain columns based on values in other columns

Something very relevant (regarding support for int column names): https://github.com/theislab/anndata/issues/31

Due to this bug/issue, I converted the column names to type string:

import pandas as pd

test_df = pd.DataFrame({'1': [1600, 1600, 1600, 1700, 1800],
                        '2': [1500, 2000, 1400, 1500, 2000],
                        '3': [2000, 2000, 2000, 2000, 2000],
                        '51': [65, 80, 75, 80, 75],
                        '52': [63, 82, 85, 85, 75],
                        '53': [83, 80, 75, 76, 78]})

Created a new dataframe, new_df, to meet our requirements:

new_df = test_df[['1', '2', '3']].where(test_df[['1','2','3']]<1700).notnull()

new_df now looks like this

       1      2      3
0   True   True  False
1   True  False  False
2   True   True  False
3  False   True  False
4  False  False  False

Then simply rename the columns and apply the mask with 'where':

new_df = new_df.rename(columns={"1": "51", "2":"52", "3":"53"})
test_df['mean_value'] = test_df[['51', '52', '53']].where(new_df).mean(axis=1)

This should give you the desired output -

      1     2     3  51  52  53  mean_value
0  1600  1500  2000  65  63  83        64.0
1  1600  2000  2000  80  82  80        80.0
2  1600  1400  2000  75  85  75        80.0
3  1700  1500  2000  80  85  76        85.0
4  1800  2000  2000  75  75  78         NaN
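A small variation that avoids renaming the mask columns is to pass the boolean mask as a plain array, since DataFrame.where also accepts an array-like of the same shape. A sketch on the same test_df:

# Build the mask from the threshold columns and drop its labels, so the
# column names don't have to line up with the value columns
mask = (test_df[['1', '2', '3']] < 1700).to_numpy()
test_df['mean_value'] = test_df[['51', '52', '53']].where(mask).mean(axis=1)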

Calculate the average of sections of a column with condition met to create new dataframe

Keywords: groupby, shift, mean


Code:

df_result = df.groupby((df['B'].shift(1, fill_value=0) != df['B']).cumsum()).mean()
df_result = df_result[df_result['B'] != 0]
df_result
     A    B
1  2.0  1.0
3  3.0  1.0

As you might have noticed, you first need to determine the blocks of consecutive rows having the same value.
One way to do so is by shifting B one row and then comparing it with itself.

df['B_shifted'] = df['B'].shift(1, fill_value=0)   # fill_value=0 keeps the int dtype and replaces the NaN with 0

df['A']                       = [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5]
df['B']                       = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
df['B_shifted']               = [0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
(df['B_shifted'] != df['B'])  = [F, T, F, F, T, F, T, F, F, T, F]

Each True marks the start of a new block of equal B values, so taking the cumulative sum gives every block its own label.

Now we can use the groupby pandas method as follows:

df_grouped=df.groupby((df['B_shifted'] != df['B']).cumsum())

Now, if we loop over the DataFrameGroupBy object df_grouped,
we'll see the following tuples:

(0,    A  B  B_shifted
 0  2  0          0)
(1,    A  B  B_shifted
 1  3  1          0
 2  1  1          1
 3  2  1          1)
(2,    A  B  B_shifted
 4  4  0          1
 5  1  0          0)
(3,    A  B  B_shifted
 6  5  1          0
 7  3  1          1
 8  1  1          1)
(4,     A  B  B_shifted
 9   7  0          1
 10  5  0          0)

We can now simply calculate the mean and filter out the zero values, as follows:

df_result=df_grouped.mean()
df_result=df_result[df_result['B']!=0][['A','B']]
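Putting the pieces together as a runnable sketch, using the A and B lists from the walkthrough above:

import pandas as pd

# Data from the walkthrough above
df = pd.DataFrame({'A': [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5],
                   'B': [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]})

# Label consecutive blocks of equal B values, then average each block
blocks = (df['B'].shift(1, fill_value=0) != df['B']).cumsum()
df_result = df.groupby(blocks).mean()
df_result = df_result[df_result['B'] != 0]
print(df_result)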


Pandas - Calculate average of columns with condition based on values in other columns

If your column numbers cover the same range for both '_a' and '_c', you can simply loop through them:

import numpy as np

r = range(1, 4)
for i in r:
    df.loc[df["{}_a".format(i)] != 1, "{}_c".format(i)] = np.nan

df['NEW'] = df[['{}_c'.format(i) for i in r]].mean(axis=1)
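An equivalent vectorized sketch, using hypothetical sample data (columns 1_a..3_a as flags and 1_c..3_c as values); unlike the loop above, it also leaves the original '_c' columns untouched:

import pandas as pd

# Hypothetical sample data: 1_a..3_a hold the condition flags, 1_c..3_c the values
df = pd.DataFrame({'1_a': [1, 0, 1], '2_a': [1, 1, 0], '3_a': [0, 1, 1],
                   '1_c': [10, 20, 30], '2_c': [40, 50, 60], '3_c': [70, 80, 90]})

r = range(1, 4)
a_cols = ['{}_a'.format(i) for i in r]
c_cols = ['{}_c'.format(i) for i in r]

# Keep each '_c' value only where the matching '_a' flag equals 1, then average row-wise
mask = (df[a_cols] == 1).to_numpy()
df['NEW'] = df[c_cols].where(mask).mean(axis=1)
print(df)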

How to average certain values of a column based on other columns condition in pandas

You probably want groupby and transform (though I'm not sure in your desired output why type B for 01/02/2010 is 13.5, I think it should be 18.5, i.e. the average of 17 and 20):

df['Value2'] = df.groupby(['Type','Date']).Value.transform('mean')
>>> df
   Index        Date Type  Value  Value2
0      0  01/01/2010    A     10    11.0
1      1  01/01/2010    B     15    20.0
2      2  01/01/2010    B     25    20.0
3      3  01/01/2010    A     12    11.0
4      4  01/02/2010    A      9     8.5
5      5  01/02/2010    B     17    18.5
6      6  01/02/2010    B     20    18.5
7      7  01/02/2010    A      8     8.5
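To reproduce this, the df can be rebuilt from the table above:

import pandas as pd

# Data taken from the table above
df = pd.DataFrame({
    'Date':  ['01/01/2010'] * 4 + ['01/02/2010'] * 4,
    'Type':  ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'A'],
    'Value': [10, 15, 25, 12, 9, 17, 20, 8],
})

# Broadcast the per-(Type, Date) mean back onto every row
df['Value2'] = df.groupby(['Type', 'Date']).Value.transform('mean')
print(df)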

Pandas calculate mean using another column as condition

You can extract the time from the datetime column and group by time only. If a time slot has fewer than 3 observations, its mean becomes NaN:

t = pd.date_range("2022-01-01", "2022-01-02", freq="30T").time

grp = df.groupby(df["observation_time"].dt.time)
result = (
    grp["temperature"].mean()       # Calculate the mean temperature for each 30-min period
    .mask(grp.size() < 3, np.nan)   # If the period has fewer than 3 observations, make it NaN
    .reindex(t)                     # Make sure we have all periods of a day
    .reset_index()
)
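A minimal runnable sketch, assuming observation_time is already a datetime column and using hypothetical readings (three at 00:00, only two at 00:30, so the second slot ends up NaN):

import numpy as np
import pandas as pd

# Hypothetical observations
df = pd.DataFrame({
    "observation_time": pd.to_datetime([
        "2022-01-01 00:00", "2022-01-02 00:00", "2022-01-03 00:00",
        "2022-01-01 00:30", "2022-01-02 00:30",
    ]),
    "temperature": [10.0, 12.0, 14.0, 8.0, 9.0],
})

t = pd.date_range("2022-01-01", "2022-01-02", freq="30T").time

grp = df.groupby(df["observation_time"].dt.time)
result = (
    grp["temperature"].mean()       # mean temperature per 30-minute slot
    .mask(grp.size() < 3, np.nan)   # slots with fewer than 3 observations become NaN
    .reindex(t)                     # include every 30-minute slot of the day
    .reset_index()
)
print(result)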

Python column mean based on other column conditions

You can try this:

df[df['count']>0]['pay'].mean()
#10.515
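A quick sketch with hypothetical count/pay data:

import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'count': [0, 2, 5, 0, 1],
                   'pay':   [9.0, 10.0, 11.0, 12.0, 10.5]})

# Mean of 'pay' over rows where 'count' is positive
print(df[df['count'] > 0]['pay'].mean())   # 10.5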

Group by columns under conditions to calculate average

Use DataFrame.pivot_table with a helper column new (a copy of ColB), then flatten the MultiIndex and join the output to a new DataFrame created by aggregating Counter with sum:

df1 = (df.assign(new=df['ColB'])
         .pivot_table(index=['ColA', 'ColB'],
                      columns='new',
                      values=['interval', 'duration'],
                      fill_value=0,
                      aggfunc='mean'))
df1.columns = df1.columns.map(lambda x: f'{x[0]}{x[1]}')

df = (df.groupby(['ColA', 'ColB'])['Counter']
        .sum()
        .to_frame(name='SumCounter')
        .join(df1)
        .reset_index())
print(df)
  ColA ColB  SumCounter  durationSD  durationUD  intervalSD  intervalUD
0    A   SD           3         2.5         0.0         3.5           0
1    A   UD          10         0.0         2.0         0.0           1
2    B   SD          32         2.0         0.0         3.5           0
3    B   UD           4         0.0         1.5         0.0           2
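For experimentation, a hypothetical df with the column layout this answer expects might look like the sketch below (the original data isn't shown, so running the code above on it gives the same column structure but different numbers):

import pandas as pd

# Hypothetical sample data with the expected columns
df = pd.DataFrame({
    'ColA':     ['A', 'A', 'A', 'B', 'B'],
    'ColB':     ['SD', 'SD', 'UD', 'SD', 'UD'],
    'Counter':  [1, 2, 10, 32, 4],
    'interval': [3.0, 4.0, 1.0, 3.5, 2.0],
    'duration': [2.0, 3.0, 2.0, 2.0, 1.5],
})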

How to calculate average of all rows by excluding a few columns based on a condition

Here's one way using melt. The idea is: filter the relevant part of the DataFrame and melt it; then filter out the customers whose answers to a question are all 0 (building that filter with groupby.transform('all')); then, for the remaining customers, find the mean response using groupby.mean. Finally, assign these means back to an out DataFrame:

out = df.loc[:, ~df.columns.str.endswith('_0')].copy()
df1 = out.melt('Customer_id')
df1 = df1.join(df1.pop('variable').str.split('_', expand=True))
percentages = (df1.loc[~df1['value'].eq(0).groupby([df1['Customer_id'], df1[0]]).transform('all')]
                  .groupby([0, 1])['value'].mean() * 100)
percentages.index = percentages.index.map('_'.join)
out.loc[len(out)] = percentages
out.loc[len(out)-1, 'Customer_id'] = 'Average'

Output:

  Customer_id       q1_a       q1_b       q2_a       q2_b       q2_c
0       asdsd   0.000000   0.000000   0.000000   0.000000   1.000000
1       aasww   1.000000   0.000000   0.000000   1.000000   0.000000
2       aaswe   0.000000   1.000000   0.000000   0.000000   0.000000
3       aaswt   0.000000   1.000000   1.000000   0.000000   0.000000
4     Average  33.333333  66.666667  33.333333  33.333333  33.333333
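To reproduce this, an input df can be sketched from the output table; the q1_0 and q2_0 columns are hypothetical placeholders (they only need to exist so the '_0' filter has something to drop, and they don't affect the averages):

import pandas as pd

# Values taken from the output above; the *_0 columns are hypothetical placeholders
df = pd.DataFrame({
    'Customer_id': ['asdsd', 'aasww', 'aaswe', 'aaswt'],
    'q1_0': [1, 0, 0, 0], 'q1_a': [0, 1, 0, 0], 'q1_b': [0, 0, 1, 1],
    'q2_0': [0, 0, 1, 0], 'q2_a': [0, 0, 0, 1], 'q2_b': [0, 1, 0, 0], 'q2_c': [1, 0, 0, 0],
})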

