How to Calculate Mean Values Grouped on Another Column in Pandas

How to calculate mean values grouped on another column in Pandas

You can group by StationID and then take the mean() of BiasTemp. To get a DataFrame as output, pass as_index=False:

In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean()
Out[4]:
  StationID  BiasTemp
0        BB       5.0
1     KEOPS       2.5
2    SS0279      15.0

Without as_index=False, it returns a Series indexed by StationID instead:

In [5]: df.groupby('StationID')['BiasTemp'].mean()
Out[5]:
StationID
BB            5.0
KEOPS         2.5
SS0279       15.0
Name: BiasTemp, dtype: float64
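The answer does not show the input DataFrame, so here is a minimal self-contained sketch of the same two calls. The sample rows are hypothetical; they were chosen only so that the group means match the output above.

```python
import pandas as pd

# Hypothetical sample data (the original DataFrame is not shown in the answer).
df = pd.DataFrame({
    "StationID": ["BB", "KEOPS", "KEOPS", "SS0279", "SS0279"],
    "BiasTemp": [5.0, 2.0, 3.0, 10.0, 20.0],
})

# as_index=False keeps StationID as a regular column, so the result is a DataFrame.
as_frame = df.groupby("StationID", as_index=False)["BiasTemp"].mean()

# With the default as_index=True, the result is a Series indexed by StationID.
as_series = df.groupby("StationID")["BiasTemp"].mean()

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
```

Calling reset_index() on the Series version gives the same DataFrame, so the choice is mostly a matter of taste.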

Read more about groupby in this pydata tutorial.

Add a column with mean values for groups based on another column

You can use groupby with transform to calculate the mean of the desired columns per group, then join the result back to the initial DataFrame to add the newly created columns:

df = df.join(
    df.groupby('area')[['prod_a', 'prod_b']]
        .transform('mean')  # Calculate the mean for each group
        .rename(columns='mean {} for the area'.format)  # Rename columns 
)

df:

entity	area	prod_a	prod_b	mean prod_a for the area	mean prod_b for the area
001	A	1	3	1.5	4.5
002	B	2	4	4	4.5
003	A	2	6	1.5	4.5
004	C	7	2	5.5	5
005	C	4	8	5.5	5
006	B	6	5	4	4.5
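To make the snippet reproducible, the sketch below rebuilds the input from the table above and runs the same steps. It relies on transform returning one row per original row with the original index, which is what lets join line the new columns up without a key.

```python
import pandas as pd

# Input rebuilt from the first four columns of the table above.
df = pd.DataFrame({
    "entity": ["001", "002", "003", "004", "005", "006"],
    "area": ["A", "B", "A", "C", "C", "B"],
    "prod_a": [1, 2, 2, 7, 4, 6],
    "prod_b": [3, 4, 6, 2, 8, 5],
})

# transform broadcasts each group's mean back to every row of that group.
group_means = df.groupby("area")[["prod_a", "prod_b"]].transform("mean")

# Rename the mean columns, then join on the shared index.
result = df.join(group_means.rename(columns="mean {} for the area".format))
print(result)
```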

Calculate mean from a column based on another column in pandas

Assign the output of the groupby method to a variable, here df1, and write that variable to the CSV file:

import pandas as pd

df = pd.read_csv('myfile.csv')
# solution with the new column renamed
df1 = df.groupby('date')['score'].mean().reset_index(name='Avg_Score')
# your solution
# df1 = df.groupby('date').mean().reset_index()
df1.to_csv('average.csv', encoding='utf-8', index=False)
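As an alternative sketch, named aggregation can produce the renamed column in one step. The in-memory frame below is a hypothetical stand-in for the contents of myfile.csv, which the answer does not show.

```python
import pandas as pd

# Hypothetical stand-in for the data read from myfile.csv.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "score": [80, 90, 70],
})

# Approach from the answer: aggregate, then name the value column in reset_index.
via_reset = df.groupby("date")["score"].mean().reset_index(name="Avg_Score")

# Named aggregation: output_column=(input_column, aggregation).
via_named = df.groupby("date", as_index=False).agg(Avg_Score=("score", "mean"))

print(via_named)
```

Both results have the columns date and Avg_Score, so either can be passed to to_csv with index=False.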

How to calculate mean values per group and sort the output by that mean in Pandas

You can add sort_values right after that:

df.groupby("color").power.mean().sort_values(ascending=False)

Or, to add a count column and sort by that count instead:

(df.groupby("color").power
   .agg(["mean", "count"])
   .rename(columns="{}_of_power".format)
   .sort_values("count_of_power", ascending=False))

# output: 
        mean_of_power  count_of_power
color                                
red               5.0               3
green             4.0               2
yellow           10.0               1
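A runnable sketch of both variants follows. The rows are hypothetical, picked so that the means and counts match the output above. It uses df.groupby("color")["power"] rather than attribute access, which also works when a column name collides with a method name.

```python
import pandas as pd

# Hypothetical data chosen to reproduce the means and counts shown above.
df = pd.DataFrame({
    "color": ["red", "red", "red", "green", "green", "yellow"],
    "power": [4, 5, 6, 3, 5, 10],
})

# Mean per color, largest mean first.
by_mean = df.groupby("color")["power"].mean().sort_values(ascending=False)

# Mean and count per color, largest group first.
by_count = (
    df.groupby("color")["power"]
      .agg(["mean", "count"])
      .rename(columns="{}_of_power".format)
      .sort_values("count_of_power", ascending=False)
)

print(by_mean)
print(by_count)
```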

DataFrame: Group by one column and average other columns

You just need groupby:

data['state'] = data['state'].eq('True')
data.drop('id',axis=1).groupby('group', as_index=False).mean()

Output:

  group     state      value
0     1  0.666667  10.333333
1     2  0.500000   4.000000
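The first line of the answer converts the string column state to booleans, since strings cannot be averaged while booleans average to the fraction of True values. A self-contained sketch with hypothetical rows (the original data is not shown) that reproduces the output above:

```python
import pandas as pd

# Hypothetical input; state holds the strings "True"/"False", not booleans.
data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "group": [1, 1, 1, 2, 2],
    "state": ["True", "True", "False", "True", "False"],
    "value": [10, 11, 10, 3, 5],
})

# Compare against the string "True" to get a real boolean column.
data["state"] = data["state"].eq("True")

# Drop the identifier so it is not averaged, then take the mean per group.
out = data.drop("id", axis=1).groupby("group", as_index=False).mean()
print(out)
```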

Pandas calculate mean using another column as condition

You can extract the time from the datetime column and group by time only. If a time slot has fewer than 3 observations, its mean is set to NaN:

t = pd.date_range("2022-01-01", "2022-01-02", freq="30min").time  # "30min" replaces the deprecated "30T" alias

grp = df.groupby(df["observation_time"].dt.time)
result = (
    grp["temperature"].mean()     # Calculate the mean temperature for each 30-min period
    .mask(grp.size() < 3, np.nan) # If the period has less than 3 observations, make it nan
    .reindex(t)                   # Make sure we have all periods of a day
    .reset_index()
)
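The sketch below runs the same idea end to end on a tiny hypothetical data set: three readings at 00:00 and two at 00:30. It builds the slot list with periods=48 instead of an end date, an assumption made here so that midnight appears only once in the reindexed result.

```python
import numpy as np
import pandas as pd

# Hypothetical observations: three at 00:00, two at 00:30, on different days.
df = pd.DataFrame({
    "observation_time": pd.to_datetime([
        "2022-01-01 00:00", "2022-01-02 00:00", "2022-01-03 00:00",
        "2022-01-01 00:30", "2022-01-02 00:30",
    ]),
    "temperature": [10.0, 12.0, 14.0, 20.0, 22.0],
})

# All 48 half-hour slots of a day as datetime.time objects.
slots = pd.date_range("2022-01-01", periods=48, freq="30min").time

grp = df.groupby(df["observation_time"].dt.time)
full = (
    grp["temperature"].mean()      # mean temperature per time of day
    .mask(grp.size() < 3, np.nan)  # blank out slots with fewer than 3 observations
    .reindex(slots)                # add the slots that have no observations at all
)
print(full.head(3))
```

Only the 00:00 slot keeps a value here; 00:30 is masked because it has two observations, and every other slot is NaN because it has none.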

Creating a new column based on the mean of other values in group

Compute the means of all other values within each group using a double groupby:

sum all the values within the group
subtract the current (focal) value
divide by one less than the number of items in the group

Then assign the shifted means to a new column, using 0 for the first row of each col1 value:

means = (
    df.groupby("group")
    .apply(lambda x: x.groupby("col2")["col3"].transform("sum")
           .sub(x["col3"])
           .div(len(x["col1"].unique()) - 1))
    .droplevel(0)
)

df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()),0)

>>> df
   col1  col2  col3  group  mean
0     A  2015    10     10   0.0
1     A  2016    20     10   9.0
2     A  2017    25     10  10.5
3     B  2015    10     10   0.0
4     B  2016    12     10   9.0
5     B  2017    14     10  14.5
6     c  2015     8     10   0.0
7     c  2016     9     10  10.0
8     c  2017    10     10  16.0
9     d  2015    50     20   0.0
10    d  2016    60     20  40.0
11    d  2017    70     20  50.0
12    e  2015    40     20   0.0
13    e  2016    50     20  50.0
14    e  2017    60     20  60.0
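The three steps (sum, subtract the focal value, divide by one less than the group size) can be seen in isolation in the simplified sketch below. It is not the full answer: it uses only the 2015 and 2016 rows of group 10 from the table above, groups by col2 alone, and skips the final shift.

```python
import pandas as pd

# The 2015 and 2016 rows of group 10, taken from the table above.
df = pd.DataFrame({
    "col1": ["A", "A", "B", "B", "c", "c"],
    "col2": [2015, 2016, 2015, 2016, 2015, 2016],
    "col3": [10, 20, 10, 12, 8, 9],
})

per_year = df.groupby("col2")["col3"]

# (group sum - own value) / (group size - 1) = mean of the other rows in the group.
others_mean = (per_year.transform("sum") - df["col3"]) / (per_year.transform("count") - 1)

print(df.assign(others_mean=others_mean))
```

These values are the ones that appear one row later in the mean column above, once the shift is applied. A group with a single row would divide by zero and produce NaN, so such groups need separate handling.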

Pandas: compute the mean of a column grouped by another column

Your syntax is incorrect: mean has no groupby argument. Instead, group by the column that defines the groups and then call mean on the column you want to average:

In [11]:
df.groupby('C')['height'].mean()

Out[11]:
C
1    54.299919
2    52.760444
3    67.672566
Name: height, dtype: float64
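If only one group's mean is needed, a boolean filter gives the same number without grouping at all. A small sketch with hypothetical values (the original data is not shown):

```python
import pandas as pd

# Hypothetical data; C defines the groups and height is the value to average.
df = pd.DataFrame({
    "C": [1, 1, 2, 3],
    "height": [50.0, 60.0, 52.0, 67.0],
})

# Mean of height for every value of C.
by_group = df.groupby("C")["height"].mean()

# Mean of height for the rows where C == 1 only.
single = df.loc[df["C"] == 1, "height"].mean()

print(by_group)
print(single)
```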

Pandas: How to find mean of a column based on duplicate rows in another column?

Use groupby with transform, which returns one value per original row:

df["mean_humidity"] = df.groupby('dates')['humidity'].transform('mean')
print(df)

output:

>>>
      dates  humidity  mean_humidity
0  1/1/2020        11           22.0
1  1/1/2020        22           22.0
2  1/1/2020        33           22.0
3  1/2/2020        44           55.0
4  1/2/2020        55           55.0
5  1/2/2020        66           55.0
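For comparison, here is a sketch showing that transform gives the same column as aggregating first and mapping the per-date means back onto the rows. The data is taken from the output above.

```python
import pandas as pd

# Input rebuilt from the first two columns of the output above.
df = pd.DataFrame({
    "dates": ["1/1/2020"] * 3 + ["1/2/2020"] * 3,
    "humidity": [11, 22, 33, 44, 55, 66],
})

# One step: transform broadcasts each group's mean to all of its rows.
via_transform = df.groupby("dates")["humidity"].transform("mean")

# Two steps: one mean per date, then a lookup by date for every row.
per_date = df.groupby("dates")["humidity"].mean()
via_map = df["dates"].map(per_date)

print(via_transform.tolist())
print(via_map.tolist())
```

The transform form is shorter; the two-step form is handy when the per-date means are also needed on their own.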
