How to Calculate Mean Values Grouped on Another Column in Pandas

How to calculate mean values grouped on another column in Pandas

Group by StationID and take the mean() of BiasTemp. To get a DataFrame back, pass as_index=False:

In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean()
Out[4]:
  StationID  BiasTemp
0        BB       5.0
1     KEOPS       2.5
2    SS0279      15.0

Without as_index=False, the result is a Series indexed by StationID instead:

In [5]: df.groupby('StationID')['BiasTemp'].mean()
Out[5]:
StationID
BB         5.0
KEOPS      2.5
SS0279    15.0
Name: BiasTemp, dtype: float64
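
The answer above does not show the input frame, so here is a minimal self-contained sketch of both forms. The readings are made up, chosen only so that the group means match the output shown; the names as_frame and as_series are illustrative.

```python
import pandas as pd

# Hypothetical readings, chosen so the group means match the output above
df = pd.DataFrame({
    "StationID": ["BB", "KEOPS", "KEOPS", "SS0279", "SS0279"],
    "BiasTemp": [5.0, 2.0, 3.0, 10.0, 20.0],
})

# as_index=False keeps StationID as a regular column (DataFrame result)
as_frame = df.groupby("StationID", as_index=False)["BiasTemp"].mean()

# Default as_index=True moves StationID into the index (Series result)
as_series = df.groupby("StationID")["BiasTemp"].mean()

print(as_frame)
print(as_series)
```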

Read more about groupby in this pydata tutorial.

Add a column with mean values for groups based on another column

You can use groupby with transform to calculate the mean of the desired columns, then join the result back to the initial DataFrame to add the newly created columns:

df = df.join(
    df.groupby('area')[['prod_a', 'prod_b']]
    .transform('mean')  # Calculate the mean for each group
    .rename(columns='mean {} for the area'.format)  # Rename columns
)

df:

entity  area  prod_a  prod_b  mean prod_a for the area  mean prod_b for the area
001     A     1       3       1.5                       4.5
002     B     2       4       4                         4.5
003     A     2       6       1.5                       4.5
004     C     7       2       5.5                       5
005     C     4       8       5.5                       5
006     B     6       5       4                         4.5
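
For reference, a runnable sketch of the same approach. The input frame is rebuilt from the table above, with the entity values assumed to be strings; means and out are illustrative names.

```python
import pandas as pd

# Input frame reconstructed from the table above (entity assumed to be a string)
df = pd.DataFrame({
    "entity": ["001", "002", "003", "004", "005", "006"],
    "area": ["A", "B", "A", "C", "C", "B"],
    "prod_a": [1, 2, 2, 7, 4, 6],
    "prod_b": [3, 4, 6, 2, 8, 5],
})

# transform returns one row per input row, so the result lines up with df's index
means = (
    df.groupby("area")[["prod_a", "prod_b"]]
    .transform("mean")
    .rename(columns="mean {} for the area".format)
)

# join aligns on the index and appends the two new columns
out = df.join(means)
print(out)
```

Because transform preserves the original index, the join needs no key column.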

Calculate mean from a column based on another column in pandas

Assign the output of the groupby method to a variable, e.g. df1 here:

import pandas as pd

df = pd.read_csv('myfile.csv')
# solution with renamed new column
df1 = df.groupby('date')['score'].mean().reset_index(name='Avg_Score')
# your solution
# df1 = df.groupby('date').mean().reset_index()
df1.to_csv('average.csv', encoding='utf-8', index=False)
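
To see what reset_index(name=...) produces without needing a CSV file, here is a small sketch on an in-memory frame. The dates and scores are invented for illustration.

```python
import pandas as pd

# Hypothetical scores standing in for the contents of myfile.csv
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-02"],
    "score": [80, 90, 70],
})

# The grouped mean is a Series; reset_index(name=...) turns it into a
# two-column DataFrame and names the value column in one step
df1 = df.groupby("date")["score"].mean().reset_index(name="Avg_Score")
print(df1)
```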

How to calculate mean values per group and sort the output by that mean in Pandas

You can chain sort_values right after the mean:

df.groupby("color").power.mean().sort_values(ascending=False)


Or, to add a count column and sort by it instead:

(df.groupby("color").power
   .agg(["mean", "count"])
   .rename(columns="{}_of_power".format)
   .sort_values("count_of_power", ascending=False))

# output:
        mean_of_power  count_of_power
color
red               5.0               3
green             4.0               2
yellow           10.0               1
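
A self-contained sketch of both variants follows. The power values are made up so that the means and counts match the output above; by_mean and by_count are illustrative names.

```python
import pandas as pd

# Hypothetical data: red has 3 rows (mean 5), green 2 (mean 4), yellow 1 (mean 10)
df = pd.DataFrame({
    "color": ["red", "red", "red", "green", "green", "yellow"],
    "power": [4, 5, 6, 3, 5, 10],
})

# Variant 1: group means sorted from highest to lowest
by_mean = df.groupby("color")["power"].mean().sort_values(ascending=False)

# Variant 2: mean and count side by side, sorted by group size
by_count = (
    df.groupby("color")["power"]
    .agg(["mean", "count"])
    .rename(columns="{}_of_power".format)
    .sort_values("count_of_power", ascending=False)
)

print(by_mean)
print(by_count)
```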

DataFrame: Group by one column and average other columns

You just need groupby:

data['state'] = data['state'].eq('True')
data.drop('id',axis=1).groupby('group', as_index=False).mean()

Output:

   group     state      value
0      1  0.666667  10.333333
1      2  0.500000   4.000000
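
The input frame is not shown in the answer. The sketch below assumes that state holds the strings 'True' and 'False' (which is what the .eq('True') conversion suggests) and uses invented rows that reproduce the averages above.

```python
import pandas as pd

# Hypothetical input; state is assumed to be stored as text
data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "group": [1, 1, 1, 2, 2],
    "state": ["True", "True", "False", "True", "False"],
    "value": [10, 10, 11, 3, 5],
})

# Convert the text flags to real booleans so they can be averaged
data["state"] = data["state"].eq("True")

# Drop the identifier, then average every remaining column per group;
# the mean of a boolean column is the fraction of True values
out = data.drop("id", axis=1).groupby("group", as_index=False).mean()
print(out)
```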

Pandas calculate mean using another column as condition

You can extract the time from the datetime column and group by time only. If a time slot has fewer than 3 observations, its mean is set to NaN:

import numpy as np
import pandas as pd

# 48 half-hour slots, 00:00 through 23:30 ("30min" replaces the deprecated "30T" alias)
t = pd.date_range("2022-01-01", periods=48, freq="30min").time

grp = df.groupby(df["observation_time"].dt.time)
result = (
    grp["temperature"].mean()      # Calculate the mean temperature for each 30-min period
    .mask(grp.size() < 3, np.nan)  # If the period has fewer than 3 observations, make it NaN
    .reindex(t)                    # Make sure we have all periods of a day
    .reset_index()
)
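
The following sketch runs the same idea on a tiny made-up frame, so the masking is easy to check by hand. The observations and the name slots are assumptions for illustration; the final reset_index is left out to keep the time of day as the index.

```python
import numpy as np
import pandas as pd

# Hypothetical observations: three readings at 00:00, only two at 00:30
df = pd.DataFrame({
    "observation_time": pd.to_datetime([
        "2022-01-01 00:00", "2022-01-02 00:00", "2022-01-03 00:00",
        "2022-01-01 00:30", "2022-01-02 00:30",
    ]),
    "temperature": [10.0, 12.0, 14.0, 20.0, 22.0],
})

# All 48 half-hour slots of a day, as datetime.time objects
slots = pd.date_range("2022-01-01", periods=48, freq="30min").time

grp = df.groupby(df["observation_time"].dt.time)
result = (
    grp["temperature"].mean()  # mean per time of day, across all dates
    .mask(grp.size() < 3)      # slots with fewer than 3 observations become NaN
    .reindex(slots)            # slots with no observations at all are added as NaN
)
print(result.head(3))
```

Here only the 00:00 slot keeps a value (12.0); the 00:30 slot is masked because it has two observations, and every other slot is NaN because it has none.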

Creating a new column based on the mean of other values in group

  1. Compute the means of all other values within each group using a double groupby:
  • sum all the values within the group
  • subtract the current (focal) value
  • divide by one less than the number of items in the group

means = df.groupby("group").apply(lambda x: x.groupby("col2")["col3"].transform("sum").sub(x["col3"]).div(len(x["col1"].unique())-1)).droplevel(0)

  2. Assign the shifted means to a new column:

df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()),0)

>>> df
   col1  col2  col3  group  mean
0     A  2015    10     10   0.0
1     A  2016    20     10   9.0
2     A  2017    25     10  10.5
3     B  2015    10     10   0.0
4     B  2016    12     10   9.0
5     B  2017    14     10  14.5
6     c  2015     8     10   0.0
7     c  2016     9     10  10.0
8     c  2017    10     10  16.0
9     d  2015    50     20   0.0
10    d  2016    60     20  40.0
11    d  2017    70     20  50.0
12    e  2015    40     20   0.0
13    e  2016    50     20  50.0
14    e  2017    60     20  60.0
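
The three bullet steps (sum, subtract the focal value, divide by one less than the group size) can also be shown in isolation. This is a simplified sketch on a made-up single-level grouping, not a reproduction of the table above; the column names value and mean_of_others are hypothetical.

```python
import pandas as pd

# Hypothetical data: two groups of different sizes
df = pd.DataFrame({
    "group": ["g1", "g1", "g1", "g2", "g2"],
    "value": [10, 20, 30, 4, 8],
})

g = df.groupby("group")["value"]

# (group sum - own value) / (group size - 1) = mean of the other rows in the group
df["mean_of_others"] = (g.transform("sum") - df["value"]) / (g.transform("count") - 1)
print(df)
```

For the first row this gives (60 - 10) / 2 = 25.0. A group with a single row would divide by zero and yield NaN, so such groups may need separate handling.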

Pandas: compute the mean of a column grouped by another column

Your syntax is incorrect: mean has no groupby argument. Instead, call groupby on the grouping column, select the column of interest, and then call mean on it:

In [11]:
df.groupby('C')['height'].mean()

Out[11]:
C
1 54.299919
2 52.760444
3 67.672566
Name: height, dtype: float64
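
To make clear what the grouped mean computes, the sketch below compares it with filtering each group by hand. The heights are invented, and by_group and manual are illustrative names.

```python
import pandas as pd

# Hypothetical heights for three classes in column C
df = pd.DataFrame({
    "C": [1, 1, 2, 2, 3],
    "height": [50.0, 60.0, 40.0, 44.0, 70.0],
})

# One mean per distinct value of C
by_group = df.groupby("C")["height"].mean()

# The same numbers, computed one group at a time with a boolean mask
manual = {c: df.loc[df["C"] == c, "height"].mean() for c in df["C"].unique()}

print(by_group)
print(manual)
```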

Pandas: How to find mean of a column based on duplicate rows in another column?

Use groupby with transform:

df["mean_humidity"] = df.groupby('dates')['humidity'].transform('mean')
print(df)

output:

      dates  humidity  mean_humidity
0  1/1/2020        11           22.0
1  1/1/2020        22           22.0
2  1/1/2020        33           22.0
3  1/2/2020        44           55.0
4  1/2/2020        55           55.0
5  1/2/2020        66           55.0
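
A runnable sketch with the frame rebuilt from the output above, contrasting transform (one value per row) with a plain aggregation (one value per group). The names per_row and per_group are illustrative.

```python
import pandas as pd

# Input rebuilt from the output above
df = pd.DataFrame({
    "dates": ["1/1/2020"] * 3 + ["1/2/2020"] * 3,
    "humidity": [11, 22, 33, 44, 55, 66],
})

# transform broadcasts each group's mean back to every row of that group
per_row = df.groupby("dates")["humidity"].transform("mean")

# A plain mean collapses each group to a single row
per_group = df.groupby("dates")["humidity"].mean()

df["mean_humidity"] = per_row
print(df)
print(per_group)
```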

