How to calculate mean values grouped on another column in Pandas
You could groupby on StationID and then take mean() on BiasTemp. To output a DataFrame, use as_index=False:
In [4]: df.groupby('StationID', as_index=False)['BiasTemp'].mean()
Out[4]:
StationID BiasTemp
0 BB 5.0
1 KEOPS 2.5
2 SS0279 15.0
Without as_index=False, it returns a Series instead:
In [5]: df.groupby('StationID')['BiasTemp'].mean()
Out[5]:
StationID
BB 5.0
KEOPS 2.5
SS0279 15.0
Name: BiasTemp, dtype: float64
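The original DataFrame is not shown, so the sketch below builds a small one with hypothetical BiasTemp readings, chosen so the per-station means match the outputs above. It puts the two return types side by side.

```python
import pandas as pd

# Hypothetical readings; only the column names come from the example above
df = pd.DataFrame({
    "StationID": ["BB", "KEOPS", "KEOPS", "SS0279", "SS0279"],
    "BiasTemp": [5.0, 2.0, 3.0, 10.0, 20.0],
})

# as_index=False keeps StationID as a regular column -> DataFrame
as_frame = df.groupby("StationID", as_index=False)["BiasTemp"].mean()

# Default (as_index=True) moves StationID into the index -> Series
as_series = df.groupby("StationID")["BiasTemp"].mean()

print(as_frame)
print(as_series)
```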
Read more about groupby in this pydata tutorial.
Add a column with mean values for groups based on another column
You can use groupby with transform to calculate the per-group mean of the desired columns, then join the result back to the initial DataFrame to add the newly created columns:
df = df.join(
df.groupby('area')[['prod_a', 'prod_b']]
.transform('mean') # Calculate the mean for each group
.rename(columns='mean {} for the area'.format) # Rename columns
)
Resulting df:
entity | area | prod_a | prod_b | mean prod_a for the area | mean prod_b for the area |
---|---|---|---|---|---|
001 | A | 1 | 3 | 1.5 | 4.5 |
002 | B | 2 | 4 | 4 | 4.5 |
003 | A | 2 | 6 | 1.5 | 4.5 |
004 | C | 7 | 2 | 5.5 | 5 |
005 | C | 4 | 8 | 5.5 | 5 |
006 | B | 6 | 5 | 4 | 4.5 |
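As a self-contained check, the sketch below rebuilds the input columns from the table above and applies the same transform-and-join steps. The key point is that transform('mean') returns one value per original row rather than one per group, so its result shares df's index and can be joined back directly.

```python
import pandas as pd

# Input columns taken from the table above (the mean columns are computed)
df = pd.DataFrame({
    "entity": ["001", "002", "003", "004", "005", "006"],
    "area": ["A", "B", "A", "C", "C", "B"],
    "prod_a": [1, 2, 2, 7, 4, 6],
    "prod_b": [3, 4, 6, 2, 8, 5],
})

# One mean per row (aligned on df's index), then renamed column by column
means = (
    df.groupby("area")[["prod_a", "prod_b"]]
    .transform("mean")
    .rename(columns="mean {} for the area".format)
)
df = df.join(means)
print(df)
```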
Calculate mean from a column based on another column in pandas
Assign the output of the groupby method back to a variable, e.g. df1 here:
import pandas as pd

df = pd.read_csv('myfile.csv')
# Solution with a renamed new column
df1 = df.groupby('date')['score'].mean().reset_index(name='Avg_Score')
# Your solution
# df1 = df.groupby('date').mean().reset_index()
df1.to_csv('average.csv', encoding='utf-8', index=False)
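To try this without myfile.csv, the sketch below reads a small in-memory CSV through io.StringIO and writes the result to a string buffer instead of average.csv. The date and score values are hypothetical. It shows that reset_index(name=...) both names the averaged column and turns the date index back into a regular column.

```python
import io
import pandas as pd

# Hypothetical stand-in for myfile.csv
csv_text = """date,score
2021-01-01,10
2021-01-01,20
2021-01-02,30
"""
df = pd.read_csv(io.StringIO(csv_text))

# Mean score per date, with the value column renamed to Avg_Score
df1 = df.groupby("date")["score"].mean().reset_index(name="Avg_Score")

out = io.StringIO()
df1.to_csv(out, index=False)
print(out.getvalue())
```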
How to calculate mean values per group and sort the output by that mean in Pandas
You can add sort_values right after that:
df.groupby("color").power.mean().sort_values(ascending=False)
Or, to add a count column and sort by it:
(df.groupby("color").power
.agg(["mean", "count"])
.rename(columns="{}_of_power".format)
.sort_values("count_of_power", ascending=False))
# output:
mean_of_power count_of_power
color
red 5.0 3
green 4.0 2
yellow 10.0 1
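The sketch below runs both variants on hypothetical power readings chosen to reproduce the output above. Sorting by the mean and sorting by the count give different orders, which is the point of adding the count column.

```python
import pandas as pd

# Hypothetical readings chosen to reproduce the output above
df = pd.DataFrame({
    "color": ["red", "red", "red", "green", "green", "yellow"],
    "power": [4, 5, 6, 3, 5, 10],
})

# Mean power per color, highest first
by_mean = df.groupby("color").power.mean().sort_values(ascending=False)

# Mean and count side by side, sorted by count
by_count = (df.groupby("color").power
            .agg(["mean", "count"])
            .rename(columns="{}_of_power".format)
            .sort_values("count_of_power", ascending=False))
print(by_mean)
print(by_count)
```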
DataFrame: Group by one column and average other columns
You just need groupby:
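# Convert the 'True'/'False' strings to booleans so they can be averaged
data['state'] = data['state'].eq('True')
# Drop the id column, then average the remaining columns per group
data.drop('id', axis=1).groupby('group', as_index=False).mean()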
Output:
group state value
0 1 0.666667 10.333333
1 2 0.500000 4.000000
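The input data is not shown, so the sketch below assumes hypothetical rows in which state holds the strings 'True'/'False' (which is what the eq('True') conversion implies), with values picked to reproduce the output above. After the conversion, the mean of a boolean column is the fraction of True rows in each group.

```python
import pandas as pd

# Hypothetical data; 'state' holds strings, as the .eq('True') step assumes
data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "group": [1, 1, 1, 2, 2],
    "state": ["True", "True", "False", "True", "False"],
    "value": [10, 10, 11, 3, 5],
})

# Strings -> booleans (True counts as 1, False as 0 when averaged)
data["state"] = data["state"].eq("True")

# Drop the identifier, then average every remaining column per group
result = data.drop("id", axis=1).groupby("group", as_index=False).mean()
print(result)
```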
Pandas calculate mean using another column as condition
You can extract the time from the datetime column and group by time only. If a time slot has fewer than 3 observations, its mean is set to NaN:
import numpy as np
import pandas as pd

# All 48 half-hour slots of a day (a start/end range would repeat 00:00 at the end)
t = pd.date_range("2022-01-01", periods=48, freq="30min").time
grp = df.groupby(df["observation_time"].dt.time)
result = (
    grp["temperature"].mean()  # Calculate the mean temperature for each 30-min period
    .mask(grp.size() < 3, np.nan)  # If the period has fewer than 3 observations, make it NaN
    .reindex(t)  # Make sure we have all periods of a day
    .reset_index()
)
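The sketch below feeds the same steps some synthetic data: three days of hypothetical readings every 30 minutes, with one reading dropped so that the 00:30 slot has only two observations. It assumes readings fall exactly on the half hour; irregular timestamps would first need rounding, e.g. with .dt.floor("30min").

```python
import numpy as np
import pandas as pd

# Hypothetical data: three days of readings, one every 30 minutes
df = pd.DataFrame({
    "observation_time": pd.date_range("2022-01-01", periods=48 * 3, freq="30min"),
    "temperature": np.arange(48 * 3, dtype=float),
})
# Drop the third day's 00:30 reading, leaving that slot with 2 observations
df = df.drop(index=97)

t = pd.date_range("2022-01-01", periods=48, freq="30min").time  # 48 slots
grp = df.groupby(df["observation_time"].dt.time)
result = (
    grp["temperature"].mean()
    .mask(grp.size() < 3, np.nan)  # fewer than 3 observations -> NaN
    .reindex(t)
    .reset_index()
)
print(result.head())
```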
Creating a new column based on the mean of other values in group
- Compute the means of all other values within each group using a double groupby:
  - sum all the values within the group
  - subtract the current (focal) value
  - divide by one less than the number of items in the group
- Assign the shifted means to a new column:
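# Per group: (per-year sum of col3 - focal value) / (number of col1 items in the group - 1)
means = df.groupby("group").apply(lambda x: x.groupby("col2")["col3"].transform("sum").sub(x["col3"]).div(x["col1"].nunique() - 1)).droplevel(0)
# Use the previous row's mean within the same col1 item; 0 for each item's first row
df["mean"] = means.shift().where(df["col1"].eq(df["col1"].shift()), 0)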
>>> df
col1 col2 col3 group mean
0 A 2015 10 10 0.0
1 A 2016 20 10 9.0
2 A 2017 25 10 10.5
3 B 2015 10 10 0.0
4 B 2016 12 10 9.0
5 B 2017 14 10 14.5
6 c 2015 8 10 0.0
7 c 2016 9 10 10.0
8 c 2017 10 10 16.0
9 d 2015 50 20 0.0
10 d 2016 60 20 40.0
11 d 2017 70 20 50.0
12 e 2015 40 20 0.0
13 e 2016 50 20 50.0
14 e 2017 60 20 60.0
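As an alternative to the nested groupby(...).apply(...), whose index layout can differ between pandas versions (hence the droplevel(0)), the same leave-one-out mean can be computed with a single two-key groupby plus transform. This is a swapped-in formulation, not the answer's original code. The input columns are rebuilt from the output above, and like the original it assumes rows are sorted by col1 and then col2.

```python
import pandas as pd

# Input columns rebuilt from the output above
df = pd.DataFrame({
    "col1": list("AAABBBccc") + list("dddeee"),
    "col2": [2015, 2016, 2017] * 5,
    "col3": [10, 20, 25, 10, 12, 14, 8, 9, 10, 50, 60, 70, 40, 50, 60],
    "group": [10] * 9 + [20] * 6,
})

# Sum of col3 per (group, year): the inner groupby of the nested version
total = df.groupby(["group", "col2"])["col3"].transform("sum")
# Number of distinct col1 items in each group
n_items = df.groupby("group")["col1"].transform("nunique")
# Leave-one-out mean: drop the focal value, divide by the remaining count
loo_mean = (total - df["col3"]) / (n_items - 1)

# Previous year's value within the same col1 item; 0 for each item's first row
df["mean"] = loo_mean.shift().where(df["col1"].eq(df["col1"].shift()), 0)
print(df)
```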
Pandas: compute the mean of a column grouped by another column
Your syntax is incorrect: mean has no groupby argument. Instead, groupby on the grouping column (here C) and then call mean on the column you want to average (here height):
In [11]:
df.groupby('C')['height'].mean()
Out[11]:
C
1 54.299919
2 52.760444
3 67.672566
Name: height, dtype: float64
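To make the split-apply-combine idea concrete, the sketch below uses a small hypothetical DataFrame (not the asker's data) and checks that groupby('C')['height'].mean() matches filtering each C value by hand and averaging its heights.

```python
import pandas as pd

# Hypothetical heights grouped by a category column C
df = pd.DataFrame({"C": [1, 1, 2, 2, 3], "height": [50.0, 60.0, 52.0, 54.0, 67.0]})

grouped = df.groupby("C")["height"].mean()

# The same result by hand: split on C, average each piece, combine
manual = pd.Series(
    {int(c): df.loc[df["C"] == c, "height"].mean() for c in sorted(df["C"].unique())},
    name="height",
)
print(grouped)
print(manual)
```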
Pandas: How to find mean of a column based on duplicate rows in another column?
Use groupby with transform('mean'), which writes each date's mean back to every row with that date:
df["mean_humidity"] = df.groupby('dates')['humidity'].transform('mean')
print(df)
output:
>>>
dates humidity mean_humidity
0 1/1/2020 11 22.0
1 1/1/2020 22 22.0
2 1/1/2020 33 22.0
3 1/2/2020 44 55.0
4 1/2/2020 55 55.0
5 1/2/2020 66 55.0
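The sketch below rebuilds the input from the output above and contrasts the two forms: mean() collapses to one row per date, while transform('mean') keeps one value per original row, which is why it can be assigned straight back as a new column.

```python
import pandas as pd

# Input columns taken from the output above
df = pd.DataFrame({
    "dates": ["1/1/2020"] * 3 + ["1/2/2020"] * 3,
    "humidity": [11, 22, 33, 44, 55, 66],
})

# mean(): one row per date
per_date = df.groupby("dates")["humidity"].mean()
# transform('mean'): the same means, repeated for every row of each date
df["mean_humidity"] = df.groupby("dates")["humidity"].transform("mean")
print(per_date)
print(df)
```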