Get Statistics For Each Group (Such as Count, Mean, etc.) Using Pandas Groupby

Get statistics for each group (such as count, mean, etc.) using pandas GroupBy?

On a groupby object, the agg function can take a list of functions so that several aggregation methods are applied at once. This should give you the result you need:

df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count'])
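A minimal, self-contained sketch of the same pattern on made-up data (the column names here are just placeholders), to show the shape of the result:

import pandas as pd

# hypothetical data: col1/col2 are the grouping keys, col3/col4 are values
df = pd.DataFrame({'col1': ['x', 'x', 'y', 'y'],
                   'col2': ['a', 'a', 'b', 'b'],
                   'col3': [1.0, 2.0, 3.0, 4.0],
                   'col4': [10, 20, 30, 40]})

# one (mean, count) pair of sub-columns per value column
print(df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count']))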

Pandas, groupby and count

You seem to want to group by several columns at once:

df.groupby(['revenue','session','user_id'])['user_id'].count()

should give you what you want.
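As a side note (not in the original answer): count() only counts non-null values, while size() counts every row in each group. A tiny sketch with invented data:

import pandas as pd

# hypothetical data
df = pd.DataFrame({'revenue': [10, 10, 20],
                   'session': [1, 1, 2],
                   'user_id': ['u1', 'u1', None]})

print(df.groupby(['revenue', 'session'])['user_id'].count())  # ignores the None
print(df.groupby(['revenue', 'session'])['user_id'].size())   # counts every row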

Pandas Groupby: Count and mean combined

You can use groupby with aggregate:

df = df.groupby('source') \
       .agg({'text': 'size', 'sent': 'mean'}) \
       .rename(columns={'text': 'count', 'sent': 'mean_sent'}) \
       .reset_index()
print(df)
  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500
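Named aggregation (available since pandas 0.25) gives the same result without the separate rename step; a minimal sketch reusing the same columns:

(df.groupby('source')
   .agg(count=('text', 'size'), mean_sent=('sent', 'mean'))
   .reset_index())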

Pandas groupby and count numbers of item by conditions

You can use named aggregation on the groupby:

df_test.groupby(['ID1', 'ID2']).agg(
    Count_ID2=('ID2', 'count'),
    Count_ID3=('ID3', 'count'),
    Count_condition=("condition", lambda x: str(x).count('!')))

prints:

         Count_ID2  Count_ID3  Count_condition
ID1 ID2
A   a            3          3                1
    aa           1          1                1
    aaa          2          2                0
B   b            2          2                1
    bb           2          2                1

In the above we count the occurrences with the "count" aggregation for the "ID2" and "ID3" columns, and use a small custom function that counts the occurrences of "!" in the "condition" column. We do this for each group and return named columns for the aggregation results.
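If the "condition" column holds strings, an arguably cleaner alternative to the str(x).count('!') trick (this is a suggestion, not part of the original answer) is to count per element and then sum over the group:

# assumes "condition" is a string column: count '!' per value, then total per group
df_test.groupby(['ID1', 'ID2']).agg(
    Count_condition=('condition', lambda x: x.str.count('!').sum()))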

Python Pandas: group by and average, count, median

Try using pd.NamedAgg:

df.groupby('User').agg(avg_time=('time', 'mean'),
                       mean_time=('time', 'median'),
                       state=('state', 'first'),
                       user_count=('time', 'count')).reset_index()

Output:

  User  avg_time  mean_time state  user_count
0    A       1.5        1.5    CA           2
1    B       3.0        3.0    ID           1
2    C       4.0        4.0    OR           3

Pandas Groupby Syntax explanation

It reads as if you are calling the .mean() function on the age column specifically. The second appears as if you are calling .mean() on the whole groupby object and selecting the age column after?

That is exactly what's happening. df.groupby() returns a DataFrameGroupBy object, not a plain dataframe. Calling .mean() on it computes the mean of each column independently of the others and returns a dataframe with one row per group and one column per aggregated column, which you can then index to pull out the age column.

Reversing the order selects the single column first (a SeriesGroupBy) and only then calculates the mean, for that column alone. If you know you only want the mean of a single column, it is faster to isolate that column first rather than calculate the mean of every column (especially with a very large dataframe).
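A small sketch to make the difference concrete (the DataFrame and column names here are invented for illustration):

import pandas as pd

df = pd.DataFrame({'team': ['a', 'a', 'b'],
                   'age': [20, 30, 40],
                   'height': [170, 180, 190]})

# selects the 'age' column first, then averages only that column
fast = df.groupby('team')['age'].mean()

# averages every numeric column, then selects 'age' from the result
slow = df.groupby('team').mean(numeric_only=True)['age']

print(fast.equals(slow))  # True: same values, different amount of work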

How do I use the Pandas groupby function to calculate the mean for the previous year?

Create a sample data set

import pandas
import numpy as np

df = pandas.DataFrame(
    {'player': ['B', 'A', 'A', 'B', 'A', 'B', 'B', 'A'],
     'datetime': ['2020-01-01', '2020-01-01', '2021-01-01', '2021-01-01',
                  '2021-01-01', '2021-01-01', '2021-01-01', '2021-01-01'],
     'score': [40, 50, 100, 200, 160, 140, 160, 200],
     }
)
df["datetime"] = pandas.to_datetime(df["datetime"])
df["year"] = df["datetime"].dt.year

Use transform to add the current season average to the data frame

df["season_avg"] = df.groupby(["datetime", "player"])["score"].transform("mean")
df

  player   datetime  score  year  season_avg
0      B 2020-01-01     40  2020   40.000000
1      A 2020-01-01     50  2020   50.000000
2      A 2021-01-01    100  2021  153.333333
3      B 2021-01-01    200  2021  166.666667
4      A 2021-01-01    160  2021  153.333333
5      B 2021-01-01    140  2021  166.666667
6      B 2021-01-01    160  2021  166.666667
7      A 2021-01-01    200  2021  153.333333

A plain shift cannot be applied here because years are repeated across rows, so shifting per player just picks up the previous row (often from the same season) rather than the previous year:

df.sort_values(["year"], ascending=True).groupby(["player"])["season_avg"].transform("shift")

0           NaN
1           NaN
2     50.000000
3     40.000000
4    153.333333
5    166.666667
6    166.666667
7    153.333333
Name: season_avg, dtype: float64

Compute the average from the previous year and join it back to the original dataframe

savg = (df.groupby(["year", "player"])
          .agg(last_season_avg=("score", "mean"))
          .reset_index())
savg["year"] = savg["year"] + 1
savg

   year player  last_season_avg
0  2021      A        50.000000
1  2021      B        40.000000
2  2022      A       153.333333
3  2022      B       166.666667

df.merge(savg, on=["player", "year"], how="left")

  player   datetime  score  year  season_avg  last_season_avg
0      B 2020-01-01     40  2020   40.000000              NaN
1      A 2020-01-01     50  2020   50.000000              NaN
2      A 2021-01-01    100  2021  153.333333             50.0
3      B 2021-01-01    200  2021  166.666667             40.0
4      A 2021-01-01    160  2021  153.333333             50.0
5      B 2021-01-01    140  2021  166.666667             40.0
6      B 2021-01-01    160  2021  166.666667             40.0
7      A 2021-01-01    200  2021  153.333333             50.0

Another way to compute the average from the previous year uses shift, which is arguably more elegant than adding 1 to the year:

savg = (df.groupby(["year", "player"])
          .agg(season_avg=("score", "mean"))
          .reset_index()
          .sort_values(["year"]))
savg["last_season_avg"] = savg.groupby(["player"])["season_avg"].transform("shift")

