Pandas, Groupby and Count

Get statistics for each group (such as count, mean, etc.) using pandas GroupBy?

On a groupby object, the agg function can take a list to apply several aggregation methods at once. This should give you the result you need:

df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count'])
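For a self-contained illustration, here is a minimal sketch with made-up data (the column names col1 through col4 are just placeholders taken from the call above):

import pandas as pd

# purely hypothetical data, only to show the shape of the result
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "y", "x", "x"],
    "col3": [1.0, 2.0, 3.0, 4.0],
    "col4": [10, 20, 30, 40],
})

# one row per (col1, col2) group; col3 and col4 each get a 'mean' and a 'count' sub-column
print(df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count']))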

Pandas, groupby and count

You seem to want to group by several columns at once:

df.groupby(['revenue','session','user_id'])['user_id'].count()

This should give you what you want.
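If you want to try it out, here is a minimal sketch with invented data (column names taken from the call above):

import pandas as pd

# invented rows, just to make the snippet runnable
df = pd.DataFrame({
    "revenue": [10, 10, 20, 20],
    "session": [1, 1, 1, 2],
    "user_id": ["u1", "u1", "u2", "u2"],
})

# number of rows per (revenue, session, user_id) combination
print(df.groupby(['revenue', 'session', 'user_id'])['user_id'].count())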

Pandas create new column with count from groupby

That's not a new column, that's a new DataFrame:

In [11]: df.groupby(["item", "color"]).count()
Out[11]:
              id
item  color
car   black   2
truck blue    1
      red     2

To get the result you want, use reset_index:

In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
    item  color  count
0    car  black      2
1  truck   blue      1
2  truck    red      2

To get a "new column" you could use transform:

In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0    2
1    2
2    2
3    1
4    2
dtype: int64
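Because transform returns a Series aligned with the original index, you can assign it straight back as a new column, e.g. (the column name "count" here is just an example):

df["count"] = df.groupby(["item", "color"])["id"].transform("count")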

I recommend reading the split-apply-combine section of the docs.

Pandas groupby agg - how to get counts?

You can use strings instead of the functions, like so:

df = pd.DataFrame(
    {"id": list("ccdef"), "pushid": list("aabbc"),
     "sess_length": [10, 20, 30, 40, 50]}
)

df.groupby(["id", "pushid"]).agg({"sess_length": ["sum", "mean", "count"]})

Which outputs:

          sess_length
                  sum mean count
id pushid
c  a               30   15     2
d  b               30   30     1
e  b               40   40     1
f  c               50   50     1
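If you prefer flat column names over the MultiIndex produced by the dict-of-lists form, pandas (0.25+) also supports named aggregation; a sketch of the equivalent call, with the output column names chosen arbitrarily:

df.groupby(["id", "pushid"]).agg(
    sess_sum=("sess_length", "sum"),
    sess_mean=("sess_length", "mean"),
    sess_count=("sess_length", "count"),
)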

Python Pandas: GROUPBY AND COUNT OF VALUES OF DIFFERENT COLUMNS in minimal steps and in a very fast way

Easy solution

Let's use crosstab to calculate frequency tables, then concat the tables along the columns axis:

s1 = pd.crosstab(df['CONTINENT'], df['AGE_GROUP'])
s2 = pd.crosstab(df['CONTINENT'], df['APPROVAL_STATUS'])

pd.concat([s1, s2, s2.sum(1).rename('USER_COUNT')], axis=1)


           18-20  21-25  26-30  31-35  36-40  41-45  46-50  Above 50  NO  YES  not_confirmed  USER_COUNT
CONTINENT
AMERICA        1      1      1      4      0      0      0         1   3    3              2           8
ASIA           0      0      7      0      3      0      3         0   2    8              3          13
EUROPE         1      1      0      1      1      4      0         1   6    1              2           9
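To try the pattern end to end, a minimal sketch with invented rows (not the data behind the table above) could look like this:

import pandas as pd

# invented sample rows, only to make the snippet runnable
df = pd.DataFrame({
    "CONTINENT":       ["ASIA", "ASIA", "EUROPE", "AMERICA"],
    "AGE_GROUP":       ["26-30", "26-30", "41-45", "31-35"],
    "APPROVAL_STATUS": ["YES", "NO", "YES", "not_confirmed"],
})

s1 = pd.crosstab(df['CONTINENT'], df['AGE_GROUP'])
s2 = pd.crosstab(df['CONTINENT'], df['APPROVAL_STATUS'])
print(pd.concat([s1, s2, s2.sum(1).rename('USER_COUNT')], axis=1))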

Pandas Groupby: Count and mean combined

You can use groupby with aggregate:

df = df.groupby('source') \
       .agg({'text':'size', 'sent':'mean'}) \
       .rename(columns={'text':'count','sent':'mean_sent'}) \
       .reset_index()
print(df)
  source  count  mean_sent
0    bar      2      0.415
1    foo      3     -0.500
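One detail worth noting: 'size' counts every row in the group, whereas 'count' only counts non-NaN values in the text column. If you need to ignore NaNs, swap it in:

df.groupby('source').agg({'text':'count', 'sent':'mean'})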

Pandas groupby two columns and count shared values in third

If I am understanding you correctly, I think you want to group by col3 instead of col2:

df = pd.read_html('https://stackoverflow.com/q/69419264/14277722')[0]

df = df.groupby(['col1','col3'])['col2'].apply(list).reset_index()
df['count'] = df['col2'].apply(len)

You can then remove rows where col2 is a subset of another row with the following:

import numpy as np

# arr[i, j] == 1 if row i's ID list contains ID j
# (on pandas >= 2.0 use .groupby(level=0).max() instead of .max(level=0))
arr = pd.get_dummies(df['col2'].explode()).max(level=0).to_numpy()
subsets = np.matmul(arr, arr.T)  # pairwise intersection sizes
np.fill_diagonal(subsets, 0)
# drop rows whose ID set is fully contained in another row's set
mask = ~np.equal(subsets, np.sum(arr, 1)).any(0)

df = df[mask]
  col1  col3             col2  count
0    A    12  [ID1, ID2, ID4]      3
3    A    18            [ID3]      1

Need pandas groupby.count() or groupby.size.unstack() to output a dataframe I can use

Try:

x = df.pivot_table(
    index=["Animal", "Year"], columns="Value", aggfunc="size", fill_value=0
).reset_index()
x.columns.name = None
print(x)

Prints:

   Animal  Year  A  B
0       1  2019  0  2
1       1  2020  2  0
2       2  2020  1  0
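The groupby().size().unstack() route mentioned in the title gives the same frame; a sketch of that equivalent, assuming the same column names:

x = (df.groupby(["Animal", "Year", "Value"])
       .size()
       .unstack("Value", fill_value=0)
       .reset_index())
x.columns.name = None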

Pandas groupby count values in aggregate function

get_dummies, groupby and sum

Encode the columns OKEY and COLOR to convert the categorical values into indicator variables, then group the encoded frame by ID and a 1-minute Grouper, and sum the values per group:

pd.get_dummies(df.set_index(['ID', 'Time']))\
  .groupby(['ID', pd.Grouper(freq='1min', level=1)]).sum()


                        OKEY_NOT_OK  OKEY_OK  COLOR_BLUE  COLOR_RED  COLOR_YELLOW
ID Time
0  2021-05-05 19:16:00            0        1           1          0             0
1  2021-05-05 19:16:00            1        0           1          0             0
   2021-05-05 19:17:00            0        1           0          1             0
2  2021-05-05 19:17:00            1        0           0          0             1
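For reference, a minimal input frame this snippet expects could be built like so (values invented to roughly match the table above):

import pandas as pd

# invented rows with the ID / Time / OKEY / COLOR layout the snippet assumes
df = pd.DataFrame({
    "ID":    [0, 1, 1, 2],
    "Time":  pd.to_datetime(["2021-05-05 19:16:10", "2021-05-05 19:16:30",
                             "2021-05-05 19:17:05", "2021-05-05 19:17:40"]),
    "OKEY":  ["OK", "NOT_OK", "OK", "NOT_OK"],
    "COLOR": ["BLUE", "BLUE", "RED", "YELLOW"],
})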

