Use Groupby in Pandas to Count Things in One Column in Comparison to Another

Use groupby in Pandas to count things in one column in comparison to another

I'd do:

df.groupby('Event').Status.value_counts().unstack().fillna(0)

Or use the fill_value argument:

df.groupby('Event').Status.value_counts().unstack(fill_value=0)

Sample Image

Timing

Sample Image

Pandas, groupby and counting data in others columns

You can convert column to years by Series.dt.year and aggregate by GroupBy.agg with dictionary for columns with aggregation function, last add DataFrame.reindex if necessary same order of columns like in original DataFrame:

#if necessary convert to datetimes
df['CreationDate'] = pd.to_datetime(df['CreationDate'])

df1 = (df.groupby(df['CreationDate'].dt.year)
         .agg({'Id':'first', 'Score':'sum', 'ViewCount':'sum'})
         .reset_index()
         .reindex(columns=df.columns)
       )

print (df1)
   Id  CreationDate  Score  ViewCount
0   1          2011    109      16125
1   3          2012     36       1015
2   4          2018     33       7064

Pandas DataFrame Groupby two columns and get counts

Followed by @Andy's answer, you can do following to solve your second question:

In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]: 
      0
col2   
A     3
B     2
C     1
D     3

Pandas, groupby and count

You seem to want to group by several columns at once:

df.groupby(['revenue','session','user_id'])['user_id'].count()

should give you what you want

Group dataframe and get sum AND count?

try this:

In [110]: (df.groupby('Company Name')
   .....:    .agg({'Organisation Name':'count', 'Amount': 'sum'})
   .....:    .reset_index()
   .....:    .rename(columns={'Organisation Name':'Organisation Count'})
   .....: )
Out[110]:
          Company Name   Amount  Organisation Count
0  Vifor Pharma UK Ltd  4207.93                   5

or if you don't want to reset index:

df.groupby('Company Name')['Amount'].agg(['sum','count'])

df.groupby('Company Name').agg({'Amount': ['sum','count']})

Demo:

In [98]: df.groupby('Company Name')['Amount'].agg(['sum','count'])
Out[98]:
                         sum  count
Company Name
Vifor Pharma UK Ltd  4207.93      5

In [99]: df.groupby('Company Name').agg({'Amount': ['sum','count']})
Out[99]:
                      Amount
                         sum count
Company Name
Vifor Pharma UK Ltd  4207.93     5

Pandas groupby multiple columns to compare values

Assuming you want the count per Param, you can use:

out = df['Value'].ge(df['Limit']).groupby(df['Param']).sum()

output:

Param
A    1
B    2
C    1
D    0
E    0
dtype: int64

used input (with a duplicated row "B" for the example):

  Param  Value  Limit
0     A    1.5    1.0
1     B    2.5    1.0
2     B    2.5    1.0
3     C    2.0    2.0
4     D    2.0    2.5
5     E    1.5    2.0

as DataFrame

df['Value'].ge(df['Limit']).groupby(df['Param']).sum().reset_index(name='Count')

# or

df['Value'].ge(df['Limit']).groupby(df['Param']).agg(Count='sum').reset_index()

output:

  Param  Count
0     A      1
1     B      2
2     C      1
3     D      0
4     E      0

Python Pandas: GROUPBY AND COUNT OF VALUES OF DIFFERENT COLUMNS in minimal steps and in a very fast way

Easy solution

Let us use crosstabs to calculate frequency tables then concat the tables along columns axis:

s1 = pd.crosstab(df['CONTINENT'], df['AGE_GROUP'])
s2 = pd.crosstab(df['CONTINENT'], df['APPROVAL_STATUS'])

pd.concat([s1, s2, s2.sum(1).rename('USER_COUNT')], axis=1)

           18-20  21-25  26-30  31-35  36-40  41-45  46-50  Above 50  NO  YES  not_confirmed  USER_COUNT
CONTINENT                                                                                               
AMERICA        1      1      1      4      0      0      0         1   3    3              2           8
ASIA           0      0      7      0      3      0      3         0   2    8              3          13
EUROPE         1      1      0      1      1      4      0         1   6    1              2           9

Pandas groupby two columns and count shared values in third

If I am understanding you correctly, I think you want to group by col3 instead of col2:

df = pd.read_html('https://stackoverflow.com/q/69419264/14277722')[0]

df = df.groupby(['col1','col3'])['col2'].apply(list).reset_index()
df['count'] = df['col2'].apply(len)

You can then remove rows where col2 is a subset of another row with the following:

arr = pd.get_dummies(df['col2'].explode()).max(level=0).to_numpy()
subsets = np.matmul(arr, arr.T)
np.fill_diagonal(subsets, 0)
mask = ~np.equal(subsets, np.sum(arr, 1)).any(0)

df = df[mask]

   col1 col3             col2  count
0     A   12  [ID1, ID2, ID4]      3
3     A   18            [ID3]      1

Group by two columns and count the occurrences of each combination in Pandas

Maybe this is what you want?

>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64

Use Groupby in Pandas to Count Things in One Column in Comparison to Another