Pandas Dataframe Groupby Two Columns and Get Counts

Following up on @Andy's answer, you can do the following to solve your second question:

In [56]: df.groupby(['col5', 'col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
      0
col2
A     3
B     2
C     1
D     3
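
To see the whole pipeline end to end, here is a minimal self-contained sketch with made-up data (the question's actual frame isn't shown); reset_index(name='n') simply gives the size column a readable name instead of 0:

import pandas as pd

# hypothetical toy frame standing in for the question's df
df = pd.DataFrame({'col2': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'col5': [1, 1, 1, 2, 2, 3]})

# count each (col5, col2) combination, then take the largest count per col2
counts = df.groupby(['col5', 'col2']).size().reset_index(name='n')
print(counts.groupby('col2')['n'].max())
# col2
# A    3
# B    2
# C    1
# Name: n, dtype: int64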

Group by two columns and count the occurrences of each combination in Pandas

Maybe this is what you want?

>>> data = pd.DataFrame({'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
...                      'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name='size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64
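
As an aside, Series.reset_index() accepts a name argument, so the to_frame/reset_index pair collapses into a single call:

>>> data.groupby(['user_id', 'product_id']).size().reset_index(name='size')
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1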

Groupby two columns, sum, count and display output values in separate column (pandas)

You can use GroupBy.transform() to set the values for the pwr and count columns, then call .set_index() on the four columns other than type to get a layout similar to the desired output:

df['pwr'] = df.groupby(['id', 'date'])['pwr'].transform('sum')
df['count'] = df.groupby(['id', 'date'])['pwr'].transform('count')

df.set_index(['id', 'date', 'pwr', 'count'])

Output:

                   type
id date pwr count
aa q321 11  2        hey
            2      hello
   q425 40  2         hi
            2         no
bb q122 3   2         ok
            2       cool
   q422 15  3       sure
            3       sure
            3         ok
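
One caveat: the snippet above overwrites pwr with the group sum before counting it. The count still comes out right, because every group keeps the same number of rows, but computing the count first makes the intent clearer:

df['count'] = df.groupby(['id', 'date'])['pwr'].transform('count')
df['pwr'] = df.groupby(['id', 'date'])['pwr'].transform('sum')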

Pandas groupby two columns and count shared values in third

If I am understanding you correctly, I think you want to group by col3 instead of col2:

df = pd.read_html('https://stackoverflow.com/q/69419264/14277722')[0]

df = df.groupby(['col1','col3'])['col2'].apply(list).reset_index()
df['count'] = df['col2'].apply(len)

You can then remove rows where col2 is a subset of another row with the following:

import numpy as np

# one-hot encode each row's ID list (rows = df rows, columns = IDs)
arr = pd.get_dummies(df['col2'].explode()).groupby(level=0).max().to_numpy()
# subsets[i, j] = number of IDs shared between rows i and j
subsets = np.matmul(arr, arr.T)
np.fill_diagonal(subsets, 0)
# row j is a subset of row i when the overlap equals row j's own size
mask = ~np.equal(subsets, np.sum(arr, 1)).any(0)

df = df[mask]

  col1  col3             col2  count
0    A    12  [ID1, ID2, ID4]      3
3    A    18            [ID3]      1

Percentage of Total with Groupby for two columns

You can chain groupby:

pct = lambda x: 100 * x / x.sum()

out = df.groupby(['Product', 'Type']).sum().groupby('Product').apply(pct)
print(out)

# Output
                 Sales        Qty
Product Type
AA      AC   37.500000  47.058824
        AD   62.500000  52.941176
BB      BC   36.363636  68.750000
        BD   63.636364  31.250000
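
An apply-free equivalent, if you prefer, divides the grouped sums by a per-Product total obtained with transform (same assumed Product/Type/Sales/Qty columns):

sums = df.groupby(['Product', 'Type']).sum()
out = 100 * sums / sums.groupby(level='Product').transform('sum')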

Python Pandas: GROUPBY AND COUNT OF VALUES OF DIFFERENT COLUMNS in minimal steps and in a very fast way

Easy solution

Let us use pd.crosstab to calculate the frequency tables, then concat the tables along the columns axis:

s1 = pd.crosstab(df['CONTINENT'], df['AGE_GROUP'])
s2 = pd.crosstab(df['CONTINENT'], df['APPROVAL_STATUS'])

pd.concat([s1, s2, s2.sum(1).rename('USER_COUNT')], axis=1)


           18-20  21-25  26-30  31-35  36-40  41-45  46-50  Above 50  NO  YES  not_confirmed  USER_COUNT
CONTINENT
AMERICA        1      1      1      4      0      0      0         1   3    3              2           8
ASIA           0      0      7      0      3      0      3         0   2    8              3          13
EUROPE         1      1      0      1      1      4      0         1   6    1              2           9
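
If you would rather report the age-group breakdown as row percentages instead of raw counts, pd.crosstab also takes a normalize argument:

pct = 100 * pd.crosstab(df['CONTINENT'], df['AGE_GROUP'], normalize='index')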

Pandas dataframe grouping by two columns, count and sum

You can try with pd.get_dummies, join and groupby+sum:

pd.get_dummies(df['A or B'])\
  .join(df.drop(columns='A or B'))\
  .groupby('Name', as_index=False).sum()

Output:

  Name  A  B  Sales ($)
0  Ben  2  1         17
1  Sam  1  2         18

Details:

First, use get_dummies to convert the categorical variable into dummy/indicator variables:

pd.get_dummies(df['A or B'])
#   A  B
#0  1  0
#1  1  0
#2  0  1
#3  0  1
#4  1  0
#5  0  1

Then use join to concatenate the dummies with the original df, with the 'A or B' column dropped:

pd.get_dummies(df['A or B']).join(df.drop(columns='A or B'))
#   A  B Name  Sales ($)
#0  1  0  Ben         10
#1  1  0  Ben          5
#2  0  1  Ben          2
#3  0  1  Sam          5
#4  1  0  Sam          6
#5  0  1  Sam          7

And finally, do the groupby+sum based on Name:

pd.get_dummies(df['A or B']).join(df.drop(columns='A or B')).groupby('Name', as_index=False).sum()
#  Name  A  B  Sales ($)
#0  Ben  2  1         17
#1  Sam  1  2         18
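
For what it's worth, the same table can be built without dummies: pd.crosstab produces the A/B counts and a plain groupby sum produces the sales, joined on Name:

counts = pd.crosstab(df['Name'], df['A or B'])
sales = df.groupby('Name')['Sales ($)'].sum()
counts.join(sales).reset_index()
#  Name  A  B  Sales ($)
#0  Ben  2  1         17
#1  Sam  1  2         18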

In pandas, how do you groupby two columns and sum a third distinct column?

I think you want to group by B,C:

df.groupby(['B','C']).agg({'C':'count', 'A':'sum'})

Output:

       C         A
B  C
-1 1   1  0.000000
   3   1 -0.007147
   5   2 -0.024842
   7   5 -0.124889
   9   2 -0.015342
   11  1 -0.004549
1  2   1  0.021954
   4   3  0.131472
   6   1  0.025873
   8   1  0.011601
   10  1  0.006538
   12  1  0.002102

Or better yet with named agg, which allows you to rename the new columns:

df.groupby(['B','C']).agg(C_count=('C','count'),
                          A_sum=('A','sum'))

Output:

       C_count     A_sum
B  C
-1 1         1  0.000000
   3         1 -0.007147
   5         2 -0.024842
   7         5 -0.124889
   9         2 -0.015342
   11        1 -0.004549
1  2         1  0.021954
   4         3  0.131472
   6         1  0.025873
   8         1  0.011601
   10        1  0.006538
   12        1  0.002102
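
Named aggregation requires pandas 0.25 or newer. Chain .reset_index() if you want B and C back as regular columns instead of an index:

df.groupby(['B','C']).agg(C_count=('C','count'),
                          A_sum=('A','sum')).reset_index()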

