Pandas Dataframe Groupby Two Columns and Get Counts

Following up on @Andy's answer, you can do the following to solve your second question:

In [56]: df.groupby(['col5', 'col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
      0
col2
A     3
B     2
C     1
D     3
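
To see the whole pipeline end to end, here is a minimal self-contained sketch with made-up data (the question's actual frame isn't shown); reset_index(name='n') simply gives the size column a readable name instead of 0:

import pandas as pd

# hypothetical toy frame standing in for the question's df
df = pd.DataFrame({'col2': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'col5': [1, 1, 1, 2, 2, 3]})

# count each (col5, col2) combination, then take the largest count per col2
counts = df.groupby(['col5', 'col2']).size().reset_index(name='n')
print(counts.groupby('col2')['n'].max())
# col2
# A    3
# B    2
# C    1
# Name: n, dtype: int64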

Group by two columns and count the occurrences of each combination in Pandas

Maybe this is what you want?

>>> data = pd.DataFrame({'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
...                      'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name='size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64
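
As an aside, Series.reset_index() accepts a name argument, so the to_frame/reset_index pair collapses into a single call:

>>> data.groupby(['user_id', 'product_id']).size().reset_index(name='size')
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1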

Groupby two columns, sum, count and display output values in separate column (pandas)

You can use GroupBy.transform() to set the values for the pwr and count columns, then call .set_index() on the four columns other than type to get a layout similar to the desired output:

df['pwr'] = df.groupby(['id', 'date'])['pwr'].transform('sum')
df['count'] = df.groupby(['id', 'date'])['pwr'].transform('count')

df.set_index(['id', 'date', 'pwr', 'count'])

Output:

                   type
id date pwr count
aa q321 11  2        hey
            2      hello
   q425 40  2         hi
            2         no
bb q122 3   2         ok
            2       cool
   q422 15  3       sure
            3       sure
            3         ok
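
One caveat: the snippet above overwrites pwr with the group sum before counting it. The count still comes out right, because every group keeps the same number of rows, but computing the count first makes the intent clearer:

df['count'] = df.groupby(['id', 'date'])['pwr'].transform('count')
df['pwr'] = df.groupby(['id', 'date'])['pwr'].transform('sum')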

Pandas groupby two columns and count shared values in third

If I am understanding you correctly, I think you want to group by col3 instead of col2:

df = pd.read_html('https://stackoverflow.com/q/69419264/14277722')[0]

df = df.groupby(['col1','col3'])['col2'].apply(list).reset_index()
df['count'] = df['col2'].apply(len)

You can then remove rows where col2 is a subset of another row with the following:

import numpy as np

# one-hot encode each row's ID list (rows = df rows, columns = IDs)
arr = pd.get_dummies(df['col2'].explode()).groupby(level=0).max().to_numpy()
# subsets[i, j] = number of IDs shared between rows i and j
subsets = np.matmul(arr, arr.T)
np.fill_diagonal(subsets, 0)
# row j is a subset of row i when the overlap equals row j's own size
mask = ~np.equal(subsets, np.sum(arr, 1)).any(0)

df = df[mask]

  col1  col3             col2  count
0    A    12  [ID1, ID2, ID4]      3
3    A    18            [ID3]      1

Percentage of Total with Groupby for two columns

You can chain groupby:

pct = lambda x: 100 * x / x.sum()

out = df.groupby(['Product', 'Type']).sum().groupby('Product').apply(pct)
print(out)

# Output
                 Sales        Qty
Product Type
AA      AC   37.500000  47.058824
        AD   62.500000  52.941176
BB      BC   36.363636  68.750000
        BD   63.636364  31.250000
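
An apply-free equivalent, if you prefer, divides the grouped sums by a per-Product total obtained with transform (same assumed Product/Type/Sales/Qty columns):

sums = df.groupby(['Product', 'Type']).sum()
out = 100 * sums / sums.groupby(level='Product').transform('sum')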

Python Pandas: GROUPBY AND COUNT OF VALUES OF DIFFERENT COLUMNS in minimal steps and in a very fast way

Easy solution

Let us use pd.crosstab to calculate the frequency tables, then concat the tables along the columns axis:

s1 = pd.crosstab(df['CONTINENT'], df['AGE_GROUP'])
s2 = pd.crosstab(df['CONTINENT'], df['APPROVAL_STATUS'])

pd.concat([s1, s2, s2.sum(1).rename('USER_COUNT')], axis=1)


           18-20  21-25  26-30  31-35  36-40  41-45  46-50  Above 50  NO  YES  not_confirmed  USER_COUNT
CONTINENT
AMERICA        1      1      1      4      0      0      0         1   3    3              2           8
ASIA           0      0      7      0      3      0      3         0   2    8              3          13
EUROPE         1      1      0      1      1      4      0         1   6    1              2           9
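
If you would rather report the age-group breakdown as row percentages instead of raw counts, pd.crosstab also takes a normalize argument:

pct = 100 * pd.crosstab(df['CONTINENT'], df['AGE_GROUP'], normalize='index')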

Pandas dataframe grouping by two columns, count and sum

You can try with pd.get_dummies, join and groupby+sum:

pd.get_dummies(df['A or B'])\
  .join(df.drop(columns='A or B'))\
  .groupby('Name', as_index=False).sum()

Output:

  Name  A  B  Sales ($)
0  Ben  2  1         17
1  Sam  1  2         18

Details:

First, use get_dummies to convert the categorical variable into dummy/indicator variables:

pd.get_dummies(df['A or B'])
#   A  B
#0  1  0
#1  1  0
#2  0  1
#3  0  1
#4  1  0
#5  0  1

Then use join to concatenate the dummies with the original df, with the 'A or B' column dropped:

pd.get_dummies(df['A or B']).join(df.drop(columns='A or B'))
#   A  B Name  Sales ($)
#0  1  0  Ben         10
#1  1  0  Ben          5
#2  0  1  Ben          2
#3  0  1  Sam          5
#4  1  0  Sam          6
#5  0  1  Sam          7

And finally, do the groupby+sum based on Name:

pd.get_dummies(df['A or B']).join(df.drop(columns='A or B')).groupby('Name', as_index=False).sum()
#  Name  A  B  Sales ($)
#0  Ben  2  1         17
#1  Sam  1  2         18
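
For what it's worth, the same table can be built without dummies: pd.crosstab produces the A/B counts and a plain groupby sum produces the sales, joined on Name:

counts = pd.crosstab(df['Name'], df['A or B'])
sales = df.groupby('Name')['Sales ($)'].sum()
counts.join(sales).reset_index()
#  Name  A  B  Sales ($)
#0  Ben  2  1         17
#1  Sam  1  2         18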

In pandas, how do you groupby two columns and sum a third distinct column?

I think you want to group by B,C:

df.groupby(['B','C']).agg({'C':'count', 'A':'sum'})

Output:

       C         A
B  C
-1 1   1  0.000000
   3   1 -0.007147
   5   2 -0.024842
   7   5 -0.124889
   9   2 -0.015342
   11  1 -0.004549
1  2   1  0.021954
   4   3  0.131472
   6   1  0.025873
   8   1  0.011601
   10  1  0.006538
   12  1  0.002102

Or better yet with named agg, which allows you to rename the new columns:

df.groupby(['B','C']).agg(C_count=('C','count'),
                          A_sum=('A','sum'))

Output:

       C_count     A_sum
B  C
-1 1         1  0.000000
   3         1 -0.007147
   5         2 -0.024842
   7         5 -0.124889
   9         2 -0.015342
   11        1 -0.004549
1  2         1  0.021954
   4         3  0.131472
   6         1  0.025873
   8         1  0.011601
   10        1  0.006538
   12        1  0.002102
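
Named aggregation requires pandas 0.25 or newer. Chain .reset_index() if you want B and C back as regular columns instead of an index:

df.groupby(['B','C']).agg(C_count=('C','count'),
                          A_sum=('A','sum')).reset_index()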

