Pandas DataFrame Groupby two columns and get counts
Following @Andy's answer, you can do the following to solve your second question:
In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
      0
col2
A     3
B     2
C     1
D     3
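As a self-contained sketch (the data here is invented so that the counts match the output above; col2 and col5 follow the question's column names):

```python
import pandas as pd

# Hypothetical data standing in for the question's df
df = pd.DataFrame({
    'col2': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'D', 'D', 'D'],
    'col5': ['x', 'x', 'x', 'y', 'x', 'x', 'x', 'x', 'x', 'x'],
})

# Count rows per (col5, col2) pair, then take the largest count per col2
counts = df.groupby(['col5', 'col2']).size().reset_index(name='n')
result = counts.groupby('col2')['n'].max()
print(result)
```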
Group by two columns and count the occurrences of each combination in Pandas
Maybe this is what you want?
>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64
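Note that the to_frame/reset_index pair can be collapsed into one call, since reset_index accepts a name for the values column:

```python
import pandas as pd

data = pd.DataFrame({
    'user_id':    ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# size() returns a Series; reset_index(name=...) names the count column directly
new_df = data.groupby(['user_id', 'product_id']).size().reset_index(name='size')
print(new_df)
```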
Groupby two columns, sum, count and display output values in separate column (pandas)
You can use .groupby(...).transform() to set the values for the pwr and count columns, then call .set_index() on the four columns other than type to get a layout similar to the desired output:
df['pwr'] = df.groupby(['id', 'date'])['pwr'].transform('sum')
df['count'] = df.groupby(['id', 'date'])['pwr'].transform('count')
df.set_index(['id', 'date', 'pwr', 'count'])
Output:
                     type
id date pwr count
aa q321 11  2         hey
            2       hello
   q425 40  2          hi
            2          no
bb q122 3   2          ok
            2        cool
   q422 15  3        sure
            3        sure
            3          ok
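A runnable sketch with made-up input shaped to reproduce the output above (the id/date/pwr/type names follow the answer; the raw pwr values are invented):

```python
import pandas as pd

# Hypothetical input resembling the question's data
df = pd.DataFrame({
    'id':   ['aa', 'aa', 'aa', 'aa', 'bb', 'bb', 'bb', 'bb', 'bb'],
    'date': ['q321', 'q321', 'q425', 'q425', 'q122', 'q122', 'q422', 'q422', 'q422'],
    'pwr':  [5, 6, 15, 25, 1, 2, 5, 5, 5],
    'type': ['hey', 'hello', 'hi', 'no', 'ok', 'cool', 'sure', 'sure', 'ok'],
})

# Broadcast the per-group sum and row count back onto every row
df['pwr'] = df.groupby(['id', 'date'])['pwr'].transform('sum')
df['count'] = df.groupby(['id', 'date'])['pwr'].transform('count')
out = df.set_index(['id', 'date', 'pwr', 'count'])
print(out)
```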
Pandas groupby two columns and count shared values in third
If I am understanding you correctly, I think you want to group by col3 instead of col2:
import numpy as np
import pandas as pd

df = pd.read_html('https://stackoverflow.com/q/69419264/14277722')[0]
df = df.groupby(['col1','col3'])['col2'].apply(list).reset_index()
df['count'] = df['col2'].apply(len)
You can then remove rows where col2 is a subset of another row with the following:
# .max(level=...) was removed in pandas 2.0; group on the index level instead
arr = pd.get_dummies(df['col2'].explode()).groupby(level=0).max().astype(int).to_numpy()
subsets = np.matmul(arr, arr.T)
np.fill_diagonal(subsets, 0)
mask = ~np.equal(subsets, np.sum(arr, 1)).any(0)
df = df[mask]
  col1  col3             col2  count
0    A    12  [ID1, ID2, ID4]      3
3    A    18            [ID3]      1
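End-to-end, with invented rows in place of the linked question's table (one group is deliberately a subset of another so the filter has something to drop):

```python
import numpy as np
import pandas as pd

# Invented data: (A, 15)'s IDs are a subset of (A, 12)'s
df = pd.DataFrame({
    'col1': ['A'] * 6,
    'col2': ['ID1', 'ID2', 'ID4', 'ID1', 'ID2', 'ID3'],
    'col3': [12, 12, 12, 15, 15, 18],
})

df = df.groupby(['col1', 'col3'])['col2'].apply(list).reset_index()
df['count'] = df['col2'].apply(len)

# One-hot encode the ID lists; arr[i, j] @ arr.T counts shared IDs per row pair
arr = pd.get_dummies(df['col2'].explode()).groupby(level=0).max().astype(int).to_numpy()
subsets = np.matmul(arr, arr.T)
np.fill_diagonal(subsets, 0)
# Row j is a subset of row i when their overlap equals row j's size
mask = ~np.equal(subsets, np.sum(arr, 1)).any(0)
df = df[mask]
print(df)
```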
Percentage of Total with Groupby for two columns
You can chain two groupby calls: aggregate per (Product, Type), then normalize within each Product:
pct = lambda x: 100 * x / x.sum()
out = df.groupby(['Product', 'Type']).sum().groupby('Product').apply(pct)
print(out)
# Output
                  Sales        Qty
Product Type
AA      AC    37.500000  47.058824
        AD    62.500000  52.941176
BB      BC    36.363636  68.750000
        BD    63.636364  31.250000
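A self-contained sketch with invented sales figures chosen to reproduce the percentages above (group_keys=False keeps recent pandas from prepending a duplicate Product level to the index of the apply result):

```python
import pandas as pd

# Hypothetical data; Product/Type/Sales/Qty follow the question's column names
df = pd.DataFrame({
    'Product': ['AA', 'AA', 'BB', 'BB'],
    'Type':    ['AC', 'AD', 'BC', 'BD'],
    'Sales':   [30, 50, 40, 70],
    'Qty':     [8, 9, 11, 5],
})

pct = lambda x: 100 * x / x.sum()
out = (df.groupby(['Product', 'Type']).sum()
         .groupby('Product', group_keys=False).apply(pct))
print(out)
```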
Python Pandas: GROUPBY AND COUNT OF VALUES OF DIFFERENT COLUMNS in minimal steps and in a very fast way
Easy solution
Let us use crosstab to calculate the frequency tables, then concat the tables along the columns axis:
s1 = pd.crosstab(df['CONTINENT'], df['AGE_GROUP'])
s2 = pd.crosstab(df['CONTINENT'], df['APPROVAL_STATUS'])
pd.concat([s1, s2, s2.sum(1).rename('USER_COUNT')], axis=1)
           18-20  21-25  26-30  31-35  36-40  41-45  46-50  Above 50  NO  YES  not_confirmed  USER_COUNT
CONTINENT
AMERICA        1      1      1      4      0      0      0         1   3    3              2           8
ASIA           0      0      7      0      3      0      3         0   2    8              3          13
EUROPE         1      1      0      1      1      4      0         1   6    1              2           9
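A small runnable version with a handful of invented user records (the column names follow the question; the counts here are smaller than the output above):

```python
import pandas as pd

# Invented user records
df = pd.DataFrame({
    'CONTINENT':       ['ASIA', 'ASIA', 'EUROPE', 'AMERICA', 'ASIA'],
    'AGE_GROUP':       ['26-30', '36-40', '41-45', '31-35', '26-30'],
    'APPROVAL_STATUS': ['YES', 'NO', 'YES', 'not_confirmed', 'YES'],
})

# One frequency table per attribute, glued side by side with a row total
s1 = pd.crosstab(df['CONTINENT'], df['AGE_GROUP'])
s2 = pd.crosstab(df['CONTINENT'], df['APPROVAL_STATUS'])
out = pd.concat([s1, s2, s2.sum(1).rename('USER_COUNT')], axis=1)
print(out)
```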
Pandas dataframe grouping by two columns, count and sum
You can try pd.get_dummies, join, and groupby+sum:
pd.get_dummies(df['A or B'])\
  .join(df.drop(columns='A or B'))\
  .groupby('Name', as_index=False).sum()
Output:
  Name  A  B  Sales ($)
0  Ben  2  1         17
1  Sam  1  2         18
Details:
First, use get_dummies to convert the categorical variable into dummy/indicator variables:
pd.get_dummies(df['A or B'])
#   A  B
#0  1  0
#1  1  0
#2  0  1
#3  0  1
#4  1  0
#5  0  1
Then use join to concatenate the dummies with the original df, with the 'A or B' column dropped:
pd.get_dummies(df['A or B']).join(df.drop(columns='A or B'))
#   A  B Name  Sales ($)
#0  1  0  Ben         10
#1  1  0  Ben          5
#2  0  1  Ben          2
#3  0  1  Sam          5
#4  1  0  Sam          6
#5  0  1  Sam          7
And finally, do the groupby+sum based on Name:
pd.get_dummies(df['A or B']).join(df.drop(columns='A or B')).groupby('Name', as_index=False).sum()
#  Name  A  B  Sales ($)
#0  Ben  2  1         17
#1  Sam  1  2         18
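The whole pipeline, runnable end to end with the input reconstructed from the intermediate outputs above:

```python
import pandas as pd

# Input reconstructed from the step-by-step outputs shown above
df = pd.DataFrame({
    'Name':      ['Ben', 'Ben', 'Ben', 'Sam', 'Sam', 'Sam'],
    'A or B':    ['A', 'A', 'B', 'B', 'A', 'B'],
    'Sales ($)': [10, 5, 2, 5, 6, 7],
})

# Dummies count the A/B occurrences once summed per Name
out = (pd.get_dummies(df['A or B'])
         .join(df.drop(columns='A or B'))
         .groupby('Name', as_index=False).sum())
print(out)
```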
In pandas, how do you groupby two columns and sum a third distinct column?
I think you want to group by B and C:
df.groupby(['B','C']).agg({'C':'count', 'A':'sum'})
Output:
       C         A
B  C
-1 1   1  0.000000
   3   1 -0.007147
   5   2 -0.024842
   7   5 -0.124889
   9   2 -0.015342
   11  1 -0.004549
1  2   1  0.021954
   4   3  0.131472
   6   1  0.025873
   8   1  0.011601
   10  1  0.006538
   12  1  0.002102
Or better yet, use named aggregation, which allows you to rename the new columns:
df.groupby(['B','C']).agg(C_count=('C','count'),
                          A_sum=('A','sum'))
Output:
      C_count     A_sum
B  C
-1 1        1  0.000000
   3        1 -0.007147
   5        2 -0.024842
   7        5 -0.124889
   9        2 -0.015342
   11       1 -0.004549
1  2        1  0.021954
   4        3  0.131472
   6        1  0.025873
   8        1  0.011601
   10       1  0.006538
   12       1  0.002102
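Named aggregation in a self-contained form, with a tiny invented frame (the A/B/C column names follow the question):

```python
import pandas as pd

# Small invented example
df = pd.DataFrame({
    'A': [0.1, 0.2, 0.3, 0.4],
    'B': [-1, -1, 1, 1],
    'C': [3, 3, 2, 4],
})

# Each keyword names an output column: (source column, aggregation)
out = df.groupby(['B', 'C']).agg(C_count=('C', 'count'),
                                 A_sum=('A', 'sum'))
print(out)
```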