Use Groupby in Pandas to Count Things in One Column in Comparison to Another

Use groupby in Pandas to count things in one column in comparison to another

I'd do:

df.groupby('Event').Status.value_counts().unstack().fillna(0)

Or use the fill_value argument:

df.groupby('Event').Status.value_counts().unstack(fill_value=0)

Sample Image



Timing

Sample Image

Pandas, groupby and counting data in others columns

You can convert column to years by Series.dt.year and aggregate by GroupBy.agg with dictionary for columns with aggregation function, last add DataFrame.reindex if necessary same order of columns like in original DataFrame:

#if necessary convert to datetimes
df['CreationDate'] = pd.to_datetime(df['CreationDate'])

df1 = (df.groupby(df['CreationDate'].dt.year)
.agg({'Id':'first', 'Score':'sum', 'ViewCount':'sum'})
.reset_index()
.reindex(columns=df.columns)
)

print (df1)
Id CreationDate Score ViewCount
0 1 2011 109 16125
1 3 2012 36 1015
2 4 2018 33 7064

Pandas DataFrame Groupby two columns and get counts

Followed by @Andy's answer, you can do following to solve your second question:

In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
0
col2
A 3
B 2
C 1
D 3

Pandas, groupby and count

You seem to want to group by several columns at once:

df.groupby(['revenue','session','user_id'])['user_id'].count()

should give you what you want

Group dataframe and get sum AND count?

try this:

In [110]: (df.groupby('Company Name')
.....: .agg({'Organisation Name':'count', 'Amount': 'sum'})
.....: .reset_index()
.....: .rename(columns={'Organisation Name':'Organisation Count'})
.....: )
Out[110]:
Company Name Amount Organisation Count
0 Vifor Pharma UK Ltd 4207.93 5

or if you don't want to reset index:

df.groupby('Company Name')['Amount'].agg(['sum','count'])

or

df.groupby('Company Name').agg({'Amount': ['sum','count']})

Demo:

In [98]: df.groupby('Company Name')['Amount'].agg(['sum','count'])
Out[98]:
sum count
Company Name
Vifor Pharma UK Ltd 4207.93 5

In [99]: df.groupby('Company Name').agg({'Amount': ['sum','count']})
Out[99]:
Amount
sum count
Company Name
Vifor Pharma UK Ltd 4207.93 5

Pandas groupby multiple columns to compare values

Assuming you want the count per Param, you can use:

out = df['Value'].ge(df['Limit']).groupby(df['Param']).sum()

output:

Param
A 1
B 2
C 1
D 0
E 0
dtype: int64

used input (with a duplicated row "B" for the example):

  Param  Value  Limit
0 A 1.5 1.0
1 B 2.5 1.0
2 B 2.5 1.0
3 C 2.0 2.0
4 D 2.0 2.5
5 E 1.5 2.0
as DataFrame
df['Value'].ge(df['Limit']).groupby(df['Param']).sum().reset_index(name='Count')

# or

df['Value'].ge(df['Limit']).groupby(df['Param']).agg(Count='sum').reset_index()

output:

  Param  Count
0 A 1
1 B 2
2 C 1
3 D 0
4 E 0

Python Pandas: GROUPBY AND COUNT OF VALUES OF DIFFERENT COLUMNS in minimal steps and in a very fast way

Easy solution

Let us use crosstabs to calculate frequency tables then concat the tables along columns axis:

s1 = pd.crosstab(df['CONTINENT'], df['AGE_GROUP'])
s2 = pd.crosstab(df['CONTINENT'], df['APPROVAL_STATUS'])

pd.concat([s1, s2, s2.sum(1).rename('USER_COUNT')], axis=1)


           18-20  21-25  26-30  31-35  36-40  41-45  46-50  Above 50  NO  YES  not_confirmed  USER_COUNT
CONTINENT
AMERICA 1 1 1 4 0 0 0 1 3 3 2 8
ASIA 0 0 7 0 3 0 3 0 2 8 3 13
EUROPE 1 1 0 1 1 4 0 1 6 1 2 9

Pandas groupby two columns and count shared values in third

If I am understanding you correctly, I think you want to group by col3 instead of col2:

df = pd.read_html('https://stackoverflow.com/q/69419264/14277722')[0]

df = df.groupby(['col1','col3'])['col2'].apply(list).reset_index()
df['count'] = df['col2'].apply(len)

You can then remove rows where col2 is a subset of another row with the following:

arr = pd.get_dummies(df['col2'].explode()).max(level=0).to_numpy()
subsets = np.matmul(arr, arr.T)
np.fill_diagonal(subsets, 0)
mask = ~np.equal(subsets, np.sum(arr, 1)).any(0)

df = df[mask]
   col1 col3             col2  count
0 A 12 [ID1, ID2, ID4] 3
3 A 18 [ID3] 1

Group by two columns and count the occurrences of each combination in Pandas

Maybe this is what you want?

>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id product_id
a1 p1 2
p2 1
a2 p1 3
a3 p2 2
p3 1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
user_id product_id size
0 a1 p1 2
1 a1 p2 1
2 a2 p1 3
3 a3 p2 2
4 a3 p3 1
>>> new_df['size']
0 2
1 1
2 3
3 2
4 1
Name: size, dtype: int64


Related Topics



Leave a reply



Submit