How to Count Occurrences with Groupby

Group by two columns and count the occurrences of each combination in Pandas

Maybe this is what you want?

>>> import pandas as pd
>>> data = pd.DataFrame({'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
...                      'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name='size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64
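On pandas 1.1 or newer, DataFrame.value_counts can produce the same counts in a single call; note that it sorts by count (descending) rather than by the group keys:

>>> data.value_counts(['user_id', 'product_id']).reset_index(name='size')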

Counting occurrence of values after using groupby on multiple pandas columns

You can use a double .groupby. For example:

df["freq"] = df.groupby("a")["b"].apply(lambda x: x.groupby(x).ngroup() + 1)
print(df)

Prints:

   a     b     c  freq
0  a  12.0  12.0     1
1  a  12.0  33.0     1
2  b  12.3  12.3     1
3  a  13.0   1.0     2
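The question's df isn't shown; here is a minimal reconstruction (values read off the printed frame) that makes the snippet runnable. Passing group_keys=False keeps the result indexed like df, so the assignment aligns:

import pandas as pd

df = pd.DataFrame({
    "a": ["a", "a", "b", "a"],
    "b": [12.0, 12.0, 12.3, 13.0],
    "c": [12.0, 33.0, 12.3, 1.0],
})

# Within each "a" group, x.groupby(x).ngroup() numbers each distinct
# value of "b" starting at 0; +1 shifts the numbering to start at 1.
df["freq"] = df.groupby("a", group_keys=False)["b"].apply(lambda x: x.groupby(x).ngroup() + 1)
print(df)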

Pandas groupby Id and count occurrences of picklist/unique values

df.groupby(['company id', 'funding_type']).size().unstack(fill_value=0)
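For illustration, here is a made-up df (the question's data isn't shown) and the pivoted counts it produces:

import pandas as pd

df = pd.DataFrame({
    'company id': [1, 1, 1, 2, 2],
    'funding_type': ['seed', 'seed', 'series_a', 'seed', 'series_b'],
})

# size() counts rows per (company id, funding_type) pair; unstack pivots
# funding_type into columns, filling combinations that never occur with 0.
print(df.groupby(['company id', 'funding_type']).size().unstack(fill_value=0))

funding_type  seed  series_a  series_b
company id
1                2         1         0
2                1         0         1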

Counting the number of occurrences per year in a groupby

You can use diff and groupby:

df.count_to_today.diff().ne(0).groupby([df.id, df.year]).sum()

id    year
1234  2017    2.0
      2018    2.0
Name: count_to_today, dtype: float64

(df.count_to_today.diff()
   .ne(0)
   .groupby([df.id, df.year])
   .sum()
   .astype(int)
   .reset_index())

     id  year  count_to_today
0  1234  2017               2
1  1234  2018               2
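The question's df isn't shown; one assumed input that reproduces the counts above (2 per year) is a cumulative counter that ticks up twice in each year:

import pandas as pd

df = pd.DataFrame({
    'id': [1234] * 6,
    'year': [2017, 2017, 2017, 2018, 2018, 2018],
    'count_to_today': [1, 2, 2, 3, 3, 4],
})

# diff() is non-zero (or NaN on the first row) exactly where the counter
# ticks up, so ne(0) flags each new occurrence; summing those flags per
# (id, year) counts the occurrences in each year.
print(df.count_to_today.diff().ne(0).groupby([df.id, df.year]).sum())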

Perform group by on a column to calculate count of occurrences of another column in R

Here are a couple of ways to do this in dplyr:

library(dplyr)
#1.
df %>% filter(Response_days>5) %>% count(Name, name = 'Count')

#2.
df %>% group_by(Name) %>% summarise(count = sum(Response_days > 5))

and in base R:

#1.
aggregate(Response_days~Name, subset(df, Response_days>5), length)

#2.
aggregate(Response_days~Name, df, function(x) sum(x > 5))

pandas groupby count string occurrence over column

Call apply on the 'scores' column of the groupby object and use the vectorised str method contains to filter each group, then call count.
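The df isn't given in the question; reconstructed from the Out[35] table below, it is:

import pandas as pd

df = pd.DataFrame({
    'catA':   ['A', 'A', 'A', 'B', 'B'],
    'catB':   ['X', 'X', 'Y', 'Z', 'Z'],
    'scores': ['6-4 RET', '6-4 6-4', '6-3 RET', '6-0 RET', '6-1 RET'],
})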

In [34]:    
df.groupby(['catA', 'catB'])['scores'].apply(lambda x: x[x.str.contains('RET')].count())

Out[34]:
catA  catB
A     X       1
      Y       1
B     Z       2
Name: scores, dtype: int64

To assign the result as a column, use transform so that the aggregation returns a Series with its index aligned to the original df:

In [35]:
df['count'] = df.groupby(['catA', 'catB'])['scores'].transform(lambda x: x[x.str.contains('RET')].count())
df

Out[35]:
  catA catB   scores  count
0    A    X  6-4 RET      1
1    A    X  6-4 6-4      1
2    A    Y  6-3 RET      1
3    B    Z  6-0 RET      2
4    B    Z  6-1 RET      2

How can I count occurrences with groupBy?

I think you're just looking for the overload which takes another Collector to specify what to do with each group... and then Collectors.counting() to do the counting:

import java.util.*;
import java.util.function.Function;
import java.util.stream.*;

class Test {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();

        list.add("Hello");
        list.add("Hello");
        list.add("World");

        Map<String, Long> counted = list.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        System.out.println(counted);
    }
}

Result:

{Hello=2, World=1}

(There's also the possibility of using groupingByConcurrent for more efficiency. Something to bear in mind for your real code, if it would be safe in your context.)

groupby, count past occurrences of events, and show the most recent event

Just some basic reshaping and crosstab.

The idea is to filter your dataframe down to the rows that aren't the most recent date per ID, aggregate those with a value count (crosstab), and re-join the max dates.

max_date = df.groupby('ID')['Date'].max()
s1 = df.loc[~df.index.isin(df.groupby("ID")["Date"].idxmax())]

df1 = pd.crosstab(s1.ID, s1.Class).join(max_date).rename(
    columns={"Bad": "Past_deliq", "Good": "Past_non_deliq"}
)

     Past_deliq  Past_non_deliq       Date
ID
112           0               1 2019-01-20
113           1               1 2020-02-03
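The question's df isn't shown; one frame that reproduces this output (the past dates are assumptions) is:

import pandas as pd

df = pd.DataFrame({
    'ID':    [112, 112, 113, 113, 113],
    'Class': ['Good', 'Good', 'Bad', 'Good', 'Good'],
    # ID 112: one past event, then the latest;
    # ID 113: two past events (one Bad, one Good), then the latest.
    'Date':  pd.to_datetime(['2019-01-10', '2019-01-20',
                             '2020-01-05', '2020-01-15', '2020-02-03']),
})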

Group by and take count of first occurrence Pandas Dataframe

First filter only the rows whose Character matches the first value per group, using GroupBy.transform('first') compared against the original values in boolean indexing, and then count the values with GroupBy.size:

df = df[df.groupby('Number')['Character'].transform('first').eq(df['Character'])]

df = df.groupby(['Number','Character']).size().reset_index(name='count')
print(df)
   Number Character  count
0     111         a      2
1     222         b      2
2     333         c      3
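A sample df consistent with that output (the question's data isn't shown):

import pandas as pd

df = pd.DataFrame({
    'Number':    [111, 111, 111, 222, 222, 333, 333, 333],
    'Character': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
})

# Row 2 ('b' under Number 111) is dropped by the filter because it does
# not match the group's first Character, so it never reaches the count.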

Efficiently count occurrences of a value with groupby and within a date range

Annotated code

# Merge the dataframes on sid and modtype
keys = ['sid', 'modtype']
s = df2.merge(df1[[*keys, 'date']], on=keys, suffixes=['', '_'])

# Create boolean conditions as per requirements
s['cnt_3day_after'] = s['date'].between(s['date_'], s['date_'] + pd.DateOffset(days=3), inclusive='right')
s['cnt_3day_before'] = s['date'].between(s['date_'] - pd.DateOffset(days=3), s['date_'], inclusive='left')

# group the boolean conditions by sid and modtype
# and aggregate with sum to count the number of True values
s = s.groupby(keys)[['cnt_3day_after', 'cnt_3day_before']].sum()

# Join the aggregated counts back with df1
df_out = df1.join(s, on=keys)

Result

print(df_out)

   sid  servid       date modtype service  cnt_3day_after  cnt_3day_before
0  123     881 2022-07-05      A1       z               1                1
1  456     879 2022-07-02      A2       z               0                2
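Assumed inputs that reproduce this result (neither frame appears in the question):

import pandas as pd

df1 = pd.DataFrame({
    'sid':     [123, 456],
    'servid':  [881, 879],
    'date':    pd.to_datetime(['2022-07-05', '2022-07-02']),
    'modtype': ['A1', 'A2'],
    'service': ['z', 'z'],
})

# Event log: one event in the 3 days before and one in the 3 days after
# the A1 date; two events in the 3 days before the A2 date.
df2 = pd.DataFrame({
    'sid':     [123, 123, 456, 456],
    'modtype': ['A1', 'A1', 'A2', 'A2'],
    'date':    pd.to_datetime(['2022-07-03', '2022-07-07', '2022-06-30', '2022-07-01']),
})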

