How to Count Occurrences with Groupby

Group by two columns and count the occurrences of each combination in Pandas

Maybe this is what you want?

>>> import pandas as pd
>>> data = pd.DataFrame({'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
...                      'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name='size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64
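On pandas 1.1 or newer, DataFrame.value_counts can produce the same counts in a single call; note that it sorts by count (descending) rather than by the group keys:

>>> data.value_counts(['user_id', 'product_id']).reset_index(name='size')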

Counting occurrence of values after using groupby on multiple pandas columns

You can use a double .groupby. For example:

df["freq"] = df.groupby("a")["b"].apply(lambda x: x.groupby(x).ngroup() + 1)
print(df)

Prints:

   a     b     c  freq
0  a  12.0  12.0     1
1  a  12.0  33.0     1
2  b  12.3  12.3     1
3  a  13.0   1.0     2
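The question's df isn't shown; here is a minimal reconstruction (values read off the printed frame) that makes the snippet runnable. Passing group_keys=False keeps the result indexed like df, so the assignment aligns:

import pandas as pd

df = pd.DataFrame({
    "a": ["a", "a", "b", "a"],
    "b": [12.0, 12.0, 12.3, 13.0],
    "c": [12.0, 33.0, 12.3, 1.0],
})

# Within each "a" group, x.groupby(x).ngroup() numbers each distinct
# value of "b" starting at 0; +1 shifts the numbering to start at 1.
df["freq"] = df.groupby("a", group_keys=False)["b"].apply(lambda x: x.groupby(x).ngroup() + 1)
print(df)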

Pandas groupby Id and count occurrences of picklist/unique values

df.groupby(['company id', 'funding_type']).size().unstack(fill_value=0)
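For illustration, here is a made-up df (the question's data isn't shown) and the pivoted counts it produces:

import pandas as pd

df = pd.DataFrame({
    'company id': [1, 1, 1, 2, 2],
    'funding_type': ['seed', 'seed', 'series_a', 'seed', 'series_b'],
})

# size() counts rows per (company id, funding_type) pair; unstack pivots
# funding_type into columns, filling combinations that never occur with 0.
print(df.groupby(['company id', 'funding_type']).size().unstack(fill_value=0))

funding_type  seed  series_a  series_b
company id
1                2         1         0
2                1         0         1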

Counting the number of occurrences per year in a groupby

You can use diff and groupby:

df.count_to_today.diff().ne(0).groupby([df.id, df.year]).sum()

id    year
1234  2017    2.0
      2018    2.0
Name: count_to_today, dtype: float64

(df.count_to_today.diff()
   .ne(0)
   .groupby([df.id, df.year])
   .sum()
   .astype(int)
   .reset_index())

     id  year  count_to_today
0  1234  2017               2
1  1234  2018               2
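The question's df isn't shown; one assumed input that reproduces the counts above (2 per year) is a cumulative counter that ticks up twice in each year:

import pandas as pd

df = pd.DataFrame({
    'id': [1234] * 6,
    'year': [2017, 2017, 2017, 2018, 2018, 2018],
    'count_to_today': [1, 2, 2, 3, 3, 4],
})

# diff() is non-zero (or NaN on the first row) exactly where the counter
# ticks up, so ne(0) flags each new occurrence; summing those flags per
# (id, year) counts the occurrences in each year.
print(df.count_to_today.diff().ne(0).groupby([df.id, df.year]).sum())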

Perform group by on a column to calculate count of occurrences of another column in R

Here are a couple of ways to do this in dplyr:

library(dplyr)
#1.
df %>% filter(Response_days>5) %>% count(Name, name = 'Count')

#2.
df %>% group_by(Name) %>% summarise(count = sum(Response_days > 5))

and in base R:

#1.
aggregate(Response_days~Name, subset(df, Response_days>5), length)

#2.
aggregate(Response_days~Name, df, function(x) sum(x > 5))

pandas groupby count string occurrence over column

Call apply on the 'scores' column of the groupby object and use the vectorised str method contains to filter each group, then call count.
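The df isn't given in the question; reconstructed from the Out[35] table below, it is:

import pandas as pd

df = pd.DataFrame({
    'catA':   ['A', 'A', 'A', 'B', 'B'],
    'catB':   ['X', 'X', 'Y', 'Z', 'Z'],
    'scores': ['6-4 RET', '6-4 6-4', '6-3 RET', '6-0 RET', '6-1 RET'],
})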

In [34]:    
df.groupby(['catA', 'catB'])['scores'].apply(lambda x: x[x.str.contains('RET')].count())

Out[34]:
catA  catB
A     X       1
      Y       1
B     Z       2
Name: scores, dtype: int64

To assign the result as a column, use transform so that the aggregation returns a Series with its index aligned to the original df:

In [35]:
df['count'] = df.groupby(['catA', 'catB'])['scores'].transform(lambda x: x[x.str.contains('RET')].count())
df

Out[35]:
  catA catB   scores  count
0    A    X  6-4 RET      1
1    A    X  6-4 6-4      1
2    A    Y  6-3 RET      1
3    B    Z  6-0 RET      2
4    B    Z  6-1 RET      2

How can I count occurrences with groupBy?

I think you're just looking for the overload which takes another Collector to specify what to do with each group... and then Collectors.counting() to do the counting:

import java.util.*;
import java.util.function.Function;
import java.util.stream.*;

class Test {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();

        list.add("Hello");
        list.add("Hello");
        list.add("World");

        Map<String, Long> counted = list.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        System.out.println(counted);
    }
}

Result:

{Hello=2, World=1}

(There's also the possibility of using groupingByConcurrent for more efficiency. Something to bear in mind for your real code, if it would be safe in your context.)

groupby, count past occurrences of events, and show the most recent event

Just some basic reshaping and crosstab.

The idea is to filter your dataframe down to the rows that aren't the most recent date per ID, aggregate those with a value count (crosstab), and re-join the max dates.

max_date = df.groupby('ID')['Date'].max()
s1 = df.loc[~df.index.isin(df.groupby("ID")["Date"].idxmax())]

df1 = pd.crosstab(s1.ID, s1.Class).join(max_date).rename(
    columns={"Bad": "Past_deliq", "Good": "Past_non_deliq"}
)

     Past_deliq  Past_non_deliq       Date
ID
112           0               1 2019-01-20
113           1               1 2020-02-03
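The question's df isn't shown; one frame that reproduces this output (the past dates are assumptions) is:

import pandas as pd

df = pd.DataFrame({
    'ID':    [112, 112, 113, 113, 113],
    'Class': ['Good', 'Good', 'Bad', 'Good', 'Good'],
    # ID 112: one past event, then the latest;
    # ID 113: two past events (one Bad, one Good), then the latest.
    'Date':  pd.to_datetime(['2019-01-10', '2019-01-20',
                             '2020-01-05', '2020-01-15', '2020-02-03']),
})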

Group by and take count of first occurrence Pandas Dataframe

First filter only the rows whose Character matches the first value per group, using GroupBy.transform('first') compared against the original values in boolean indexing, and then count the values with GroupBy.size:

df = df[df.groupby('Number')['Character'].transform('first').eq(df['Character'])]

df = df.groupby(['Number','Character']).size().reset_index(name='count')
print(df)
   Number Character  count
0     111         a      2
1     222         b      2
2     333         c      3
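A sample df consistent with that output (the question's data isn't shown):

import pandas as pd

df = pd.DataFrame({
    'Number':    [111, 111, 111, 222, 222, 333, 333, 333],
    'Character': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
})

# Row 2 ('b' under Number 111) is dropped by the filter because it does
# not match the group's first Character, so it never reaches the count.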

Efficiently count occurrences of a value with groupby and within a date range

Annotated code

# Merge the dataframes on sid and modtype
keys = ['sid', 'modtype']
s = df2.merge(df1[[*keys, 'date']], on=keys, suffixes=['', '_'])

# Create boolean conditions as per requirements
s['cnt_3day_after'] = s['date'].between(s['date_'], s['date_'] + pd.DateOffset(days=3), inclusive='right')
s['cnt_3day_before'] = s['date'].between(s['date_'] - pd.DateOffset(days=3), s['date_'], inclusive='left')

# group the boolean conditions by sid and modtype
# and aggregate with sum to count the number of True values
s = s.groupby(keys)[['cnt_3day_after', 'cnt_3day_before']].sum()

# Join the aggregated counts back with df1
df_out = df1.join(s, on=keys)

Result

print(df_out)

   sid  servid       date modtype service  cnt_3day_after  cnt_3day_before
0  123     881 2022-07-05      A1       z               1                1
1  456     879 2022-07-02      A2       z               0                2
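Assumed inputs that reproduce this result (neither frame appears in the question):

import pandas as pd

df1 = pd.DataFrame({
    'sid':     [123, 456],
    'servid':  [881, 879],
    'date':    pd.to_datetime(['2022-07-05', '2022-07-02']),
    'modtype': ['A1', 'A2'],
    'service': ['z', 'z'],
})

# Event log: one event in the 3 days before and one in the 3 days after
# the A1 date; two events in the 3 days before the A2 date.
df2 = pd.DataFrame({
    'sid':     [123, 123, 456, 456],
    'modtype': ['A1', 'A1', 'A2', 'A2'],
    'date':    pd.to_datetime(['2022-07-03', '2022-07-07', '2022-06-30', '2022-07-01']),
})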

