Group by two columns and count the occurrences of each combination in Pandas
Maybe this is what you want?
>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id product_id
a1 p1 2
p2 1
a2 p1 3
a3 p2 2
p3 1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
user_id product_id size
0 a1 p1 2
1 a1 p2 1
2 a2 p1 3
3 a3 p2 2
4 a3 p3 1
>>> new_df['size']
0 2
1 1
2 3
3 2
4 1
Name: size, dtype: int64
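As a shorthand, Series.reset_index accepts a name argument directly, so the to_frame step can be skipped. A sketch using the same data as above:

```python
import pandas as pd

# Same sample data as above
data = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'],
    'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3'],
})

# size() + reset_index(name=...) builds the counts frame in one step
new_df = data.groupby(['user_id', 'product_id']).size().reset_index(name='size')
print(new_df)
```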
Counting occurrence of values after using groupby on multiple pandas columns
You can use a double .groupby. For example:
df["freq"] = df.groupby("a")["b"].apply(lambda x: x.groupby(x).ngroup() + 1)
print(df)
Prints:
a b c freq
0 a 12.0 12.0 1
1 a 12.0 33.0 1
2 b 12.3 12.3 1
3 a 13.0 1.0 2
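On newer pandas versions, apply on a grouped Series may prepend the group key to the result index, which breaks the alignment needed for column assignment. A transform-based variant sidesteps this, because transform always returns a result aligned to the original index. A sketch, with sample data assumed to match the frame printed above:

```python
import pandas as pd

# Assumed sample data matching the printed frame
df = pd.DataFrame({
    'a': ['a', 'a', 'b', 'a'],
    'b': [12.0, 12.0, 12.3, 13.0],
    'c': [12.0, 33.0, 12.3, 1.0],
})

# Within each 'a' group, number the distinct 'b' values in order of appearance;
# transform keeps the result aligned with df's index
df['freq'] = df.groupby('a')['b'].transform(lambda x: x.groupby(x).ngroup() + 1)
print(df)
```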
Pandas groupby Id and count occurrences of picklist/unique values
df.groupby(['company id', 'funding_type']).size().unstack(fill_value=0)
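A minimal runnable sketch of the same idea, using made-up company/funding data (pd.crosstab(df['company id'], df['funding_type']) would produce the same wide table):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'company id': [1, 1, 2, 2, 2],
    'funding_type': ['seed', 'seed', 'seed', 'series_a', 'series_a'],
})

# Count each (company, funding_type) pair, then pivot funding types into columns,
# filling pairs that never occur with 0
wide = df.groupby(['company id', 'funding_type']).size().unstack(fill_value=0)
print(wide)
```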
Counting the number of occurrences per year in a groupby
You can use diff and groupby:
df.count_to_today.diff().ne(0).groupby([df.id, df.year]).sum()
id year
1234 2017 2.0
2018 2.0
Name: count_to_today, dtype: float64
(df.count_to_today.diff()
.ne(0)
.groupby([df.id, df.year])
.sum()
.astype(int)
.reset_index())
id year count_to_today
0 1234 2017 2
1 1234 2018 2
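To see why this counts change events, here is a runnable sketch with assumed sample data: diff() is non-zero wherever count_to_today changes (and NaN on the first row, which ne(0) also keeps as True), and summing the booleans per (id, year) counts those changes.

```python
import pandas as pd

# Assumed sample data: the running count changes twice within each year
df = pd.DataFrame({
    'id': [1234] * 8,
    'year': [2017] * 4 + [2018] * 4,
    'count_to_today': [1, 1, 2, 2, 2, 3, 3, 4],
})

# True wherever the running count changes (the leading NaN compares True as well)
changed = df.count_to_today.diff().ne(0)

# Count change events per (id, year)
out = changed.groupby([df.id, df.year]).sum().astype(int).reset_index()
print(out)
```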
Perform group by on a column to calculate count of occurrences of another column in R
Here are a couple of ways to do this in dplyr:
library(dplyr)
#1.
df %>% filter(Response_days>5) %>% count(Name, name = 'Count')
#2.
df %>% group_by(Name) %>% summarise(count = sum(Response_days > 5))
and in base R :
#1.
aggregate(Response_days~Name, subset(df, Response_days>5), length)
#2.
aggregate(Response_days~Name, df, function(x) sum(x > 5))
pandas groupby count string occurrence over column
Call apply on the 'scores' column of the groupby object and use the vectorised str method contains to filter each group, then call count:
In [34]:
df.groupby(['catA', 'catB'])['scores'].apply(lambda x: x[x.str.contains('RET')].count())
Out[34]:
catA catB
A X 1
Y 1
B Z 2
Name: scores, dtype: int64
To assign as a column use transform so that the aggregation returns a series with its index aligned to the original df:
In [35]:
df['count'] = df.groupby(['catA', 'catB'])['scores'].transform(lambda x: x[x.str.contains('RET')].count())
df
Out[35]:
catA catB scores count
0 A X 6-4 RET 1
1 A X 6-4 6-4 1
2 A Y 6-3 RET 1
3 B Z 6-0 RET 2
4 B Z 6-1 RET 2
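Since contains already returns booleans, summing the mask is a slightly simpler equivalent of filter-then-count. A runnable sketch, with sample data assumed to match the frame above:

```python
import pandas as pd

# Assumed sample data matching the printed frame
df = pd.DataFrame({
    'catA':   ['A', 'A', 'A', 'B', 'B'],
    'catB':   ['X', 'X', 'Y', 'Z', 'Z'],
    'scores': ['6-4 RET', '6-4 6-4', '6-3 RET', '6-0 RET', '6-1 RET'],
})

# sum() over the boolean mask counts the 'RET' rows per group;
# transform broadcasts the count back onto every row of the group
df['count'] = (df.groupby(['catA', 'catB'])['scores']
                 .transform(lambda x: x.str.contains('RET').sum()))
print(df)
```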
How can I count occurrences with groupBy?
I think you're just looking for the overload which takes another Collector to specify what to do with each group, and then Collectors.counting() to do the counting:
import java.util.*;
import java.util.function.Function;
import java.util.stream.*;

class Test {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("Hello");
        list.add("Hello");
        list.add("World");

        Map<String, Long> counted = list.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        System.out.println(counted);
    }
}
Result:
{Hello=2, World=1}
(There's also the possibility of using groupingByConcurrent
for more efficiency. Something to bear in mind for your real code, if it would be safe in your context.)
groupby, count past occurences of events, and show the most recent event
Just some basic reshaping and crosstab. The idea is to filter out each ID's most recent row, aggregate the remaining rows with a value count, and re-join the result with the max dates.
max_date = df.groupby('ID')['Date'].max()
s1 = df.loc[~df.index.isin(df.groupby("ID")["Date"].idxmax())]
df1 = pd.crosstab(s1.ID, s1.Class).join(max_date).rename(
columns={"Bad": "Past_deliq", "Good": "Past_non_deliq"}
)
Past_deliq Past_non_deliq Date
ID
112 0 1 2019-01-20
113 1 1 2020-02-03
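A runnable version of the same steps, with assumed sample data chosen to reproduce the table above:

```python
import pandas as pd

# Assumed sample data: each ID's latest row is the current event,
# earlier rows are past events to be counted by Class
df = pd.DataFrame({
    'ID':    [112, 112, 113, 113, 113],
    'Class': ['Good', 'Good', 'Bad', 'Good', 'Bad'],
    'Date':  pd.to_datetime(['2019-01-10', '2019-01-20',
                             '2020-01-05', '2020-01-20', '2020-02-03']),
})

# Latest date per ID (the "most recent event")
max_date = df.groupby('ID')['Date'].max()

# Drop each ID's most recent row, keeping only past events
s1 = df.loc[~df.index.isin(df.groupby('ID')['Date'].idxmax())]

# Count past events per Class and attach the latest date
df1 = pd.crosstab(s1.ID, s1.Class).join(max_date).rename(
    columns={'Bad': 'Past_deliq', 'Good': 'Past_non_deliq'}
)
print(df1)
```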
Group by and take count of first occurrence Pandas Dataframe
First keep only the rows whose Character matches the first value of its Number group, using GroupBy.transform with 'first' and boolean indexing, then count the remaining rows with GroupBy.size:
df = df[df.groupby('Number')['Character'].transform('first').eq(df['Character'])]
df = df.groupby(['Number','Character']).size().reset_index(name='count')
print (df)
Number Character count
0 111 a 2
1 222 b 2
2 333 c 3
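A self-contained sketch with assumed sample data chosen to reproduce the output above:

```python
import pandas as pd

# Assumed sample data: only each Number's first Character should be counted
df = pd.DataFrame({
    'Number':    [111, 111, 111, 222, 222, 333, 333, 333, 333],
    'Character': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd'],
})

# Keep only rows whose Character equals the group's first Character
df = df[df.groupby('Number')['Character'].transform('first').eq(df['Character'])]

# Count the surviving rows per (Number, Character)
df = df.groupby(['Number', 'Character']).size().reset_index(name='count')
print(df)
```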
Efficiently count occurrences of a value with groupby and within a date range
Annotated code
# Merge the dataframes on sid and modtype
keys = ['sid', 'modtype']
s = df2.merge(df1[[*keys, 'date']], on=keys, suffixes=['', '_'])
# Create boolean conditions as per requirements
s['cnt_3day_after'] = s['date'].between(s['date_'], s['date_'] + pd.DateOffset(days=3), inclusive='right')
s['cnt_3day_before'] = s['date'].between(s['date_'] - pd.DateOffset(days=3), s['date_'], inclusive='left' )
# group the boolean conditions by sid and modtype
# and aggregate with sum to count the number of True values
s = s.groupby(keys)[['cnt_3day_after', 'cnt_3day_before']].sum()
# Join the aggregated counts back with df1
df_out = df1.join(s, on=keys)
Result
print(df_out)
sid servid date modtype service cnt_3day_after cnt_3day_before
0 123 881 2022-07-05 A1 z 1 1
1 456 879 2022-07-02 A2 z 0 2
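A self-contained sketch with made-up event data that reproduces the result above (note: the inclusive='right'/'left' string arguments to between require a reasonably recent pandas):

```python
import pandas as pd

# Assumed main table: one row per (sid, modtype) with its reference date
df1 = pd.DataFrame({
    'sid':     [123, 456],
    'servid':  [881, 879],
    'date':    pd.to_datetime(['2022-07-05', '2022-07-02']),
    'modtype': ['A1', 'A2'],
    'service': ['z', 'z'],
})

# Assumed event table: dated events per (sid, modtype)
df2 = pd.DataFrame({
    'sid':     [123, 123, 456, 456],
    'modtype': ['A1', 'A1', 'A2', 'A2'],
    'date':    pd.to_datetime(['2022-07-03', '2022-07-07',
                               '2022-06-30', '2022-07-01']),
})

# Merge the dataframes on sid and modtype; df1's date becomes 'date_'
keys = ['sid', 'modtype']
s = df2.merge(df1[[*keys, 'date']], on=keys, suffixes=['', '_'])

# Flag events falling in the 3-day windows after/before the reference date
s['cnt_3day_after'] = s['date'].between(s['date_'], s['date_'] + pd.DateOffset(days=3), inclusive='right')
s['cnt_3day_before'] = s['date'].between(s['date_'] - pd.DateOffset(days=3), s['date_'], inclusive='left')

# Sum the booleans per (sid, modtype) and join back onto df1
s = s.groupby(keys)[['cnt_3day_after', 'cnt_3day_before']].sum()
df_out = df1.join(s, on=keys)
print(df_out)
```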