pandas groupby and find most frequent value (mode)
You can calculate both count and max on the dates, then sort on these values and drop duplicates (or use groupby().head()):
s = df.groupby(['user_id','product_id'])['created_at'].agg(['count','max'])
s.sort_values(['count','max'], ascending=False).groupby('user_id').head(1)
Output:
                    count                 max
user_id product_id
3       400             2 2021-04-21 10:20:00
1       200             2 2020-06-24 10:10:24
2       300             1 2021-01-21 10:20:00
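For reference, here is a minimal, self-contained sketch of the same approach. The sample data below is my own reconstruction (chosen to be consistent with the output above), not the original question's data:

import pandas as pd

# Assumed sample data: several purchases per user and product
df = pd.DataFrame({
    'user_id':    [1, 1, 1, 2, 3, 3, 3],
    'product_id': [200, 200, 201, 300, 400, 400, 401],
    'created_at': pd.to_datetime([
        '2020-06-20 09:00:00', '2020-06-24 10:10:24', '2020-07-01 12:00:00',
        '2021-01-21 10:20:00', '2021-04-01 08:00:00', '2021-04-21 10:20:00',
        '2021-03-15 09:30:00']),
})

# Purchases per (user, product): how many, and when the latest one happened
s = df.groupby(['user_id', 'product_id'])['created_at'].agg(['count', 'max'])

# Keep the most frequent product per user; ties go to the most recent date
s.sort_values(['count', 'max'], ascending=False).groupby('user_id').head(1)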
Most common value (mode) by group in R
You can do it like this:
library(dplyr)
df %>%
  count(a, b, c) %>%
  group_by(a, c) %>%
  filter(n == max(n)) %>%
  select(a, b, c)
Solution:
# A tibble: 8 x 3
# Groups:   a, c [6]
  a         b c
  <fct> <dbl> <fct>
1 a         2 Feb
2 a         1 Feb
3 a         2 Jan
4 a         3 Mar
5 b         3 Mar
6 b         1 Jan
7 b         2 Feb
8 b         3 Feb
Most frequent value (mode) by group
Building on David's comments, the solution is the following:
Mode <- function(x) {
  ux <- unique(x)                          # candidate values
  ux[which.max(tabulate(match(x, ux)))]    # value with the highest count
}
library(dplyr)
df %>% group_by(a) %>% mutate(c=Mode(b))
Notice though that for the tie when df$a is 3, the mode for b is 1: which.max returns the index of the first maximum, so ties are broken by the order in which values first appear.
Find the most frequent value per group in a table column
This should address the specific "which object per ethnicity" question.
Note that this doesn't address ties in the count; that wasn't part of the question / request.
Adjust your SQL to include this logic to provide that detail:
WITH cte AS (
  SELECT officer_defined_ethnicity
       , object_of_search
       , COUNT(*) AS n
       , ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rn
  FROM stopAndSearches
  GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT *
FROM cte
WHERE rn = 1
;
Result:
officer_defined_ethnicity | object_of_search | n | rn
--------------------------|------------------|---|---
ethnicity1                | Cat              | 1 | 1
ethnicity2                | Stolen goods     | 2 | 1
ethnicity3                | Fireworks        | 1 | 1
GroupBy pandas DataFrame and select most common value
You can use value_counts() to get a count series, and get the first row:
import pandas as pd
source = pd.DataFrame({'Country'    : ['USA', 'USA', 'Russia', 'USA'],
                       'City'       : ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
                       'Short name' : ['NY', 'New', 'Spb', 'NY']})

source.groupby(['Country','City']).agg(lambda x: x.value_counts().index[0])
In case you are wondering how to perform other aggregate functions within the same .agg() call, try this:
# Let's add a new col, account
source['account'] = [1, 2, 3, 3]

source.groupby(['Country', 'City']).agg(
    mod=('Short name', lambda x: x.value_counts().index[0]),
    avg=('account', 'mean'),
)
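As an aside, the same named aggregation can be written with pd.Series.mode instead of value_counts(); this variant is my own sketch, not part of the original answer:

# mode() returns all modal values (sorted); take the first one
source.groupby(['Country', 'City']).agg(
    mod=('Short name', lambda x: x.mode().iat[0]),
    avg=('account', 'mean'),
)

Both variants pick a single winner when there is a tie, but mode() returns the tied values in sorted order, so the chosen value can differ from the value_counts() version.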
get the most frequent group of values with pandas in python
You can use pd.cut, groupby(), and count() like below:
>>> df = pd.DataFrame({
        'freq': [306.0416667, 286.1666667, 207.5, 226.4166667, 304.2083333,
                 336.1666667, 255.5416667, 224.5833333, 190.1666667, 163.5,
                 231.125, 167.3333333, 193.5416667, 165, 154.875, 303.4166667]})
>>> ranges = [0,90,180,270, 360]
>>> df.groupby(pd.cut(df['freq'], ranges)).count()
            freq
freq
(0, 90]        0
(90, 180]      4
(180, 270]     7
(270, 360]     5
>>> df.groupby(pd.cut(df['freq'], ranges)).count().idxmax()
freq    (180, 270]
dtype: interval
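If you also need the rows that fall inside that most frequent range, here is a small follow-up sketch (my own addition, not part of the original answer):

# Bin each value once, find the most frequent bin, then select its rows
bins = pd.cut(df['freq'], ranges)
top_bin = bins.value_counts().idxmax()   # here the (180, 270] bin
df[bins == top_bin]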
Fill missing values by group using most frequent value
Running the code above will raise IndexError: single positional indexer is out-of-bounds
This is because transform passes each column to the function as a Series, and at some point it will see the value column on its own. If you do:
df1[df1.group == "B"].value.mode()
you get
Series([], dtype: float64)
hence the index-out-of-bounds error: the result is empty, so iloc[0] doesn't exist.
On the other hand, when you do:
df1[df1.group == "B"].mode()
mode is calculated on a DataFrame, not a Series, and pandas returns NaN for the all-NaN column, i.e. the value column here.
So one remedy is to use apply instead of transform, so that a DataFrame rather than individual Series is passed to your lambda:
df1.groupby("group").apply(lambda x: x.fillna(x.mode().iloc[0])).reset_index(drop=True)
to get
  group  value
0     A    1.0
1     A    1.0
2     A    1.0
3     A    1.0
4     B    NaN
5     B    NaN
6     B    NaN
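For completeness, here is a runnable sketch of that remedy; the df1 contents below are my own assumption, reconstructed from the output shown above:

import numpy as np
import pandas as pd

# Assumed data: group A has one missing value, group B is entirely NaN
df1 = pd.DataFrame({
    'group': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'value': [1.0, 1.0, np.nan, 1.0, np.nan, np.nan, np.nan],
})

# Fill each group's NaNs with that group's per-column mode;
# an all-NaN column has no mode, so it simply stays NaN
df1.groupby('group').apply(lambda x: x.fillna(x.mode().iloc[0])).reset_index(drop=True)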
How to choose the most common value in a group related to other group in R?
Another dplyr strategy using count and slice:
library(dplyr)

DATA %>%
  group_by(ID) %>%
  count(VAR, CATEGORY) %>%
  slice(which.max(n)) %>%
  select(-n)
     ID VAR   CATEGORY
  <dbl> <chr> <chr>
1     1 A     ANE
2     2 C     BOA
3     3 E     CAT
4     4 F     DOG