Most Frequent Value (Mode) by Group

pandas groupby and find most frequent value (mode)

You can calculate both count and max on the dates, then sort on these values and drop duplicates (or use groupby().head(1)); a sketch of the drop-duplicates variant follows the output below:

s = df.groupby(['user_id','product_id'])['created_at'].agg(['count','max'])
s.sort_values(['count','max'], ascending=False).groupby('user_id').head(1)

Output:

                    count                 max
user_id product_id
3       400             2 2021-04-21 10:20:00
1       200             2 2020-06-24 10:10:24
2       300             1 2021-01-21 10:20:00
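
For completeness, here is the drop-duplicates variant mentioned above as a minimal, self-contained sketch; the sample df below is an assumption, built to match the output shown:

import pandas as pd

# Hypothetical sample data shaped like the frame in the question
df = pd.DataFrame({
    'user_id':    [1, 1, 1, 2, 3, 3],
    'product_id': [200, 200, 100, 300, 400, 400],
    'created_at': pd.to_datetime([
        '2020-06-24 10:10:24', '2020-06-20 09:00:00', '2020-01-01 08:00:00',
        '2021-01-21 10:20:00', '2021-04-21 10:20:00', '2021-04-01 07:30:00']),
})

s = df.groupby(['user_id', 'product_id'])['created_at'].agg(['count', 'max'])

# Sort so the most frequent (and latest) product comes first, then keep one row per user
out = (s.sort_values(['count', 'max'], ascending=False)
        .reset_index()
        .drop_duplicates('user_id'))
print(out)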

Most common value (mode) by group in R

You can do it like this:

library(dplyr)

df %>%
  count(a, b, c) %>%
  group_by(a, c) %>%
  filter(n == max(n)) %>%
  select(a, b, c)

Output:

# A tibble: 8 x 3
# Groups:   a, c [6]
  a         b c
  <fct> <dbl> <fct>
1 a         2 Feb
2 a         1 Feb
3 a         2 Jan
4 a         3 Mar
5 b         3 Mar
6 b         1 Jan
7 b         2 Feb
8 b         3 Feb

Most frequent value (mode) by group

Building on David's comments, the solution is the following:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

library(dplyr)
df %>% group_by(a) %>% mutate(c=Mode(b))

Note, though, that in the tied case (when df$a is 3) the mode reported for b is 1, i.e. the first value encountered.
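
For comparison, the same first-value-wins mode in pandas (a sketch; the df with columns a and b is a hypothetical stand-in mirroring the R example):

import numpy as np
import pandas as pd

def mode_first(x):
    # First value that reaches the maximum count, like the R Mode() above
    ux = pd.unique(x)
    counts = np.array([(x == u).sum() for u in ux])
    return ux[counts.argmax()]

# Hypothetical data: group a == 3 has a tie between 1 and 2
df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3], 'b': [4, 4, 5, 6, 1, 2]})
df['c'] = df.groupby('a')['b'].transform(mode_first)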

Find the most frequent value per group in a table column

This should address the specific "which object per ethnicity" question.

Note, this doesn't address ties in the count. That wasn't part of the question / request.

Adjust your SQL to include this logic, to provide that detail:

WITH cte AS (
    SELECT officer_defined_ethnicity
         , object_of_search
         , COUNT(*) AS n
         , ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity
                              ORDER BY COUNT(*) DESC) AS rn
    FROM stopAndSearches
    GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT *
FROM cte
WHERE rn = 1;

Result:

officer_defined_ethnicity  object_of_search  n  rn
ethnicity1                 Cat               1  1
ethnicity2                 Stolen goods      2  1
ethnicity3                 Fireworks         1  1
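
If the data is in pandas rather than SQL, the same window logic can be reproduced with groupby and cumcount (a sketch; the stop_and_searches frame below is a hypothetical stand-in for the stopAndSearches table):

import pandas as pd

# Hypothetical stand-in for the stopAndSearches table
stop_and_searches = pd.DataFrame({
    'officer_defined_ethnicity': ['ethnicity1', 'ethnicity2', 'ethnicity2', 'ethnicity3'],
    'object_of_search': ['Cat', 'Stolen goods', 'Stolen goods', 'Fireworks'],
})

counts = (stop_and_searches
          .groupby(['officer_defined_ethnicity', 'object_of_search'])
          .size()
          .reset_index(name='n'))

# ROW_NUMBER() OVER (PARTITION BY ethnicity ORDER BY n DESC) ~ cumcount after sorting
counts['rn'] = (counts.sort_values('n', ascending=False)
                      .groupby('officer_defined_ethnicity')
                      .cumcount() + 1)
print(counts[counts['rn'] == 1])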

GroupBy pandas DataFrame and select most common value

You can use value_counts() to get a count Series and take the first row:

import pandas as pd

source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                       'City': ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
                       'Short name': ['NY', 'New', 'Spb', 'NY']})

source.groupby(['Country', 'City']).agg(lambda x: x.value_counts().index[0])

In case you are wondering how to apply other aggregate functions within the same .agg() call, try named aggregation:

# Let's add a new column, account
source['account'] = [1, 2, 3, 3]

source.groupby(['Country', 'City']).agg(
    mod=('Short name', lambda x: x.value_counts().index[0]),
    avg=('account', 'mean'),
)

get the most frequent group of values with pandas in python

You can use pd.cut with groupby() and count(), like below:

>>> df = pd.DataFrame({
...     'freq': [306.0416667, 286.1666667, 207.5, 226.4166667, 304.2083333,
...              336.1666667, 255.5416667, 224.5833333, 190.1666667, 163.5,
...              231.125, 167.3333333, 193.5416667, 165, 154.875, 303.4166667]})

>>> ranges = [0,90,180,270, 360]
>>> df.groupby(pd.cut(df['freq'], ranges)).count()

            freq
freq
(0, 90]        0
(90, 180]      4
(180, 270]     7
(270, 360]     5

>>> df.groupby(pd.cut(df['freq'], ranges)).count().idxmax()
freq (180, 270]
dtype: interval
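
If you also want the size of that most frequent bin, .max() on the same counts gives it:

>>> df.groupby(pd.cut(df['freq'], ranges)).count().max()
freq    7
dtype: int64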

Fill missing values by group using most frequent value


Running the code in the question raises IndexError: single positional indexer is out-of-bounds

This is because transform passes each column to the function as a Series, and at some point it will see the value column on its own. If you do:

df1[df1.group == "B"].value.mode()

you get

Series([], dtype: float64)

hence the index-out-of-bounds error: the result is empty, so iloc[0] doesn't exist.

On the other hand, when you do:

df1[df1.group == "B"].mode()

the mode is calculated on a DataFrame, not a Series, and pandas returns NaN for the all-NaN column, i.e. the value column here.

One remedy is therefore to use apply instead of transform, so the lambda receives a DataFrame instead of individual Series:

df1.groupby("group").apply(lambda x: x.fillna(x.mode().iloc[0])).reset_index(drop=True)

to get

  group  value
0     A    1.0
1     A    1.0
2     A    1.0
3     A    1.0
4     B    NaN
5     B    NaN
6     B    NaN
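
To make this reproducible, here is a hypothetical df1 consistent with the output above, along with a guarded transform that simply skips all-NaN groups instead of raising (the construction of df1 is an assumption):

import numpy as np
import pandas as pd

# Hypothetical df1: group A has a recoverable mode, group B is all NaN
df1 = pd.DataFrame({
    'group': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'value': [1.0, np.nan, np.nan, 1.0, np.nan, np.nan, np.nan],
})

# Skip groups whose mode is empty (all-NaN) so iloc[0] is never hit
df1['value'] = df1.groupby('group')['value'].transform(
    lambda x: x if x.mode().empty else x.fillna(x.mode().iloc[0])
)
print(df1)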

How to choose the most common value in a group related to other group in R?

Another dplyr strategy using count and slice:

library(dplyr)

DATA %>%
  group_by(ID) %>%
  count(VAR, CATEGORY) %>%
  slice(which.max(n)) %>%
  select(-n)

     ID VAR   CATEGORY
  <dbl> <chr> <chr>
1     1 A     ANE
2     2 C     BOA
3     3 E     CAT
4     4 F     DOG
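
For comparison, the same count-then-slice idea in pandas (a sketch; the DATA frame below is a hypothetical stand-in for the R data):

import pandas as pd

# Hypothetical stand-in for the R DATA frame
DATA = pd.DataFrame({
    'ID':       [1, 1, 2, 3, 4],
    'VAR':      ['A', 'A', 'C', 'E', 'F'],
    'CATEGORY': ['ANE', 'ANE', 'BOA', 'CAT', 'DOG'],
})

# count(VAR, CATEGORY) within each ID, then keep the row with the largest n
out = (DATA.groupby(['ID', 'VAR', 'CATEGORY']).size()
           .reset_index(name='n')
           .sort_values('n', ascending=False)
           .drop_duplicates('ID')
           .drop(columns='n')
           .sort_values('ID'))
print(out)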

