GroupBy pandas DataFrame and select most common value

You can use value_counts() to get a count Series and then take the first value of its index, which is the most frequent value:

import pandas as pd

source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                       'City': ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
                       'Short name': ['NY', 'New', 'Spb', 'NY']})

source.groupby(['Country', 'City']).agg(lambda x: x.value_counts().index[0])

If you want to perform other aggregations in the same .agg() call, try named aggregation:

# Let's add a new column, account
source['account'] = [1, 2, 3, 3]

source.groupby(['Country', 'City']).agg(
    mod=('Short name', lambda x: x.value_counts().index[0]),
    avg=('account', 'mean'))

GroupBy pandas DataFrame and select most common value which is alphabetically first

Try with groupby and mode:

mapper = df.groupby("province")["city"].agg(lambda x: x.mode().sort_values().iloc[0]).to_dict()
df["city"] = df["city"].where(df["city"].notnull(),
                              df["province"].map(mapper))

>>> df
   province       city
0         A    newyork
1         A     london
2         A    newyork
3         A     london
4         A     london
5         A     london
6         A    houston
7         B  hyderabad
8         B    karachi
9         B  hyderabad
10        B  hyderabad
11        B  hyderabad
12        B    beijing
13        B    karachi

Group by a column to find the most frequent value in another column?

Use SeriesGroupBy.value_counts and select first value of index:

df = df.groupby('col1')['col2'].apply(lambda x: x.value_counts().index[0]).reset_index()
print(df)
    col1 col2
0   blue   nb
1  green   gx

Or add DataFrame.drop_duplicates:

df = df.groupby('col1')['col2'].value_counts().reset_index(name='v')

df = df.drop_duplicates('col1')[['col1', 'col2']]
print(df)
    col1 col2
0   blue   nb
2  green   gx

Or use Series.mode and select first value by positions by Series.iat:

df = df.groupby('col1')['col2'].apply(lambda x: x.mode().iat[0]).reset_index()
print(df)
    col1 col2
0   blue   nb
1  green   gx

EDIT:

The problem arises with groups that contain only NaNs:

import numpy as np
import pandas as pd

d = {'col1': ['green', 'green', 'green', 'blue', 'blue', 'blue'],
     'col2': [np.nan, np.nan, np.nan, 'nb', 'nb', 'mj']}
df = pd.DataFrame(data=d)

f = lambda x: np.nan if x.isnull().all() else x.value_counts().index[0]
# or
# f = lambda x: next(iter(x.value_counts().index), np.nan)
# another solution
# f = lambda x: next(iter(x.mode()), np.nan)
df = df.groupby('col1')['col2'].apply(f).reset_index()
print(df)
    col1 col2
0   blue   nb
1  green  NaN
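The commented alternatives behave the same way; for example, the mode-based variant can be checked like this (mode() drops NaN, so an all-NaN group yields an empty Series and next() falls back to np.nan):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['green', 'green', 'green', 'blue', 'blue', 'blue'],
                   'col2': [np.nan, np.nan, np.nan, 'nb', 'nb', 'mj']})

# mode() ignores NaN, so it is empty for the all-NaN 'green' group;
# next(iter(...), np.nan) then supplies the NaN fallback
f = lambda x: next(iter(x.mode()), np.nan)
out = df.groupby('col1')['col2'].apply(f).reset_index()
```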

pandas groupby and find most frequent value (mode)

You can calculate both the count and the max of the dates, then sort on those values and keep one row per user with groupby().head(1) (drop_duplicates would also work):

s = df.groupby(['user_id','product_id'])['created_at'].agg(['count','max'])
s.sort_values(['count','max'], ascending=False).groupby('user_id').head(1)

Output:

                    count                 max
user_id product_id
3       400             2 2021-04-21 10:20:00
1       200             2 2020-06-24 10:10:24
2       300             1 2021-01-21 10:20:00
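A minimal sketch with made-up rows (the column names follow the question, the data is hypothetical):

```python
import pandas as pd

# Hypothetical purchase log: user 1 bought product 200 twice, product 100 once
df = pd.DataFrame({
    'user_id': [1, 1, 1],
    'product_id': [100, 200, 200],
    'created_at': pd.to_datetime(['2020-06-20 09:00:00',
                                  '2020-06-21 08:00:00',
                                  '2020-06-24 10:10:24'])})

s = df.groupby(['user_id', 'product_id'])['created_at'].agg(['count', 'max'])
# Sort so the most frequent pair (ties broken by latest date) comes first,
# then keep one row per user
top = s.sort_values(['count', 'max'], ascending=False).groupby('user_id').head(1)
```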

Python: select most frequent using group by

In the comments you note you're using pandas. You can do something like the following:

>>> df

            tag  category
0    automotive         8
1            ba         8
2        bamboo         8
3        bamboo         8
4        bamboo         8
5        bamboo         8
6        bamboo         8
7        bamboo        10
8        bamboo         8
9        bamboo         9
10       bamboo         8
11       bamboo        10
12       bamboo         8
13       bamboo         9
14       bamboo         8
15  banana tree         8
16  banana tree         8
17  banana tree         8
18  banana tree         8
19         bath         9

Do a groupby on 'tag' and, within each group, take the mode of the 'category' column. However, we have to make it conditional, because pandas doesn't return a single number for the mode when a group has fewer than 3 observations; in the special cases of 1 or 2 observations we can just return the group itself. We can use the aggregate/agg method with a lambda function to do this:

>>> mode = lambda x: x.mode() if len(x) > 2 else np.array(x)
>>> df.groupby('tag')['category'].agg(mode)

tag
automotive 8
ba 8
bamboo 8
banana tree 8
bath 9

Note that when a group is multi-modal you will get a NumPy array back. For example, suppose there were two entries for bath (all the other data the same):

tag|category
bath|9
bath|10

In that case the output would be:

>>> mode = lambda x: x.mode() if len(x) > 2 else np.array(x)
>>> df.groupby('tag')['category'].agg(mode)

tag
automotive 8
ba 8
bamboo 8
banana tree 8
bath [9, 10]

You can also use the value_counts method instead of mode. Once again, do a groupby on 'tag' for the 'category' column and then within each group use the value_counts method. value_counts arranges in descending order so you want to grab the index of the first row:

>>> df.groupby('tag')['category'].agg(lambda x: x.value_counts().index[0])

tag
automotive 8
ba 8
bamboo 8
banana tree 8
bath 9

However, this won't return an array in multi-modal situations. It will just return the first mode.
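If you always want every tied mode back, one possible alternative (an assumption about intent, not from the original answer) is to collect mode() into a list, which keeps ties visible for every group size:

```python
import pandas as pd

df = pd.DataFrame({'tag': ['bamboo', 'bamboo', 'bamboo', 'bath', 'bath'],
                   'category': [8, 8, 9, 9, 10]})

# mode() returns all tied values in sorted order; tolist() keeps ties visible
modes = df.groupby('tag')['category'].agg(lambda x: x.mode().tolist())
```

Single-mode groups come back as one-element lists, multi-modal groups as longer lists.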

Pandas: return the occurrences of the most frequent value for each group (possibly without apply)

Use SeriesGroupBy.value_counts, which sorts by default, then add DataFrame.drop_duplicates after Series.reset_index to keep the top value per group:

df = (df_test.groupby('A')['B']
             .value_counts()
             .rename_axis(['A', 'most_freq'])
             .reset_index(name='freq')
             .drop_duplicates('A'))
print(df)
   A  most_freq  freq
0  0          3     2
2  1          0     1
4  2          6     1
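With a hypothetical df_test (the original frame isn't shown in the question), the chain runs like this:

```python
import pandas as pd

# Hypothetical input resembling the question's df_test
df_test = pd.DataFrame({'A': [0, 0, 0, 1, 2],
                        'B': [3, 3, 5, 0, 6]})

out = (df_test.groupby('A')['B']
              .value_counts()          # counts, sorted descending per group
              .rename_axis(['A', 'most_freq'])
              .reset_index(name='freq')
              .drop_duplicates('A'))   # keep the top row of each group
```

Because value_counts sorts descending within each group, the first row per group is the most frequent value together with its count.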

Finding most common values with Pandas GroupBy and value_counts

Use head on the value_counts result within each group:

df.groupby('Area Name')['Code Description'].apply(lambda x: x.value_counts().head(3))

Output:

Area Name
77th Street  RAPE, FORCIBLE                                                   1
Foothill     CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060     1
N Hollywood  CRIMINAL THREATS - NO WEAPON DISPLAYED                           2
             VIOLATION OF RESTRAINING ORDER                                   1
             ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                   1
Southeast    ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                   1
West Valley  CRIMINAL THREATS - NO WEAPON DISPLAYED                           2
Name: Code Description, dtype: int64

How to find common values in groupby groups?

Since df is already sorted by tour, we could use groupby + first:

df['val'] = df.groupby('user')['val'].transform('first')

Output:

    user  game  tour  val
0    jim     1     1   10
1   john     1     1   12
2   jack     2     1   14
3    jim     2     1   10
4    mel     3     2   20
5    jim     3     2   10
6    mat     4     2   14
7   nick     4     2   20
8    tim     5     3   16
9   john     5     3   12
10   lin     6     3   16
11  mick     6     3   20
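If the frame were not already sorted by tour, you could (as a variation on the same idea) sort first and still use transform('first'), since transform keeps the original index and assignment aligns back correctly:

```python
import pandas as pd

# Hypothetical unsorted rows: jim's earliest-tour value is 10
df = pd.DataFrame({'user': ['jim', 'john', 'jim'],
                   'tour': [2, 1, 1],
                   'val':  [99, 12, 10]})

# Sort by tour so 'first' picks each user's earliest-tour value;
# transform preserves the original index, so assignment realigns the rows
df['val'] = df.sort_values('tour').groupby('user')['val'].transform('first')
```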

Retrieve most frequent value for each couple of values

Use custom lambda function for first mode:

df = (df.groupby(['Latitude', 'Longitude'])['street_type']
        .agg(lambda x: x.mode().iat[0])
        .reset_index())

