GroupBy pandas DataFrame and select most common value
You can use value_counts() to get a counts Series and take its first index value:
import pandas as pd
source = pd.DataFrame({'Country': ['USA', 'USA', 'Russia', 'USA'],
                       'City': ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
                       'Short name': ['NY', 'New', 'Spb', 'NY']})
source.groupby(['Country','City']).agg(lambda x:x.value_counts().index[0])
If you are wondering how to perform other aggregations within the same .agg() call, try named aggregation:
# Let's add a new col, account
source['account'] = [1,2,3,3]
source.groupby(['Country','City']).agg(
    mod=('Short name', lambda x: x.value_counts().index[0]),
    avg=('account', 'mean'),
)
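Put together as a runnable script (same sample data as above), the named-aggregation version yields one row per (Country, City) group, a sketch:

```python
import pandas as pd

source = pd.DataFrame({
    'Country': ['USA', 'USA', 'Russia', 'USA'],
    'City': ['New-York', 'New-York', 'Sankt-Petersburg', 'New-York'],
    'Short name': ['NY', 'New', 'Spb', 'NY'],
})
source['account'] = [1, 2, 3, 3]

# mod: most common 'Short name' per group; avg: mean of 'account'
out = source.groupby(['Country', 'City']).agg(
    mod=('Short name', lambda x: x.value_counts().index[0]),
    avg=('account', 'mean'),
)
```

The result is indexed by the (Country, City) MultiIndex, so single groups can be looked up with tuple keys via .loc.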
GroupBy pandas DataFrame and select most common value which is alphabetically first
Try groupby with mode:
mapper = df.groupby("province")["city"].agg(lambda x: x.mode().sort_values().iloc[0]).to_dict()
df["city"] = df["city"].where(df["city"].notnull(),
df["province"].map(mapper))
>>> df
province city
0 A newyork
1 A london
2 A newyork
3 A london
4 A london
5 A london
6 A houston
7 B hyderabad
8 B karachi
9 B hyderabad
10 B hyderabad
11 B hyderabad
12 B beijing
13 B karachi
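The input df isn't shown above; here is a minimal self-contained sketch with hypothetical province/city data (missing cities get filled with the province's alphabetically-first mode):

```python
import numpy as np
import pandas as pd

# Hypothetical input: some cities are missing
df = pd.DataFrame({
    'province': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'city': ['london', np.nan, 'london', 'hyderabad',
             'hyderabad', np.nan, 'karachi'],
})

# Per-province most common city; Series.mode drops NaN by default,
# and ties are broken alphabetically by the sort
mapper = (df.groupby('province')['city']
            .agg(lambda x: x.mode().sort_values().iloc[0])
            .to_dict())

# Fill only the missing cities from the mapping
df['city'] = df['city'].where(df['city'].notnull(),
                              df['province'].map(mapper))
```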
Group by a column to find the most frequent value in another column?
Use SeriesGroupBy.value_counts and select the first index value:
df = df.groupby('col1')['col2'].apply(lambda x: x.value_counts().index[0]).reset_index()
print (df)
col1 col2
0 blue nb
1 green gx
Or add DataFrame.drop_duplicates:
df = df.groupby('col1')['col2'].value_counts().reset_index(name='v')
df = df.drop_duplicates('col1')[['col1','col2']]
print (df)
col1 col2
0 blue nb
2 green gx
Or use Series.mode and select the first value by position with Series.iat:
df = df.groupby('col1')['col2'].apply(lambda x: x.mode().iat[0]).reset_index()
print (df)
col1 col2
0 blue nb
1 green gx
EDIT:
A problem arises with groups containing only NaNs:
import numpy as np
import pandas as pd

d = {'col1': ['green','green','green','blue','blue','blue'],
     'col2': [np.nan, np.nan, np.nan, 'nb', 'nb', 'mj']}
df = pd.DataFrame(data=d)
f = lambda x: np.nan if x.isnull().all() else x.value_counts().index[0]
#or
#f = lambda x: next(iter(x.value_counts().index), np.nan)
#another solution
#f = lambda x: next(iter(x.mode()), np.nan)
df = df.groupby('col1')['col2'].apply(f).reset_index()
print (df)
col1 col2
0 blue nb
1 green NaN
pandas groupby and find most frequent value (mode)
You can calculate both count and max on the dates, then sort on these values and drop duplicates (or, as here, use groupby().head(1) to keep the top row per user):
s = df.groupby(['user_id','product_id'])['created_at'].agg(['count','max'])
s.sort_values(['count','max'], ascending=False).groupby('user_id').head(1)
Output:
count max
user_id product_id
3 400 2 2021-04-21 10:20:00
1 200 2 2020-06-24 10:10:24
2 300 1 2021-01-21 10:20:00
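A self-contained sketch with hypothetical purchase data (the user_id/product_id/created_at names follow the snippet above):

```python
import pandas as pd

# Hypothetical purchase log: per user, find the most frequently bought
# product, breaking count ties by the latest purchase date
df = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 3, 3],
    'product_id': [200, 200, 100, 300, 400, 400],
    'created_at': pd.to_datetime([
        '2020-06-24 10:10:24', '2020-06-20 09:00:00', '2020-01-01 08:00:00',
        '2021-01-21 10:20:00', '2021-04-21 10:20:00', '2021-04-01 12:00:00',
    ]),
})

s = df.groupby(['user_id', 'product_id'])['created_at'].agg(['count', 'max'])

# Sort by frequency then recency, keep one row per user
top = (s.sort_values(['count', 'max'], ascending=False)
        .groupby('user_id')
        .head(1))
```

Note that groupby('user_id') here groups on the index level of that name, which pandas resolves automatically.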
Python: select most frequent using group by
In the comments you note you're using pandas. You can do something like the following:
>>> df
tag category
0 automotive 8
1 ba 8
2 bamboo 8
3 bamboo 8
4 bamboo 8
5 bamboo 8
6 bamboo 8
7 bamboo 10
8 bamboo 8
9 bamboo 9
10 bamboo 8
11 bamboo 10
12 bamboo 8
13 bamboo 9
14 bamboo 8
15 banana tree 8
16 banana tree 8
17 banana tree 8
18 banana tree 8
19 bath 9
Do a groupby on 'tag' and, within each group, use the mode method on the 'category' column. However, we have to make it conditional because pandas doesn't reduce to a single number for the mode when a group has fewer than 3 observations (with 1 or 2 observations we can just return the group's values). We can use the aggregate/agg method with a lambda function to do this:
>>> mode = lambda x: x.mode() if len(x) > 2 else np.array(x)
>>> df.groupby('tag')['category'].agg(mode)
tag
automotive 8
ba 8
bamboo 8
banana tree 8
bath 9
Note that when the data is multi-modal you will get a NumPy array. For example, suppose there were two entries for bath (all the other data is the same):
tag|category
bath|9
bath|10
In that case the output would be:
>>> mode = lambda x: x.mode() if len(x) > 2 else np.array(x)
>>> df.groupby('tag')['category'].agg(mode)
tag
automotive 8
ba 8
bamboo 8
banana tree 8
bath [9, 10]
You can also use the value_counts method instead of mode. Once again, do a groupby on 'tag' for the 'category' column and then use value_counts within each group. value_counts arranges results in descending order, so you grab the index of the first row:
>>> df.groupby('tag')['category'].agg(lambda x: x.value_counts().index[0])
tag
automotive 8
ba 8
bamboo 8
banana tree 8
bath 9
However, this won't return an array in multi-modal situations. It will just return the first mode.
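That caveat can be seen on a tiny hypothetical frame: with a bimodal group, the value_counts approach still yields a single winner, but which of the tied values comes first is not guaranteed by pandas, so only the unambiguous groups should be relied on:

```python
import pandas as pd

# 'bath' is bimodal (9 and 10 once each); 'ba' has a unique mode
df = pd.DataFrame({'tag': ['bath', 'bath', 'ba'],
                   'category': [9, 10, 8]})

# Takes whichever tied value value_counts happens to rank first
first_mode = df.groupby('tag')['category'].agg(
    lambda x: x.value_counts().index[0])
```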
Pandas: return the occurrences of the most frequent value for each group (possibly without apply)
Use SeriesGroupBy.value_counts, which sorts by default, then add DataFrame.drop_duplicates after Series.reset_index to keep the top value per group:
df = (df_test.groupby('A')['B']
.value_counts()
.rename_axis(['A','most_freq'])
.reset_index(name='freq')
.drop_duplicates('A'))
print (df)
A most_freq freq
0 0 3 2
2 1 0 1
4 2 6 1
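A runnable sketch, assuming a hypothetical df_test with columns A and B:

```python
import pandas as pd

# Hypothetical data: each A group has a clear most-frequent B value
df_test = pd.DataFrame({'A': [0, 0, 0, 1, 1, 1, 2, 2, 2],
                        'B': [3, 3, 1, 0, 0, 5, 6, 6, 4]})

# value_counts sorts each group descending by count, so the first
# row kept by drop_duplicates('A') is that group's top value
out = (df_test.groupby('A')['B']
              .value_counts()
              .rename_axis(['A', 'most_freq'])
              .reset_index(name='freq')
              .drop_duplicates('A'))
```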
Finding most common values with Pandas GroupBy and value_counts
Use head on the value_counts result within each group:
df.groupby('Area Name')['Code Description'].apply(lambda x: x.value_counts().head(3))
Output:
Area Name
77th Street   RAPE, FORCIBLE                                                1
Foothill      CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)0060  1
N Hollywood   CRIMINAL THREATS - NO WEAPON DISPLAYED                        2
              VIOLATION OF RESTRAINING ORDER                                1
              ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                1
Southeast     ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT                1
West Valley   CRIMINAL THREATS - NO WEAPON DISPLAYED                        2
Name: Code Description, dtype: int64
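A minimal sketch with hypothetical area/description data (taking the top 2 per group instead of 3 to keep it small):

```python
import pandas as pd

# Hypothetical incident log
df = pd.DataFrame({
    'Area Name': ['N Hollywood'] * 4 + ['Southeast'] * 2,
    'Code Description': ['THREATS', 'THREATS', 'ASSAULT', 'ROBBERY',
                         'ASSAULT', 'ASSAULT'],
})

# Per-area counts, keeping only each area's 2 most frequent descriptions
top2 = (df.groupby('Area Name')['Code Description']
          .apply(lambda x: x.value_counts().head(2)))
```

The result is a Series with a (Area Name, description) MultiIndex, matching the shape of the output above.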
How to find common values in groupby groups?
Since df is already sorted by tour, we could use groupby + first:
df['val'] = df.groupby('user')['val'].transform('first')
Output:
user game tour val
0 jim 1 1 10
1 john 1 1 12
2 jack 2 1 14
3 jim 2 1 10
4 mel 3 2 20
5 jim 3 2 10
6 mat 4 2 14
7 nick 4 2 20
8 tim 5 3 16
9 john 5 3 12
10 lin 6 3 16
11 mick 6 3 20
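Reconstructing the input from the output above (the pre-fill val entries for repeated users are hypothetical placeholders):

```python
import pandas as pd

# Hypothetical input: repeated users have stale val entries (here 0)
# that should be overwritten by each user's first value
df = pd.DataFrame({
    'user': ['jim', 'john', 'jack', 'jim', 'mel', 'jim',
             'mat', 'nick', 'tim', 'john', 'lin', 'mick'],
    'game': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    'tour': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'val':  [10, 12, 14, 0, 20, 0, 14, 20, 16, 0, 16, 20],
})

# Broadcast each user's first val to all of that user's rows
df['val'] = df.groupby('user')['val'].transform('first')
```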
Retrieve most frequent value for each couple of values
Use a custom lambda function that takes the first mode:
df = (df.groupby(['Latitude', 'Longitude'])['street_type']
.agg(lambda x: x.mode().iat[0])
.reset_index())
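A self-contained sketch with hypothetical coordinate data:

```python
import pandas as pd

# Hypothetical street records keyed by coordinates
df = pd.DataFrame({
    'Latitude': [48.85, 48.85, 48.85, 40.71, 40.71],
    'Longitude': [2.35, 2.35, 2.35, -74.0, -74.0],
    'street_type': ['avenue', 'avenue', 'rue', 'street', 'street'],
})

# Most frequent street_type per coordinate pair; .iat[0] picks the
# first mode when there are ties
out = (df.groupby(['Latitude', 'Longitude'])['street_type']
         .agg(lambda x: x.mode().iat[0])
         .reset_index())
```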