Find the Most Frequent Value by Row

Find the most frequent value by row

Something like:

apply(df,1,function(x) names(which.max(table(x))))
[1] "red" "yellow" "green"

In case of a tie, which.max takes the first maximum. From the
which.max help page:

Determines the location, i.e., index of the (first)
minimum or maximum of a numeric vector.

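Example (id and var1 to var3 are filled in from the data frame printed below):

id <- 1:3
var1 <- c("red","yellow","green")
var2 <- c("red","yellow","green")
var3 <- c("yellow","orange","green")
var4 <- c("yellow","green","yellow")
df <- data.frame(cbind(id, var1, var2, var3, var4))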

> df
id var1 var2 var3 var4
1 1 red red yellow yellow
2 2 yellow yellow orange green
3 3 green green green yellow

apply(df,1,function(x) names(which.max(table(x))))
[1] "red" "yellow" "green"

Get the most frequent value per row and account for ties

I think this achieves what you're looking for. For each row, it builds a frequency table of the letters, with the factor levels kept in order of first appearance so that ties resolve to the value that appears first in the row. It then returns the name of the most frequent entry in this table.

Thanks to Henrik for suggesting the improvement.

df$New.Group <- apply(df[-1], 1, function(x) {
names(which.max(table(factor(x, unique(x)))))
})

df
#> ID Group1 Group2 Group3 Group4 Group5 New.Group
#> 1 1 A E A <NA> A A
#> 2 2 <NA> C A C D C
#> 3 3 C C <NA> <NA> <NA> C
#> 4 4 <NA> <NA> <NA> D <NA> D
#> 5 5 E E C C <NA> E
#> 6 6 C E <NA> <NA> <NA> C
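
For readers working in pandas rather than R, the same tie rule (on a tie, keep the value that appears first in the row) can be sketched as follows. This is an illustration and not part of the original answer: the helper name first_most_frequent is hypothetical, and the frame is rebuilt by hand from the output shown above.

```python
from collections import Counter

import pandas as pd

# Rebuilt by hand from the example output above; None marks a missing value.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Group1": ["A", None, "C", None, "E", "C"],
    "Group2": ["E", "C", "C", None, "E", "E"],
    "Group3": ["A", "A", None, None, "C", None],
    "Group4": [None, "C", None, "D", "C", None],
    "Group5": ["A", "D", None, None, None, None],
})


def first_most_frequent(row):
    """Hypothetical helper: most frequent non-missing value, first seen wins ties."""
    counts = Counter(row.dropna())  # Counter keeps keys in first-appearance order
    if not counts:
        return None  # the row holds only missing values
    # max() returns the first key that reaches the highest count
    return max(counts, key=counts.get)


df["New.Group"] = df.drop(columns="ID").apply(first_most_frequent, axis=1)
print(df["New.Group"].tolist())
```

Counter is used here instead of value_counts so that the first-appearance ordering relies only on Python's dictionary ordering.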

How to get the most frequent row in table

Since pandas 1.1.0, it is possible to use the DataFrame.value_counts() method to count unique rows in a DataFrame:

df.value_counts()

Output:

col_1  col_2  col_3
1      1      A        2
       0      C        1
              B        1
              A        1
0      1      A        1

This method can be used to find the most frequent row:

df.value_counts().head(1).index.to_frame(index=False)

Output:

   col_1  col_2 col_3
0      1      1     A
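
A self-contained sketch of the steps above, for anyone who wants to try them. The frame is an assumption, built by hand so that its row counts match the output shown; the variable names are illustrative.

```python
import pandas as pd

# Hand-built frame whose distinct rows match the counts shown above.
df = pd.DataFrame({
    "col_1": [1, 1, 1, 1, 1, 0],
    "col_2": [1, 1, 0, 0, 0, 1],
    "col_3": ["A", "A", "C", "B", "A", "A"],
})

counts = df.value_counts()  # one entry per distinct row, most frequent first

# Most frequent row as a one-row DataFrame
top_row = counts.head(1).index.to_frame(index=False)

# The same row as a plain tuple, plus how often it occurs
top_key = counts.idxmax()
top_count = int(counts.iloc[0])

print(top_row)
print(top_key, top_count)
```

If several rows share the highest count, head(1) and idxmax() each return only one of them.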

for each row get frequency of the most frequent value

One way is to use apply with axis=1:

df['frq'] = df.apply(lambda x: x.value_counts().iloc[0], axis=1)

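Or use stack and groupby. The level argument of max() was removed in pandas 2.0, so the final step groups by the index level again:

df['frq'] = df.stack().groupby(level=0).value_counts().groupby(level=0).max()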
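
The two approaches can be compared on a small frame. This is a sketch on made-up data; the final step of the stack version is written as groupby(level=0).max(), which is the form that runs on current pandas releases.

```python
import pandas as pd

# Made-up data: the last row has no repeated value, the third row repeats one value three times.
df = pd.DataFrame({"a": [2, 1, 5, 4], "b": [3, 1, 5, 6], "c": [3, 2, 5, 9]})

# Row by row: value_counts sorts by count, so the first entry is the highest count
via_apply = df.apply(lambda row: row.value_counts().iloc[0], axis=1)

# Vectorised: stack to one long Series, count values per original row,
# then keep the largest count for each row
via_stack = (
    df.stack()
      .groupby(level=0).value_counts()
      .groupby(level=0).max()
)

df["frq"] = via_apply
print(df)
```

The apply version is easier to read; the stack version avoids calling a Python function once per row.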

Get most common value for each value in row - pandas df

This works:

df['Common'] = df.groupby('id')['name'].transform(lambda x: x.mode()[0])

Output:

>>> df
id name Common
0 One A A
1 One A A
2 One A A
3 One B A
4 Two C C
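
A runnable sketch of the same call, with the frame rebuilt from the output above. Note that Series.mode returns the tied values in sorted order, so taking the first element resolves a tie to the smallest value; the tie example at the end is made up.

```python
import pandas as pd

# Rebuilt from the output shown above.
df = pd.DataFrame({
    "id": ["One", "One", "One", "One", "Two"],
    "name": ["A", "A", "A", "B", "C"],
})

# For each id group, broadcast the most frequent name back to every row
df["Common"] = df.groupby("id")["name"].transform(lambda x: x.mode().iloc[0])
print(df)

# Made-up tie: both values occur twice, and mode() returns them sorted
tied = pd.Series(["B", "A", "B", "A"]).mode().tolist()
print(tied)
```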

pandas: how to find the most frequent value of each row?

Try the .mode() method:

In [88]: df
Out[88]:
a b c
0 2 3 3
1 1 1 2
2 7 7 8

In [89]: df.mode(axis=1)
Out[89]:
0
0 3
1 1
2 7

From docs:

Gets the mode(s) of each element along the axis selected. Adds a row
for each mode per label, fills in gaps with nan.

Note that there could be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
the reason why a dataframe is returned. If you want to impute missing
values with the mode in a dataframe df, you can just do this:
df.fillna(df.mode().iloc[0])
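
A short sketch of both points, on made-up data: a row whose values are all tied produces extra columns in the result of mode(axis=1), and the first row of df.mode() can serve as a per-column fill value. The frames and variable names here are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# The last row has three distinct values, so all three are modes of that row.
df = pd.DataFrame({"a": [2, 1, 7], "b": [3, 1, 8], "c": [3, 2, 9]})

row_modes = df.mode(axis=1)       # one column per mode, padded with NaN
first_mode = row_modes.iloc[:, 0]  # keep a single mode per row (the smallest on ties)
print(row_modes)

# Impute missing values with each column's mode
with_gaps = pd.DataFrame({
    "x": [1.0, np.nan, 1.0, 2.0],
    "y": ["a", "b", None, "b"],
})
filled = with_gaps.fillna(with_gaps.mode().iloc[0])
print(filled)
```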

Find most frequent value in SQL column


SELECT
<column_name>,
COUNT(<column_name>) AS `value_occurrence`

FROM
<my_table>

GROUP BY
<column_name>

ORDER BY
`value_occurrence` DESC

LIMIT 1;

Replace <column_name> and <my_table> with your own names. To see the N most common values of the column, change LIMIT 1 to LIMIT N.
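
To try the query without a database server, it can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: the table my_table, the column color, the sample rows and the helper most_common are all hypothetical, and the alias is written without backticks.

```python
import sqlite3

# Hypothetical table and data, used only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (color TEXT)")
conn.executemany(
    "INSERT INTO my_table (color) VALUES (?)",
    [("red",), ("green",), ("red",), ("blue",), ("red",), ("green",)],
)


def most_common(conn, n=1):
    """Return the n most common values of my_table.color with their counts."""
    query = """
        SELECT color, COUNT(color) AS value_occurrence
        FROM my_table
        GROUP BY color
        ORDER BY value_occurrence DESC
        LIMIT ?
    """
    return conn.execute(query, (n,)).fetchall()


print(most_common(conn))     # single most common value
print(most_common(conn, 2))  # two most common values
```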

How to find the most frequent value based on dates in another column in Python

Summary of the problem:

  • Input: a date
  • Output: the values in column Name whose frequency on that date is at least 50 percent

import numpy as np  # If not installed, run 'pip install numpy'

date = input('Enter your date ex:06/21')
# date = '06/21'

def frequency(df, date):
    dfcrop = df[df['Date'] == date]  # Keep only the rows with the given date
    dfcrop = dfcrop[dfcrop['Accepted'] == True]  # Of those, keep only the accepted rows
    values = list()
    for value in set(dfcrop['Name']):  # Loop over the unique names
        freq = np.sum((dfcrop['Name'] == value).astype(np.int32))
        freq = freq / dfcrop.shape[0]  # Share of the filtered rows with this name
        if freq >= 0.5:  # The name makes up at least 50 percent of the rows
            values.append(value)
    return values

freq = frequency(df, date)  # List of the names with a share of at least 50 percent on the given date

If you want to use only some selected dates, as the author asked in the comments, filter the DataFrame first:

dates_to_use = [<put all dates to use>]

df = df[df['Date'].isin(dates_to_use)]
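
The share calculation in frequency can also be written with Series.value_counts(normalize=True), which returns each name's fraction of the filtered rows directly. This is a sketch, not the original author's code: the function name frequent_names, the threshold parameter and the sample data are hypothetical, and it applies both the date and the Accepted filter.

```python
import pandas as pd

# Hypothetical sample data with the column names used above.
df = pd.DataFrame({
    "Date": ["06/21", "06/21", "06/21", "06/21", "06/22", "06/22"],
    "Name": ["Ann", "Ann", "Bob", "Ann", "Bob", "Bob"],
    "Accepted": [True, True, True, False, True, True],
})


def frequent_names(df, date, threshold=0.5):
    """Names whose share of the accepted rows on `date` is at least `threshold`."""
    subset = df[(df["Date"] == date) & df["Accepted"]]
    shares = subset["Name"].value_counts(normalize=True)  # fraction per name
    return shares[shares >= threshold].index.tolist()


print(frequent_names(df, "06/21"))
print(frequent_names(df, "06/23"))  # no rows for this date, so the list is empty
```

Because value_counts on an empty selection is empty, a date with no accepted rows yields an empty list instead of a division by zero.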

Count the most frequent values in a row pandas and make a column with that most frequent value

Use DataFrame.mode and select the first column by position with DataFrame.iloc:

df['result'] = df.mode(axis=1).iloc[:, 0]
print (df)
a b c result
0 3 3 3 3
1 3 3 3 3
2 3 3 3 3
3 3 3 3 3
4 2 3 2 2
5 3 3 3 3
6 1 2 1 1
7 2 3 2 2
8 0 0 0 0
9 0 1 0 0

