Find the most frequent value by row
Something like:
apply(df,1,function(x) names(which.max(table(x))))
[1] "red" "yellow" "green"
In case of a tie, which.max takes the first maximum. From the
which.max help page:
Determines the location, i.e., index of the (first)
minimum or maximum of a numeric vector.
Because table() sorts its names, the first maximum is the tied value that comes first alphabetically.
Example:
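# Vectors reconstructed from the printed data frame below
id <- c(1, 2, 3)
var1 <- c("red","yellow","green")
var2 <- c("red","yellow","green")
var3 <- c("yellow","orange","green")
var4 <- c("yellow","green","yellow")
df <- data.frame(cbind(id, var1, var2, var3, var4))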
> df
id var1 var2 var3 var4
1 1 red red yellow yellow
2 2 yellow yellow orange green
3 3 green green green yellow
apply(df,1,function(x) names(which.max(table(x))))
[1] "red" "yellow" "green"
Get the most frequent value per row and account for ties
I think this achieves what you're looking for. For each row, it builds a frequency table of the values and picks the most frequent one. Converting the row to a factor whose levels follow the order of appearance means that ties go to the value that appears first in the row. It then returns the name of that table entry.
Thanks to Henrik for suggesting the improvement.
df$New.Group <- apply(df[-1], 1, function(x) {
names(which.max(table(factor(x, unique(x)))))
})
df
#> ID Group1 Group2 Group3 Group4 Group5 New.Group
#> 1 1 A E A <NA> A A
#> 2 2 <NA> C A C D C
#> 3 3 C C <NA> <NA> <NA> C
#> 4 4 <NA> <NA> <NA> D <NA> D
#> 5 5 E E C C <NA> E
#> 6 6 C E <NA> <NA> <NA> C
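For readers working in Python rather than R, the same idea (most frequent value per row, ties going to the value seen first, missing values skipped) can be sketched with collections.Counter. This is only an illustration, not part of the answer above: the helper name first_mode is hypothetical, and the two sample rows are copied from rows 2 and 6 of the table, with None standing in for <NA>.

```python
from collections import Counter

def first_mode(values):
    """Return the most frequent non-missing value; ties go to the value seen first."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    # most_common orders elements with equal counts by first occurrence (Python 3.7+)
    return Counter(present).most_common(1)[0][0]

# Rows 2 and 6 of the example table; None stands in for <NA>
rows = [
    [None, "C", "A", "C", "D"],
    ["C", "E", None, None, None],
]
result = [first_mode(r) for r in rows]
print(result)  # ['C', 'C']
```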
How to get the most frequent row in table
Since pandas 1.1.0, it is possible to use the method value_counts()
to count unique rows in a DataFrame:
df.value_counts()
Output:
col_1  col_2  col_3
1      1      A        2
       0      C        1
              B        1
              A        1
0      1      A        1
This method can be used to find the most frequent row:
df.value_counts().head(1).index.to_frame(index=False)
Output:
col_1 col_2 col_3
0 1 1 A
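On pandas versions older than 1.1.0, a similar result can probably be obtained by grouping on all columns and counting group sizes. The sketch below is an assumption on my part rather than something from the answer, and the sample DataFrame is made up to match the counts shown above.

```python
import pandas as pd

# Hypothetical data chosen so that the row (1, 1, "A") appears twice
df = pd.DataFrame({
    "col_1": [1, 1, 1, 1, 1, 0],
    "col_2": [1, 1, 0, 0, 0, 1],
    "col_3": ["A", "A", "C", "B", "A", "A"],
})

# Group on every column; each group is one distinct row, size() is its count
counts = df.groupby(list(df.columns)).size()

top_row = counts.idxmax()   # tuple of column values for the most frequent row
top_count = counts.max()    # how many times that row occurs
print(top_row, top_count)
```

If several rows share the highest count, idxmax returns only one of them, so check counts[counts == counts.max()] when ties matter.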
For each row, get the frequency of the most frequent value
Another way is to apply with axis=1:
df['frq'] = df.apply(lambda x: x.value_counts().iloc[0], axis=1)
Or use stack and groupby:
df['frq'] = df.stack().groupby(level=0).value_counts().groupby(level=0).max()
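Both approaches can be checked on a small frame. The data below is hypothetical, and the final step of the stack version is written as .groupby(level=0).max(), which should work on current pandas releases.

```python
import pandas as pd

# Hypothetical frame: row 0 has one repeated value, row 1 is constant, row 2 has no repeats
df = pd.DataFrame({"a": [2, 1, 7], "b": [3, 1, 8], "c": [3, 1, 9]})

# Row-wise apply: value_counts sorts descending, so the first entry is the top count
frq_apply = df.apply(lambda row: row.value_counts().iloc[0], axis=1)

# Stack to long form, count values per original row, then keep the largest count
frq_stack = df.stack().groupby(level=0).value_counts().groupby(level=0).max()

print(frq_apply.tolist())  # [2, 3, 1]
print(frq_stack.tolist())  # [2, 3, 1]
```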
Get most common value for each value in row - pandas df
This works:
df['Common'] = df.groupby('id')['name'].transform(lambda x: x.mode()[0])
Output:
>>> df
id name Common
0 One A A
1 One A A
2 One A A
3 One B A
4 Two C C
pandas: how to find the most frequent value of each row?
Try the .mode() method:
In [88]: df
Out[88]:
a b c
0 2 3 3
1 1 1 2
2 7 7 8
In [89]: df.mode(axis=1)
Out[89]:
0
0 3
1 1
2 7
From the docs:
Gets the mode(s) of each element along the axis selected. Adds a row
for each mode per label, fills in gaps with nan.
Note that there could be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
the reason why a dataframe is returned. If you want to impute missing
values with the mode in a dataframe df, you can just do this:
df.fillna(df.mode().iloc[0])
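To see why a DataFrame comes back, here is a small sketch with a tie in one row. The data is hypothetical: row 2 has three distinct values, so all three count as modes.

```python
import pandas as pd

# Hypothetical frame: rows 0 and 1 have a single mode, row 2 is a three-way tie
df = pd.DataFrame({"a": [2, 1, 7], "b": [3, 1, 8], "c": [3, 2, 9]})

modes = df.mode(axis=1)
print(modes)
# Rows with fewer modes are padded with NaN, so the result has one column per tied mode

# Keep only the first (smallest) mode per row
first = modes.iloc[:, 0]
print(first.tolist())  # [3.0, 1.0, 7.0]
```

The values come back as floats here because the NaN padding forces a float dtype.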
Find most frequent value in SQL column
SELECT
<column_name>,
COUNT(<column_name>) AS `value_occurrence`
FROM
<my_table>
GROUP BY
<column_name>
ORDER BY
`value_occurrence` DESC
LIMIT 1;
Replace <column_name> and <my_table>. Increase the 1 in LIMIT 1 if you want to see the N most common values of the column.
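The query can be tried without a database server by using Python's built-in sqlite3 module. This is a sketch under assumptions: the table my_table, the column color, and the sample rows are all made up for illustration, and the LIMIT is passed as a parameter so the same query returns the top 1 or top N values.

```python
import sqlite3

# In-memory database with a hypothetical table and column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (color TEXT)")
conn.executemany(
    "INSERT INTO my_table (color) VALUES (?)",
    [("red",), ("green",), ("red",), ("blue",), ("red",), ("green",)],
)

query = """
SELECT color, COUNT(color) AS value_occurrence
FROM my_table
GROUP BY color
ORDER BY value_occurrence DESC
LIMIT ?
"""

top1 = conn.execute(query, (1,)).fetchall()  # most frequent value only
top2 = conn.execute(query, (2,)).fetchall()  # two most frequent values
conn.close()

print(top1)  # [('red', 3)]
print(top2)  # [('red', 3), ('green', 2)]
```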
How to find the most frequent value based on dates in another column in Python
Summary of the problem:
- Input: a date
- Output: the values in column Name whose frequency on that date is 50 percent or more
import numpy as np  # If not installed, run 'pip install numpy'
date = input('Enter your date, e.g. 06/21: ')
# date = '06/21'
def frequency(df, date):
    dfcrop = df[df['Date'] == date]  # Keep only the rows with the given date
    dfcrop = dfcrop[dfcrop['Accepted'] == True]  # Of those, keep only accepted rows
    values = list()
    for value in set(dfcrop['Name']):  # Loop over the unique names
        freq = np.sum((dfcrop['Name'] == value).astype(np.int32))
        freq = freq / dfcrop.shape[0]  # Share of the filtered rows with this name
        if freq >= 0.5:  # Share is 50 percent or more
            values.append(value)
    return values
freq = frequency(df, date)  # List of names with a share of 50 percent or more on the given date
If you want to use only some selected dates, as the author asked in the comments, filter the DataFrame first:
dates_to_use=[<put all dates to use>]
df=df[df['Date'].isin(dates_to_use)]
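The frequency calculation above can also be written without an explicit loop by using value_counts(normalize=True), which returns each name's share directly. This is an alternative sketch, not the answer's own code: the function name frequent_names, the threshold parameter, and the sample data are hypothetical, and the Accepted column is assumed to be boolean.

```python
import pandas as pd

def frequent_names(df, date, threshold=0.5):
    """Names whose share among accepted rows on `date` is at least `threshold`."""
    # Apply both conditions at once: matching date and accepted
    subset = df[(df["Date"] == date) & (df["Accepted"])]
    # normalize=True turns counts into fractions of the filtered rows
    shares = subset["Name"].value_counts(normalize=True)
    return shares[shares >= threshold].index.tolist()

# Hypothetical sample data
df = pd.DataFrame({
    "Date": ["06/21", "06/21", "06/21", "06/22"],
    "Name": ["Ann", "Ann", "Bob", "Bob"],
    "Accepted": [True, True, True, True],
})

result = frequent_names(df, "06/21")
print(result)  # ['Ann']
```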
Count the most frequent values in a row pandas and make a column with that most frequent value
Use DataFrame.mode and select the first column by position with DataFrame.iloc:
df['result'] = df.mode(axis=1).iloc[:, 0]
print (df)
a b c result
0 3 3 3 3
1 3 3 3 3
2 3 3 3 3
3 3 3 3 3
4 2 3 2 2
5 3 3 3 3
6 1 2 1 1
7 2 3 2 2
8 0 0 0 0
9 0 1 0 0