How to Retrieve the Most Repeated Value in a Column Present in a Data Frame

How to retrieve the most repeated value in a column present in a data frame

In R, sort the frequency table from table() and take the name of its last (largest) entry:

tail(names(sort(table(Forbes2000$category))), 1)
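For comparison, here is a minimal Python sketch of the same count-then-take-the-largest idea, using collections.Counter from the standard library. The helper name most_repeated and the sample values are hypothetical. On ties it returns the value seen first, which may differ from the value the R one-liner picks.

```python
from collections import Counter

def most_repeated(values):
    """Return the most frequent item in `values` (hypothetical helper).

    Counter.most_common(1) returns a one-element list of (item, count);
    items with equal counts keep the order in which they were first seen.
    """
    counts = Counter(values)
    if not counts:
        raise ValueError("empty input has no most repeated value")
    return counts.most_common(1)[0][0]

categories = ["b", "a", "b", "c", "a", "b"]
print(most_repeated(categories))  # b
```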

Pandas get the most frequent values of a column

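Use Series.mode(). It returns every value tied for the highest count, so the result can contain more than one value: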

df.name.mode()
Out[712]: 
0     alex
1    helen
dtype: object
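Python's standard library offers a similar tie-aware function, statistics.multimode (Python 3.8+). Below is a small sketch with made-up names chosen to produce a tie. Note that multimode keeps first-seen order, whereas pandas' mode() returns its values sorted.

```python
from statistics import multimode

names = ["alex", "helen", "bob", "alex", "helen"]
modes = multimode(names)  # every value tied for the highest count, in first-seen order
print(modes)              # ['alex', 'helen']
```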

How to get the number of the most frequent value in a column?

It looks like you may have some nulls in the column. You can drop them with df = df.dropna(subset=['item']), although value_counts() already excludes NaN by default. Then df['item'].value_counts().max() gives the count of the most frequent value, and df['item'].value_counts().idxmax() gives the value itself.
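Here is a standard-library sketch of the same two quantities. It assumes the column is a plain Python list in which None stands in for nulls; this setup is hypothetical, not the original data. The sketch shows why the nulls matter: counted naively, None would be the most frequent entry.

```python
from collections import Counter

items = ["pen", None, "ink", "pen", None, None, None]

# Drop nulls first, as pandas' value_counts() does by default.
counts = Counter(v for v in items if v is not None)
value, count = counts.most_common(1)[0]  # like idxmax() and max()
print(value, count)  # pen 2
```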

Retrieve the most repeated (x, y) values in two columns in a data frame

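Despite all the upvotes on this answer, a hybrid of @DavidArenburg's approach and mine might be simple and effective. It pastes each row into a single string and returns the index of the first row holding the most common combination:

res = do.call("paste", c(xy, sep="\r"))
which.max(tabulate(match(res, res)))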
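Here is a Python sketch of the same hybrid, assuming the two columns arrive as equal-length lists. Tuples replace the pasted strings, so no separator is needed. first_seen plays the role of match(res, res) and counts plays the role of tabulate(). The function name is hypothetical, and the returned index is 0-based, unlike R's which.max().

```python
def which_pair_max(xs, ys):
    """Return the 0-based index of the first row holding the most common
    (x, y) pair (hypothetical helper)."""
    first_seen = {}  # pair -> index of its first occurrence
    counts = {}      # first-occurrence index -> number of rows with that pair
    for i, pair in enumerate(zip(xs, ys)):
        j = first_seen.setdefault(pair, i)
        counts[j] = counts.get(j, 0) + 1
    # Keys were inserted in increasing index order, and max() returns the
    # first maximal key, so ties go to the earliest row, like which.max().
    return max(counts, key=counts.get)

x = [2, 4, 6, 3, 5, 1, 1]
y = [5, 9, 12, 6, 10, 1, 1]
print(which_pair_max(x, y))  # 5
```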

Maybe it seems a little roundabout, but the first step is to transform the possibly arbitrary values in each column of xy into integers ranging from 1 to the number of unique values in that column:

x = match(xy[[1]], unique(xy[[1]]))
y = match(xy[[2]], unique(xy[[2]]))

Then encode each combination of the two columns as a unique integer. x varies fastest, and y selects a block of max(x) codes:

v = x + max(x) * (y - 1L)

Indexing minimizes the range of values under consideration, and encoding reduces a two-dimensional problem to a single dimension. These steps reduce the space required by any tabulation (as with table() in other answers) to the minimum, without creating character vectors.
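Here is a brute-force check in Python, using hypothetical sizes. It confirms that the encoding x + max(x) * (y - 1) gives every (x, y) pair a distinct integer in 1..max(x)*max(y). It also shows that a multiplier of max(x) - 1 on y is not enough, because pairs such as (max(x), 1) and (1, 2) collide.

```python
from itertools import product

def encode(x, y, max_x):
    # x varies fastest; y selects a block of max_x consecutive codes.
    return x + max_x * (y - 1)

max_x, max_y = 4, 3
codes = [encode(x, y, max_x)
         for x, y in product(range(1, max_x + 1), range(1, max_y + 1))]
print(len(codes) == len(set(codes)))                       # True: no collisions
print(sorted(codes) == list(range(1, max_x * max_y + 1)))  # True: codes fill 1..12

# With a multiplier of max_x - 1 on y, (max_x, 1) and (1, 2) get the same code.
print(max_x + (max_x - 1) * 1 == 1 + (max_x - 1) * 2)      # True
```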

To find the most common occurrence in this single dimension, index and tabulate v:

tbl = tabulate(match(v, v))

and find the index of the first occurrence of the maximum value(s), e.g.,

xy[which.max(tbl),]

Here is a function that wraps these steps:

whichpairmax <- function(x, y) {
    x = match(x, unique(x)); y = match(y, unique(y))
    v = x + max(x) * (y - 1L)
    which.max(tabulate(match(v, v)))
}

and a couple of tests:

> set.seed(123)
> xy[whichpairmax(xy[[1]], xy[[2]]),]
  x y
1 1 1
> xy1 = xy[sample(nrow(xy)),]
> xy1[whichpairmax(xy1[[1]], xy1[[2]]),]
  x y
1 1 1
> xy1
  x  y
3 2  5
5 4  9
7 6 12
4 3  6
6 5 10
1 1  1
2 1  1

For an arbitrary data.frame, fold the columns in one at a time, re-indexing the running code at each step:

whichdfmax <- function(df) {
    v = integer(nrow(df))
    for (col in df) {
        col = match(col, unique(col))
        v = col + max(col) * (match(v, unique(v)) - 1L)
    }
    which.max(tabulate(match(v, v)))
}
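Here is a Python sketch of the column-folding idea. It assumes the data frame is given as a list of equal-length column lists with at least one column and one row. index_by_first_seen mimics R's match(v, unique(v)). Both helper names are hypothetical, and the returned row index is 0-based.

```python
def index_by_first_seen(values):
    # Like R's match(v, unique(v)): map values to 1, 2, ... by first appearance.
    seen = {}
    return [seen.setdefault(v, len(seen) + 1) for v in values]

def which_rows_max(columns):
    """Return the 0-based index of the first row holding the most common
    combination of values across `columns` (hypothetical helper)."""
    codes = [0] * len(columns[0])
    for col in columns:
        col_idx = index_by_first_seen(col)
        code_idx = index_by_first_seen(codes)
        k = max(col_idx)
        # Same encoding as the R version: new column varies fastest.
        codes = [c + k * (v - 1) for c, v in zip(col_idx, code_idx)]
    first_seen, counts = {}, {}
    for i, c in enumerate(codes):
        j = first_seen.setdefault(c, i)
        counts[j] = counts.get(j, 0) + 1
    return max(counts, key=counts.get)  # ties go to the earliest row

print(which_rows_max([["a", "b", "b"], [1, 2, 2]]))  # 1
```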

Find most frequent value in SQL column

SELECT
  <column_name>,
  COUNT(<column_name>) AS `value_occurrence` 

FROM
  <my_table>

GROUP BY 
  <column_name>

ORDER BY 
  `value_occurrence` DESC

LIMIT 1;

Replace <column_name> and <my_table> with your own names. Change LIMIT 1 to LIMIT N to see the N most common values of the column.
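Here is a runnable sketch of the query against an in-memory SQLite database, using Python's sqlite3 module. The table name my_table, the column fruit, and its rows are made up. The backtick quoting is dropped because the alias is a plain identifier, and the limit is passed as a bound parameter so the same query returns the top N.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (fruit TEXT)")
conn.executemany(
    "INSERT INTO my_table (fruit) VALUES (?)",
    [("apple",), ("pear",), ("apple",), ("fig",), ("apple",), ("pear",)],
)

query = """
SELECT fruit, COUNT(fruit) AS value_occurrence
FROM my_table
GROUP BY fruit
ORDER BY value_occurrence DESC
LIMIT ?
"""
top1 = conn.execute(query, (1,)).fetchall()
top2 = conn.execute(query, (2,)).fetchall()
conn.close()
print(top1)  # [('apple', 3)]
print(top2)  # [('apple', 3), ('pear', 2)]
```

If several values share the top count, which of them LIMIT 1 returns is not specified. Add a second ORDER BY key, such as the column itself, if you need a deterministic choice.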

How to find the most frequent value based on dates in another column in Python

Summary of the problem:

Input: a date
Output: the values in column Name whose frequency among accepted rows on that date is at least 50 percent

import numpy as np  # if not installed, run 'pip install numpy'

date = input('Enter your date, e.g. 06/21: ')
# date = '06/21'

def frequency(df, date):
    dfcrop = df[df['Date'] == date]              # keep only rows with the given date
    dfcrop = dfcrop[dfcrop['Accepted'] == True]  # and, as per the condition, only accepted rows
    if dfcrop.empty:                             # avoid dividing by zero
        return []
    values = []
    for value in set(dfcrop['Name']):            # all unique names
        freq = np.sum(dfcrop['Name'] == value) / dfcrop.shape[0]  # share of rows with this name
        if freq >= 0.5:                          # frequency is at least 50 percent
            values.append(value)
    return values

freq = frequency(df, date)  # list of names with a frequency of at least 50 percent on the given date

If you want to use only some selected dates, as the author asked in the comments, filter the data frame first:

dates_to_use=[<put all dates to use>]

df = df[df['Date'].isin(dates_to_use)]
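Here is a standard-library sketch of the same filter-then-count logic. It assumes the rows are a list of dicts with Date, Name and Accepted keys, a hypothetical stand-in for the data frame. Both filters apply to the same subset, and Counter counts every name in one pass instead of rescanning the rows once per name.

```python
from collections import Counter

def names_at_least_half(rows, date):
    """Return names making up at least 50% of the accepted rows on `date`
    (hypothetical helper; `rows` is a list of dicts)."""
    day = [r["Name"] for r in rows if r["Date"] == date and r["Accepted"]]
    if not day:  # no accepted rows on that date
        return []
    counts = Counter(day)
    return [name for name, c in counts.items() if c / len(day) >= 0.5]

rows = [
    {"Date": "06/21", "Name": "Ann", "Accepted": True},
    {"Date": "06/21", "Name": "Ann", "Accepted": True},
    {"Date": "06/21", "Name": "Bob", "Accepted": True},
    {"Date": "06/21", "Name": "Bob", "Accepted": False},
    {"Date": "06/22", "Name": "Bob", "Accepted": True},
]
print(names_at_least_half(rows, "06/21"))  # ['Ann']
```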

How to get the n most frequent or top values per column in python pandas?

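This should work. value_counts() sorts each column's values by descending count, so the first n index labels are the n most frequent values:

import pandas as pd

n = 2
df.apply(lambda x: pd.Series(x.value_counts().index[:n]))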
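Here is a standard-library sketch of the per-column top-n idea. It assumes the table is a dict mapping column names to lists, a hypothetical stand-in for a DataFrame. The pandas version pads short columns with NaN; here, a column with fewer than n distinct values simply yields a shorter list.

```python
from collections import Counter

def top_n_per_column(table, n):
    """Return {column: [n most frequent values]} for a dict of column lists.
    Ties keep first-appearance order (Counter.most_common behaviour)."""
    return {col: [v for v, _ in Counter(vals).most_common(n)]
            for col, vals in table.items()}

table = {"a": [1, 1, 2, 3, 3, 3], "b": ["x", "y", "y", "z", "y", "x"]}
print(top_n_per_column(table, 2))  # {'a': [3, 1], 'b': ['y', 'x']}
```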

pandas rolling get most frequent value in window

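Use rolling().apply() with the mode() method:

df.rolling(window=3).apply(lambda x: x.mode()[0], raw=False)

With raw=False, x is a pd.Series whose length equals the window. x.mode() returns a Series of the most frequent values in sorted order, and the lambda returns its first item, so ties resolve to the smallest value. The first window - 1 rows are NaN because the window is not yet full.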
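Here is a plain-Python sketch of a rolling mode, assuming the input is a list of comparable values (hypothetical data). It follows what mode()[0] does in pandas: the output is None (instead of NaN) until the window is full, and ties resolve to the smallest value, because Series.mode() returns its values sorted.

```python
from collections import Counter, deque

def rolling_mode(values, window):
    """Mode of each trailing window; None until the window is full.
    Ties resolve to the smallest value (hypothetical helper)."""
    out = []
    buf = deque(maxlen=window)  # oldest value drops out automatically
    for v in values:
        buf.append(v)
        if len(buf) < window:
            out.append(None)
            continue
        counts = Counter(buf)
        top = max(counts.values())
        out.append(min(k for k, c in counts.items() if c == top))
    return out

print(rolling_mode([1, 2, 2, 3, 3, 3, 1], 3))  # [None, None, 2, 2, 3, 3, 3]
```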
