How to retrieve the most repeated value in a column present in a data frame
tail(names(sort(table(Forbes2000$category))), 1)
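For readers working in Python, this is a rough standard-library analogue of the R one-liner: count each value, sort by count in ascending order, and keep the last name. The helper most_repeated and the sample list are hypothetical.

```python
from collections import Counter

def most_repeated(values):
    """Return the value with the highest count, like tail(names(sort(table(x))), 1)."""
    counts = Counter(values)
    # sorted() orders by count ascending, so the last entry has the largest count,
    # mirroring tail(..., 1) on the ascending sort() of a table.
    return sorted(counts.items(), key=lambda kv: kv[1])[-1][0]

categories = ["Banking", "Oil", "Banking", "Retail", "Banking", "Oil"]
print(most_repeated(categories))  # Banking
```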
Pandas get the most frequent values of a column
By using mode(), which returns every value tied for the highest count (here, two names):
df.name.mode()
Out[712]:
0 alex
1 helen
dtype: object
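The standard library's statistics.multimode (Python 3.8+) keeps ties in the same way. It returns the tied values in order of first appearance. This sketch uses a hypothetical list of names.

```python
from statistics import multimode

names = ["helen", "alex", "bob", "alex", "helen"]
# multimode keeps every value that shares the highest count.
print(multimode(names))  # ['helen', 'alex']
```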
How to get the number of the most frequent value in a column?
It looks like you may have some nulls in the column. You can drop them with df = df.dropna(subset=['item']). Then df['item'].value_counts().max() should give you the maximum count, and df['item'].value_counts().idxmax() should give you the most frequent value. (value_counts() already skips NaN by default, so the dropna step mainly matters if you reuse the cleaned frame elsewhere.)
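Here is a standard-library sketch of the same two answers (the count and the value), assuming None stands in for the nulls. Counter.most_common(1) returns both pieces at once. The helper name top_value_and_count is hypothetical.

```python
from collections import Counter

def top_value_and_count(values):
    """Return (most frequent value, its count), skipping None as dropna() would."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None, 0
    # most_common(1) gives [(value, count)] for the highest count.
    value, count = counts.most_common(1)[0]
    return value, count

items = ["pen", None, "cup", "pen", None, None, "pen"]
print(top_value_and_count(items))  # ('pen', 3)
```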
Retrieve the most repeated (x, y) values in two columns in a data frame
(Despite all the upvotes, a hybrid of @DavidArenburg's approach and mine,
res = do.call("paste", c(xy, sep="\r"))
which.max(tabulate(match(res, res)))
might be simple and effective.)
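In Python, tuples can serve as the combined key directly, so no separator string is needed. This minimal sketch returns the 0-based index of the first row whose (x, y) pair is most frequent. Ties go to the earliest such row, which matches what which.max(tabulate(match(res, res))) picks. The function name is hypothetical.

```python
from collections import Counter

def first_row_of_most_common_pair(xs, ys):
    """0-based index of the first row holding the most frequent (x, y) pair."""
    pairs = list(zip(xs, ys))  # tuples act as the combined key, like paste() does in R
    if not pairs:
        return None
    counts = Counter(pairs)
    best = max(counts.values())
    # Scan rows in order so ties resolve to the earliest row, like which.max().
    for i, pair in enumerate(pairs):
        if counts[pair] == best:
            return i
```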
It may seem a little roundabout, but the first step is to transform the possibly arbitrary values in each column of xy to integers ranging from 1 to the number of unique values in that column:
x = match(xy[[1]], unique(xy[[1]]))
y = match(xy[[2]], unique(xy[[2]]))
Then encode each combination of column values as a unique integer:
v = x + max(x) * (y - 1L)
Indexing minimizes the range of values under consideration, and encoding reduces a two-dimensional problem to a single dimension. These steps reduce the space required by any tabulation (as with table() in other answers) to the minimum, without creating character vectors.
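Here is a small Python check of the indexing and encoding steps. The helpers dense_index (playing the role of match(v, unique(v))) and encode are hypothetical names. The check also shows why the multiplier has to be max(x) applied to y - 1. With (max(x) - 1) * y instead, the pairs (2, 1) and (1, 2) receive the same code whenever max(x) is 2.

```python
def dense_index(values):
    """1-based index of each value's first appearance, like R's match(v, unique(v))."""
    first_seen = {}
    for v in values:
        first_seen.setdefault(v, len(first_seen) + 1)
    return [first_seen[v] for v in values]

def encode(x, y):
    """Map each pair of 1-based codes to a distinct integer in 1..max(x)*max(y)."""
    mx = max(x)
    return [xi + mx * (yi - 1) for xi, yi in zip(x, y)]

x = dense_index(["a", "b", "a", "b"])  # [1, 2, 1, 2]
y = dense_index(["u", "u", "v", "v"])  # [1, 1, 2, 2]
print(encode(x, y))  # [1, 2, 3, 4]: all four pairs stay distinct

# The multiplier (max(x) - 1) * y can collide: (2, 1) and (1, 2) both give 3.
mx = max(x)
print([xi + (mx - 1) * yi for xi, yi in zip(x, y)])  # [2, 3, 3, 4]
```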
To find the most common combination, which is now a one-dimensional problem, index and tabulate v:
tbl = tabulate(match(v, v))
Then find the index of the first occurrence of the maximum value(s), e.g.,
xy[which.max(tbl),]
Here is a function that wraps these steps:
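whichpairmax <- function(x, y) {
    # replace each column by 1-based codes of its unique values
    x = match(x, unique(x)); y = match(y, unique(y))
    # one distinct integer per (x, y) combination
    v = x + max(x) * (y - 1L)
    # row of the first occurrence of the most frequent combination
    which.max(tabulate(match(v, v)))
}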
and a couple of tests
> set.seed(123)
> xy[whichpairmax(xy[[1]], xy[[2]]),]
x y
1 1 1
> xy1 = xy[sample(nrow(xy)),]
> xy1[whichpairmax(xy1[[1]], xy1[[2]]),]
x y
1 1 1
> xy1
x y
3 2 5
5 4 9
7 6 12
4 3 6
6 5 10
1 1 1
2 1 1
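Here is a rough Python port of whichpairmax, written for readers who want to check the logic outside R. It assumes non-empty, equal-length inputs. It returns a 1-based row position, where R would print the row itself. The name which_pair_max is hypothetical.

```python
def which_pair_max(xs, ys):
    """1-based row of the first occurrence of the most frequent (x, y) pair."""
    def dense(values):
        # 1-based codes by first appearance, like match(v, unique(v))
        seen = {}
        return [seen.setdefault(v, len(seen) + 1) for v in values]

    x, y = dense(xs), dense(ys)
    mx = max(x)
    v = [xi + mx * (yi - 1) for xi, yi in zip(x, y)]
    # match(v, v): position of the first occurrence of each code
    first = {}
    for i, code in enumerate(v, start=1):
        first.setdefault(code, i)
    # tabulate(): count rows per first-occurrence position
    tab = [0] * len(v)
    for code in v:
        tab[first[code] - 1] += 1
    # which.max(): first position holding the maximum count
    return tab.index(max(tab)) + 1

# Same values as xy1 above; position 6 is the row named 1.
print(which_pair_max([2, 4, 6, 3, 5, 1, 1], [5, 9, 12, 6, 10, 1, 1]))  # 6
```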
For an arbitrary data.frame, apply the same encoding one column at a time:
whichdfmax <- function(df) {
    v = integer(nrow(df))
    for (col in df) {
        col = match(col, unique(col))
        # fold the codes built so far into this column without collisions
        v = col + max(col) * (match(v, unique(v)) - 1L)
    }
    which.max(tabulate(match(v, v)))
}
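Here is a sketch of the same column-by-column fold in Python, using the encoding v = col + max(col) * (match(v, unique(v)) - 1). It assumes the data frame is given as a hypothetical list of equal-length, non-empty column lists. It returns a 1-based row position.

```python
def which_df_max(columns):
    """1-based row of the first occurrence of the most frequent full row."""
    def dense(values):
        # 1-based codes by first appearance, like match(v, unique(v))
        seen = {}
        return [seen.setdefault(val, len(seen) + 1) for val in values]

    n = len(columns[0])
    v = [0] * n  # like integer(nrow(df))
    for col in columns:
        c = dense(col)
        mc = max(c)
        # dense(v) - 1 selects a block of size max(c), so codes never collide
        v = [ci + mc * (vi - 1) for ci, vi in zip(c, dense(v))]
    # match(v, v) + tabulate() + which.max(), as in the two-column case
    first = {}
    for i, code in enumerate(v, start=1):
        first.setdefault(code, i)
    tab = [0] * n
    for code in v:
        tab[first[code] - 1] += 1
    return tab.index(max(tab)) + 1
```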
Find most frequent value in SQL column
SELECT
<column_name>,
COUNT(<column_name>) AS `value_occurrence`
FROM
<my_table>
GROUP BY
<column_name>
ORDER BY
`value_occurrence` DESC
LIMIT 1;
Replace <column_name> and <my_table> with your own column and table names. Increase the 1 in LIMIT 1 to N if you want to see the N most common values of the column.
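To try the query without a server, here is a sketch using Python's built-in sqlite3 module with an in-memory table. The table orders, the column product and the rows are made up for illustration, and LIMIT is bound as a parameter so N is easy to change. Rows tied at the LIMIT boundary come back in no guaranteed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product TEXT)")
conn.executemany(
    "INSERT INTO orders (product) VALUES (?)",
    [("tea",), ("coffee",), ("tea",), ("juice",), ("tea",), ("coffee",)],
)

n = 2  # how many of the most common values to return
# Same shape as the query above, with <my_table> = orders and <column_name> = product.
rows = conn.execute(
    """
    SELECT product, COUNT(product) AS value_occurrence
    FROM orders
    GROUP BY product
    ORDER BY value_occurrence DESC
    LIMIT ?
    """,
    (n,),
).fetchall()
print(rows)  # [('tea', 3), ('coffee', 2)]
```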
How to find the most frequent value based on dates in another column in Python
Summary of the problem:
- Input: a date
- Output: the values in column Name whose frequency on that date is at least 50 percent
import numpy as np  # If not installed, run 'pip install numpy'
# df is assumed to be a pandas DataFrame with Date, Name and Accepted columns

date = input('Enter your date, e.g. 06/21: ')
# date = '06/21'

def frequency(df, date):
    dfcrop = df[df['Date'] == date]  # keep only rows with the given date
    dfcrop = dfcrop[dfcrop['Accepted'] == True]  # of those, keep the accepted rows
    values = list()
    for value in set(dfcrop['Name']):  # all unique names
        freq = np.sum((dfcrop['Name'] == value).astype(np.int32))
        freq = freq / dfcrop.shape[0]  # share of the cropped rows
        if freq >= 0.5:  # frequency is at least 50 percent
            values.append(value)
    return values

freq = frequency(df, date)  # names with a frequency of at least 50 percent on the given date
If you want to use only some selected dates, as the question's author asked in the comments, filter the frame first:
dates_to_use=[<put all dates to use>]
df = df[df['Date'].isin(dates_to_use)]
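Here is the same filter-then-count logic without pandas or NumPy, as a minimal sketch. Records are hypothetical dicts with Date, Name and Accepted keys, and the 0.5 threshold follows the >= test in the code above.

```python
from collections import Counter

def names_at_least_half(records, date):
    """Names making up at least 50% of the accepted records on `date`."""
    accepted = [r["Name"] for r in records if r["Date"] == date and r["Accepted"]]
    if not accepted:
        return []
    counts = Counter(accepted)
    return [name for name, c in counts.items() if c / len(accepted) >= 0.5]

records = [
    {"Date": "06/21", "Name": "Ann", "Accepted": True},
    {"Date": "06/21", "Name": "Ann", "Accepted": True},
    {"Date": "06/21", "Name": "Bob", "Accepted": True},
    {"Date": "06/21", "Name": "Bob", "Accepted": False},
    {"Date": "06/22", "Name": "Bob", "Accepted": True},
]
print(names_at_least_half(records, "06/21"))  # ['Ann']
```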
How to get the n most frequent or top values per column in python pandas?
This should work:
import pandas as pd
n = 2
df.apply(lambda x: pd.Series(x.value_counts().index[:n]))  # n most frequent values of each column, most frequent first
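Here is a standard-library sketch of the same per-column idea. It assumes the columns are given as a hypothetical dict of lists. Counter.most_common(n) plays the role of value_counts().index[:n].

```python
from collections import Counter

def top_n_per_column(columns, n=2):
    """Map each column name to its n most frequent values, most frequent first."""
    return {name: [v for v, _ in Counter(vals).most_common(n)] for name, vals in columns.items()}

data = {
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "size": ["S", "M", "M", "L", "M", "S"],
}
print(top_n_per_column(data))  # {'color': ['red', 'blue'], 'size': ['M', 'S']}
```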
pandas rolling get most frequent value in window
Use rolling().apply with the mode method:
df.rolling(window=3).apply(lambda x: x.mode()[0])
Here x is a pd.Series of length equal to the window, mode() returns a Series of the most frequent values in that window, and the lambda returns the first of them. The first window - 1 rows are NaN because their windows are incomplete, and rolling().apply works only on numeric columns.
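Here is a standard-library sketch of a trailing-window mode for checking results by hand. It uses None where pandas would give NaN. It also assumes, as pandas does, that the modes come back sorted, so x.mode()[0] picks the smallest tied value. The function name rolling_mode is hypothetical.

```python
from collections import Counter, deque

def rolling_mode(values, window=3):
    """Mode of each trailing window; None until the window is full."""
    buf = deque(maxlen=window)  # keeps only the last `window` values
    out = []
    for v in values:
        buf.append(v)
        if len(buf) < window:
            out.append(None)
            continue
        counts = Counter(buf)
        top = max(counts.values())
        # Among tied values take the smallest, like the first entry of a sorted mode().
        out.append(min(val for val, c in counts.items() if c == top))
    return out

print(rolling_mode([1, 2, 2, 3, 3, 3, 1]))  # [None, None, 2, 2, 3, 3, 3]
```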