How to find the statistical mode?
One more solution, which works for both numeric & character/factor data:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
On my dinky little machine, that can generate & find the mode of a 10M-integer vector in about half a second.
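As a quick illustration of the claim that this works across data types, here is a small usage sketch. The input vectors are made up for this example. The last call shows that NA is counted like any other value, which may or may not be what you want.

```r
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # value with the highest count
}

num_mode <- Mode(c(3, 1, 2, 2, 3, 2))          # 2
chr_mode <- Mode(c("b", "a", "b", "c"))        # "b"
fct_mode <- Mode(factor(c("lo", "hi", "hi")))  # factor value "hi"
na_mode  <- Mode(c(NA, NA, 1))                 # NA, because NA appears most often
```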
If your data set might have multiple modes, the above solution takes the same approach as which.max and returns the first-appearing value of the set of modes. To return all modes, use this variant (from @digEmAll in the comments):
Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
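To make the difference between the two functions concrete, here is a short sketch on a made-up vector with a tie. Both definitions are repeated so the snippet runs on its own.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

x <- c(1, 1, 2, 2, 3)   # 1 and 2 both appear twice
first_mode <- Mode(x)   # 1, the tied value that appears first
all_modes  <- Modes(x)  # 1 2, every tied value in order of first appearance
```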
R: how to find the mode of a vector
This post provides an elegant function to determine the mode so all you need to do is apply it to your data frame.
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
apply(d, 2, Mode)
Yields:
MEMORY1 MEMORY2 MEMORY3 MEMORY4 MEMORY5 MEMORY6 MEMORY7 MEMORY8
    5.5     5.5     4.5     1.5     4.5     5.5     4.5     5.5
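One thing to keep in mind: apply() converts the data frame to a matrix first, so if the columns have mixed types, every result comes back as character. A minimal sketch of a column-wise alternative with lapply(), which keeps each column's own type. The data frame d below is hypothetical and only stands in for the real data.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical data frame with one numeric and one character column
d <- data.frame(MEMORY1 = c(5.5, 5.5, 4.5),
                GROUP   = c("a", "b", "b"),
                stringsAsFactors = FALSE)

via_apply  <- apply(d, 2, Mode)  # character vector, because d became a character matrix
via_lapply <- lapply(d, Mode)    # list: MEMORY1 stays numeric, GROUP stays character
```

If all columns are numeric, as in the output above, apply(d, 2, Mode) and sapply(d, Mode) should give the same values.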
Finding the statistical mode of a vector: When having more than single mode — return the last mode
The only option you have with collapse is to sort the data beforehand, e.g.:
library(collapse)
my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
data.frame(v = my_vec, g = gl(2, 5)) %>%
  roworder(g) %>%
  tfm(t = data.table::rowid(g)) %>%
  roworder(g, -t) %>%
  gby(g) %>%
  smr(last = fmode(v, ties = "first"))
The reason rev doesn't work is that collapse grouping doesn't split the data. It only determines which group each row belongs to, and then computes statistics on all groups simultaneously using running algorithms in C++ (e.g. the grouped computation is done by fmode itself). So in your code, rev is actually executed before the grouping and reverses the entire vector. In this case, a native data.table implementation calling fmode.default directly (to optimize on method dispatch) would probably be the fastest solution. I can think about adding a "last" mode if I find time for that.
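For comparison, here is a base R sketch of the same idea that does not use collapse at all, so it is not the approach described above and will be slower on large data. Because tapply() splits the vector before calling the function, rev() acts within each group rather than on the whole vector. LastMode is a name made up for this example.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: reverse the values, then take the first mode,
# so that among tied values the one seen last wins
LastMode <- function(x) Mode(rev(x))

my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
g <- gl(2, 5)                              # two groups of five values each

last_modes <- tapply(my_vec, g, LastMode)  # group 1: 1, group 2: 2
```

In group 2 both -6 and 2 appear twice, and 2 is returned because it occurs last.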
How to find mode across variables/vectors within a data row in R
The modeest package implements a number of estimators of the mode for unimodal univariate data.
This has a function mfv to return the most frequent value, or (as ?mfv states) it is perhaps better to use mlv(..., method = 'discrete'):
library(modeest)
## assuming your data is in the data.frame dd
apply(dd[,2:6], 1,mfv)
[1] 5 7 4 2
## or
apply(dd[,2:6], 1,mlv, method = 'discrete')
[[1]]
Mode (most frequent value): 5
Bickel's modal skewness: -0.2
Call: mlv.integer(x = newX[, i], method = "discrete")
[[2]]
Mode (most frequent value): 7
Bickel's modal skewness: -0.4
Call: mlv.integer(x = newX[, i], method = "discrete")
[[3]]
Mode (most frequent value): 4
Bickel's modal skewness: -0.4
Call: mlv.integer(x = newX[, i], method = "discrete")
[[4]]
Mode (most frequent value): 2
Bickel's modal skewness: 0.4
Call: mlv.integer(x = newX[, i], method = "discrete")
Now, if you have ties for the most frequent value, then you need to think about what you want. Both mfv and mlv.integer will return all the values that tie for the most frequent (although the print method only shows a single value).
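If you want to see what keeping all ties looks like row by row without installing modeest, here is a base R sketch. It is a stand-in, not the modeest implementation: the Modes helper and the data frame dd are made up for this example, with an id column followed by five value columns to mirror dd[, 2:6].

```r
# Return every value that ties for the highest count
Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

# Hypothetical data: the third row has a tie between 1 and 2
dd <- data.frame(id = 1:3,
                 v1 = c(5, 7, 1), v2 = c(5, 7, 1), v3 = c(2, 3, 2),
                 v4 = c(5, 1, 2), v5 = c(1, 7, 3))

# A list is used because rows can have different numbers of modes
row_modes <- lapply(seq_len(nrow(dd)), function(i) Modes(unlist(dd[i, 2:6])))
```

Here row_modes[[1]] is 5, row_modes[[2]] is 7, and row_modes[[3]] holds both 1 and 2.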
Calculating the mode or 2nd/3rd/4th most common value
Maybe you could try
f <- function (x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])
This gives unique vector values sorted by decreasing frequency. The first will be the mode, the 2nd will be 2nd most common, etc.
Another method is based on table():
g <- function (x) as.numeric(names(sort(table(x), decreasing = TRUE)))
But this is not recommended, as the input vector x will be coerced to a factor first. If you have a large vector, this is very slow. Also, on exit, we have to extract the character names of the table and coerce them to numeric.
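A small sketch of what that round trip through character names means in practice, using made-up inputs. It also shows a side effect: because g() ends with as.numeric(), it only makes sense for numeric input, while f() works on character vectors too.

```r
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])
g <- function(x) as.numeric(names(sort(table(x), decreasing = TRUE)))

num <- c(10, 10, 2)
name_class <- class(names(table(num)))  # "character": the values became labels

chr <- c("b", "a", "b")
f_chr <- f(chr)                         # "b" "a"
g_chr <- suppressWarnings(g(chr))       # NA NA: the labels cannot be read back as numbers
```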
Example
set.seed(0); x <- rpois(100, 10)
f(x)
# [1] 11 12 7 9 8 13 10 14 5 15 6 2 3 16
Let's compare with the contingency table from table:
tab <- sort(table(x), decreasing = TRUE)
# 11 12  7  9  8 13 10 14  5 15  6  2  3 16
# 14 14 11 11 10 10  9  7  5  4  2  1  1  1
as.numeric(names(tab))
# [1] 11 12 7 9 8 13 10 14 5 15 6 2 3 16
So the results are the same.
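To pick out the 2nd, 3rd or 4th most common value, index into the result of f. A minimal sketch on a made-up vector without ties:

```r
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

x <- c(3, 3, 3, 1, 1, 2)  # 3 appears three times, 1 twice, 2 once

ranked <- f(x)            # 3 1 2
second <- ranked[2]       # 1, the 2nd most common value
fourth <- ranked[4]       # NA, because there are only three distinct values
```

When counts tie, as with 11 and 12 in the example above, the position of the tied values within the ranking depends on how order() breaks ties, so treat "2nd most common" with care in that case.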
Rearrange rows and calculate mode in R by creating a new variable
A dplyr approach where I join the data to a version of itself with just the most-common CODCOM value (or first appearing with ties).
library(dplyr)
df1 %>%
  left_join(
    df1 %>%
      group_by(ID) %>%
      count(mode = CODCOM, sort = TRUE) %>%
      slice(1),
    by = "ID"
  )
       ID CODCOM mode n
1   10000     12   12 1
2  101010     14   14 1
3  201020     11   11 2
4  201020     11   11 2
5  201020     12   11 2
6  324032     43   43 3
7  324032     43   43 3
8  324032     43   43 3
9  405044     51   51 1
10 323032     21   21 1
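If you would rather not join, a base R sketch with ave() gives the same mode column (without the count n). The df1 below is reconstructed from the ID and CODCOM columns printed above, so treat it as an assumption about the original data. Mode is the first-appearing-mode helper, not part of dplyr.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Assumed input, rebuilt from the printed output
df1 <- data.frame(
  ID     = c(10000, 101010, 201020, 201020, 201020,
             324032, 324032, 324032, 405044, 323032),
  CODCOM = c(12, 14, 11, 11, 12, 43, 43, 43, 51, 21)
)

# ave() applies Mode within each ID and repeats the result on every row of that ID
df1$mode <- ave(df1$CODCOM, df1$ID, FUN = Mode)
```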
Get the mode and its frequency in a factor variable when there is a tie
You can make some small modifications to the code here to get a Mode function. Then Map (or lapply, as below) over your data frame and rbind the results together:
options(stringsAsFactors = FALSE)
set.seed(2)
df.in <-
  data.frame(
    a = sample(letters[1:3], 10, TRUE),
    b = sample(1:3, 10, TRUE),
    c = rep(1:2, 5))
Mode <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  # keep every value that ties for the highest count
  ind <- which(tab == max(tab))
  data.frame(char = ux[ind], freq = tab[ind])
}
do.call(rbind, lapply(df.in, Mode))
#     char freq
# a      c    4
# b      1    4
# c.1    1    5
# c.2    2    5
How to get the mode of a group in summarize in R
You need to make a couple of changes to your code for mlv to work.
- the method (mfv) has to be within quotes ('mfv'). That is what is causing your error.
- After you do that, since mlv returns a list, you have to feed one value to summarise(). Assuming that you want the mode ('M'), you pick that element from the list.
Try:
dataSummary <- dataObs %>%
  group_by(ParNonPar, CPTCode) %>%
  summarise(mean = mean(net_paid),
            median = median(net_paid),
            mode = mlv(net_paid, method = 'mfv')[['M']],
            total = sum(net_paid))
to get:
> dataSummary
Source: local data frame [3 x 6]
Groups: ParNonPar
  ParNonPar CPTCode     mean median     mode   total
1         N     104 639.7111 893.00 622.7333 5757.40
2         Y     100   0.0000   0.00   0.0000    0.00
3         Y     103 740.2800 740.28 740.2800  740.28
Hope that helps you move forward.