R: How to Find the Mode of a Vector

How to find the statistical mode?

One more solution, which works for both numeric & character/factor data:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

On my dinky little machine, that can generate & find the mode of a 10M-integer vector in about half a second.
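
A quick usage sketch of the Mode function above on the three input types mentioned (numeric, character, factor). The sample vectors are made up for illustration:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

num_mode <- Mode(c(3, 1, 2, 3, 1, 3))          # 3 occurs three times
chr_mode <- Mode(c("b", "a", "b", "c"))        # "b" occurs twice
fct_mode <- Mode(factor(c("lo", "hi", "hi")))  # result is still a factor

num_mode
chr_mode
as.character(fct_mode)
```

Note that for factor input the result keeps the factor class, so wrap it in as.character() if you need a plain string.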

If your data set might have multiple modes, the above solution takes the same approach as which.max, and returns the first-appearing value of the set of modes. To return all modes, use this variant (from @digEmAll in the comments):

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
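
To see the difference between the two functions when there are ties, here is a small sketch with a made-up vector in which 5 and 2 both occur twice:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

x <- c(5, 5, 2, 2, 1)  # 5 and 2 are tied

first_mode <- Mode(x)   # only the first-appearing mode: 5
all_modes  <- Modes(x)  # every tied mode, in order of first appearance: 5 2
```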

R: how to find the mode of a vector

This post provides an elegant function to determine the mode so all you need to do is apply it to your data frame.

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

apply(d, 2, Mode)

Yields:

MEMORY1 MEMORY2 MEMORY3 MEMORY4 MEMORY5 MEMORY6 MEMORY7 MEMORY8
    5.5     5.5     4.5     1.5     4.5     5.5     4.5     5.5
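
One thing to watch: apply() converts a data frame to a matrix before calling the function, so a data frame with mixed column types is turned into character first. A minimal sketch, assuming a hypothetical data frame d2 (not the d from the question), comparing apply() with lapply():

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical data frame with one numeric and one character column
d2 <- data.frame(score = c(5.5, 5.5, 4.5),
                 label = c("a", "b", "b"),
                 stringsAsFactors = FALSE)

# apply() coerces d2 to a character matrix, so every mode comes back as a string
by_apply <- apply(d2, 2, Mode)

# lapply() works column by column and keeps each column's own type
by_lapply <- lapply(d2, Mode)

str(by_apply)
str(by_lapply)
```

If all columns are numeric, as in the MEMORY example, both approaches give the same values.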

Finding the statistical mode of a vector: when there is more than one mode, return the last mode

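The only option you have with collapse is sorting the data beforehand, e.g.

library(collapse)
library(magrittr)  # for %>%, in case your collapse version does not export it
my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
data.frame(v = my_vec, g = gl(2, 5)) %>%
  roworder(g) %>%
  tfm(t = data.table::rowid(g)) %>%  # running row index within each group
  roworder(g, -t) %>%                # reverse the row order within each group
  gby(g) %>%
  smr(last = fmode(v, ties = "first"))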

The reason rev doesn't work is that collapse grouping doesn't split the data. It only determines which group each row belongs to, and then computes statistics on all groups simultaneously using running algorithms in C++ (e.g. the grouped computation is done by fmode itself). So in your code rev is actually executed before the grouping and reverses the entire vector. In this case, a native data.table implementation calling fmode.default directly (to save on method dispatch) would probably be the fastest solution. I can think about adding a "last" mode if I find time for that.
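
For comparison, here is a base R sketch of the same idea. This is not the collapse approach: tapply() does split the data, so rev is applied inside each group, and the first-appearing mode of the reversed group is taken as the "last" mode. It reuses the Mode function from the first answer and will likely be slower than collapse on large data:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
g <- gl(2, 5)  # two groups of five consecutive values

# Reverse each group separately, then take the first-appearing mode
last_modes <- tapply(my_vec, g, function(v) Mode(rev(v)))
last_modes
```

In group 2 (5, -6, -6, 2, 2) the values -6 and 2 are tied, and the reversed order makes 2 win.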

How to find mode across variables/vectors within a data row in R

The modeest package implements a number of estimators of the mode for unimodal univariate data.

It has a function mfv that returns the most frequent value, or (as ?mfv states) it is perhaps better to use mlv(..., method = 'discrete').

library(modeest)

## assuming your data is in the data.frame dd
apply(dd[, 2:6], 1, mfv)
[1] 5 7 4 2
## or
apply(dd[, 2:6], 1, mlv, method = 'discrete')
[[1]]
Mode (most frequent value): 5
Bickel's modal skewness: -0.2
Call: mlv.integer(x = newX[, i], method = "discrete")

[[2]]
Mode (most frequent value): 7
Bickel's modal skewness: -0.4
Call: mlv.integer(x = newX[, i], method = "discrete")

[[3]]
Mode (most frequent value): 4
Bickel's modal skewness: -0.4
Call: mlv.integer(x = newX[, i], method = "discrete")

[[4]]
Mode (most frequent value): 2
Bickel's modal skewness: 0.4
Call: mlv.integer(x = newX[, i], method = "discrete")

Now, if you have ties for the most frequent, then you need to think about what you want.

Both mfv and mlv.integer return all the values that tie for the most frequent (although the print method shows only a single value).
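
To make the tie question concrete without depending on modeest, here is a base R sketch using the Mode and Modes functions from the first answer. The data frame dd below is invented for illustration, with row 2 containing a tie between 7 and 3:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

# Hypothetical data: an id column followed by five value columns
dd <- data.frame(id = 1:3,
                 a = c(5, 7, 4), b = c(5, 7, 1), c = c(2, 3, 1),
                 d = c(5, 3, 4), e = c(1, 1, 4))

# One value per row: ties are resolved in favour of the first-appearing value
by_row_first <- apply(dd[, 2:6], 1, Mode)

# All tied values per row: the result is a list because lengths differ
by_row_all <- apply(dd[, 2:6], 1, Modes)
```

If every row happens to have the same number of modes, apply() returns a vector or matrix rather than a list, so code that consumes by_row_all should not assume one shape.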

Calculating the mode or 2nd/3rd/4th most common value

Maybe you could try

f <- function (x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

This gives unique vector values sorted by decreasing frequency. The first will be the mode, the 2nd will be 2nd most common, etc.

Another method is based on table():

g <- function (x) as.numeric(names(sort(table(x), decreasing = TRUE)))

But this is not recommended, as the input vector x will be coerced to a factor first. If you have a large vector, this is very slow. Also, on exit, we have to extract the character names of the table and coerce them to numeric.


Example

set.seed(0); x <- rpois(100, 10)
f(x)
# [1] 11 12 7 9 8 13 10 14 5 15 6 2 3 16

Let's compare with the contingency table from table:

tab <- sort(table(x), decreasing = TRUE)
# 11 12  7  9  8 13 10 14  5 15  6  2  3 16
# 14 14 11 11 10 10  9  7  5  4  2  1  1  1

as.numeric(names(tab))
# [1] 11 12 7 9 8 13 10 14 5 15 6 2 3 16

So the results are the same.
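
To pull out just the 2nd, 3rd or 4th most common value, you can index into the result of f. A small sketch follows. The wrapper name nth_most_common is made up here, and ties are ordered however f happens to order them:

```r
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

# Hypothetical helper: n-th most common value, or NA if there are fewer
# than n distinct values
nth_most_common <- function(x, n = 1) {
  ranked <- f(x)
  if (n > length(ranked)) return(NA)
  ranked[n]
}

y <- c(3, 3, 3, 1, 1, 2)

first  <- nth_most_common(y, 1)  # 3 (three occurrences)
second <- nth_most_common(y, 2)  # 1 (two occurrences)
fourth <- nth_most_common(y, 4)  # NA, only three distinct values
```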

Rearrange rows and calculate mode in R by creating a new variable

A dplyr approach where I join the data to a version of itself with just the most-common CODCOM value (or first appearing with ties).

library(dplyr)
df1 %>%
  left_join(
    df1 %>%
      group_by(ID) %>%
      count(mode = CODCOM, sort = TRUE) %>%
      slice(1),
    by = "ID"
  )


       ID CODCOM mode n
1   10000     12   12 1
2  101010     14   14 1
3  201020     11   11 2
4  201020     11   11 2
5  201020     12   11 2
6  324032     43   43 3
7  324032     43   43 3
8  324032     43   43 3
9  405044     51   51 1
10 323032     21   21 1
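
If you only need the mode column and would rather not use a join, a base R alternative is ave() together with the Mode function from the first answer. This is a sketch, not the dplyr approach above. The df1 below is rebuilt from the ID and CODCOM columns of the printed output, and it does not produce the count column n:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# df1 reconstructed from the ID and CODCOM columns of the output shown above
df1 <- data.frame(
  ID = c(10000, 101010, 201020, 201020, 201020,
         324032, 324032, 324032, 405044, 323032),
  CODCOM = c(12, 14, 11, 11, 12, 43, 43, 43, 51, 21)
)

# ave() applies Mode within each ID and recycles the result over the group's rows
df1$mode <- ave(df1$CODCOM, df1$ID, FUN = Mode)
df1
```

As with the dplyr version, ties go to the first-appearing value within each ID.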

Get the mode and its frequency in a factor variable when there is a tie

You can make some small modifications to the code here to get a Mode function that returns every tied value together with its frequency. Then lapply it over the columns of your data frame and rbind the results together.

options(stringsAsFactors = FALSE)
set.seed(2)

df.in <-
  data.frame(
    a = sample(letters[1:3], 10, TRUE),
    b = sample(1:3, 10, TRUE),
    c = rep(1:2, 5))

# Return every value tied for most frequent, along with its frequency
Mode <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ind <- which(tab == max(tab))
  data.frame(char = ux[ind], freq = tab[ind])
}

do.call(rbind, lapply(df.in, Mode))
#     char freq
# a      c    4
# b      1    4
# c.1    1    5
# c.2    2    5

How to get the mode of a group in summarize in R

You need to make a couple of changes to your code for mlv to work.

  1. The method (mfv) has to be within quotes ('mfv'). That is what is causing your error.
  2. After you do that, since mlv returns a list, you have to feed a single value to summarise(). Assuming that you want the mode ('M'), pick that element from the list.

Try:

dataSummary <- dataObs %>%
  group_by(ParNonPar, CPTCode) %>%
  summarise(mean = mean(net_paid),
            median = median(net_paid),
            mode = mlv(net_paid, method = 'mfv')[['M']],
            total = sum(net_paid))

to get:

> dataSummary
Source: local data frame [3 x 6]
Groups: ParNonPar

  ParNonPar CPTCode     mean median     mode   total
1         N     104 639.7111 893.00 622.7333 5757.40
2         Y     100   0.0000   0.00   0.0000    0.00
3         Y     103 740.2800 740.28 740.2800  740.28
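
If you want a grouped mode without depending on modeest, a base R sketch with aggregate() and the Mode function from the first answer could look like this. The dataObs below is invented for illustration and is not the data from the question. Note that this Mode returns the most frequent observed value, which is not necessarily what mlv estimates:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical data with the same column names as in the question
dataObs <- data.frame(
  ParNonPar = c("N", "N", "N", "Y", "Y"),
  CPTCode   = c(104, 104, 104, 100, 100),
  net_paid  = c(10, 10, 25, 0, 0)
)

# One row per ParNonPar/CPTCode combination, with the mode of net_paid
modeSummary <- aggregate(net_paid ~ ParNonPar + CPTCode,
                         data = dataObs, FUN = Mode)
modeSummary
```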

Hope that helps you move forward.