Mode in R by Groups

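One approach, using data.table: define a mode helper and assign its result within each age group using :=. The helper counts with tabulate(match()) so that the counts line up with unique(x); indexing unique(x) by a position taken from table(x) is unreliable because table() sorts its values.

> myfun <- function(x) { ux <- unique(x); ux[which.max(tabulate(match(x, ux)))] }
> DT[ , moda := myfun(number), by = age]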
> DT
   age          v number moda
1:  12 -0.9740026    122  122
2:  12  0.6893727    125  122
3:   3 -0.9558391      5    5
4:   3 -1.2317071      5    5
5:  12 -0.9568919    122  122
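
A caveat worth checking with any helper of this kind: table() orders its counts by sorted value, while unique() keeps values in order of first appearance, so a position taken from one does not necessarily index the other. The sketch below uses made-up input; naive_mode and safe_mode are illustrative names, not part of the original answer.

```r
# Illustrative input: the most frequent value (1) is not the first to appear.
x <- c(3, 1, 1)

# Indexes unique(x) with a position computed on the sorted table.
naive_mode <- function(x) unique(x)[which.max(table(x))]

# Counts occurrences in the same order as unique(x).
safe_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

naive_mode(x)  # 3, which is not the mode
safe_mode(x)   # 1
```

The two versions agree whenever the most frequent value happens to sit at the same position in both orderings, which is why the problem can go unnoticed on small examples.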

How to get the mode of a group in summarize in R

You need to make a couple of changes to your code for mlv to work.

  1. The method name has to be quoted: method = 'mfv', not method = mfv. The unquoted name is what is causing your error.
  2. Once that is fixed, since mlv returns a list, you have to pass a single value to summarise(). Assuming that you want the mode ('M'), pick that element from the list.

Try:

library(dplyr)
library(modeest)  # provides mlv()

dataSummary <- dataObs %>%
  group_by(ParNonPar, CPTCode) %>%
  summarise(mean = mean(net_paid),
            median = median(net_paid),
            mode = mlv(net_paid, method = 'mfv')[['M']],
            total = sum(net_paid))

to get:

> dataSummary
Source: local data frame [3 x 6]
Groups: ParNonPar

  ParNonPar CPTCode     mean median     mode   total
1         N     104 639.7111 893.00 622.7333 5757.40
2         Y     100   0.0000   0.00   0.0000    0.00
3         Y     103 740.2800 740.28 740.2800  740.28

Hope that helps you move forward.

How to find mean/median/mode based on distinctive groups in R?

You can do this best with dplyr but first you will have to write a function for the mode:

getmode <- function(v) {
  uniqv <- unique(v[!is.na(v)])
  uniqv[which.max(table(match(v, uniqv)))]
}
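
To see what the is.na() filter buys, here is a small check on a made-up vector. mode_keep_na is a hypothetical variant written only for comparison; it skips the NA filter.

```r
getmode <- function(v) {
  uniqv <- unique(v[!is.na(v)])
  uniqv[which.max(table(match(v, uniqv)))]
}

# Hypothetical comparison helper: same idea, but NA is counted like any value.
mode_keep_na <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

v <- c(NA, NA, NA, 2, 2, 3)
getmode(v)       # 2
mode_keep_na(v)  # NA, because NA occurs most often
```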

Now you can group_by the grouping variable Country and use summarise to calculate the statistics:

library(dplyr)
df %>%
  group_by(Country) %>%
  summarise(Mean = mean(Happiness),
            Median = median(Happiness),
            Mode = getmode(Happiness))

Result:

# A tibble: 4 x 4
  Country  Mean Median  Mode
* <chr>   <dbl>  <dbl> <int>
1 A         2.5    2.5     2
2 B         2      2       2
3 C         3      3       3
4 D         3.5    3.5     5

Data:

set.seed(12)
df <- data.frame(
  Country = sample(LETTERS[1:4], 10, replace = TRUE),
  Happiness = sample(1:5, 10, replace = TRUE)
)
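
For comparison, roughly the same summary can be sketched in base R with split() and sapply(). The data frame here is a small hand-made example (not the seeded sample above), chosen so that the values are easy to verify by eye; toy and stats are illustrative names.

```r
getmode <- function(v) {
  uniqv <- unique(v[!is.na(v)])
  uniqv[which.max(table(match(v, uniqv)))]
}

# Hand-made example data (an assumption for illustration).
toy <- data.frame(
  Country   = c("A", "A", "A", "B", "B"),
  Happiness = c(2L, 2L, 5L, 3L, 1L)
)

# One column of statistics per country, transposed to one row per country.
stats <- t(sapply(split(toy$Happiness, toy$Country), function(h) {
  c(Mean = mean(h), Median = median(h), Mode = getmode(h))
}))
stats
```

Country B has a tie between 3 and 1; getmode returns 3 because it appears first.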

Most frequent value (mode) by group

Building on David's comments, your solution is the following:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

library(dplyr)
df %>% group_by(a) %>% mutate(c = Mode(b))

Note, though, that for the tie where df$a is 3, the mode returned for b is 1, because which.max picks the first of the tied values.
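
The tie behaviour is easy to reproduce on a toy data frame (the values below are invented for illustration, not the asker's data). This sketch uses base R's ave() in place of group_by() and mutate().

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Invented data: group 3 has a tie between 1 and 4.
toy <- data.frame(a = c(1, 1, 1, 3, 3), b = c(5, 5, 2, 1, 4))

# ave() applies Mode within each level of a and recycles the result.
toy$c <- ave(toy$b, toy$a, FUN = Mode)
toy$c  # 5 5 5 1 1
```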

Most common value (mode) by group in R

You can do it like this:

library(dplyr)

df %>%
  count(a, b, c) %>%
  group_by(a, c) %>%
  filter(n == max(n)) %>%
  select(a, b, c)

Result (tied values within a group are all kept):

# A tibble: 8 x 3
# Groups:   a, c [6]
  a         b c
  <fct> <dbl> <fct>
1 a         2 Feb
2 a         1 Feb
3 a         2 Jan
4 a         3 Mar
5 b         3 Mar
6 b         1 Jan
7 b         2 Feb
8 b         3 Feb
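
The same count-then-filter idea can be sketched in base R, which also makes the tie behaviour visible. The data frame is invented for illustration; aggregate() stands in for count() and ave() for the grouped filter().

```r
# Invented data: ("a", "Jan") has a clear winner, ("b", "Feb") has a tie.
toy <- data.frame(
  a = c("a", "a", "a", "b", "b"),
  b = c(1, 1, 2, 3, 4),
  c = c("Jan", "Jan", "Jan", "Feb", "Feb")
)

# Count each (a, b, c) combination.
counts <- aggregate(n ~ a + b + c, data = transform(toy, n = 1L), FUN = sum)

# Keep the rows whose count equals the maximum within each (a, c) group.
grp <- paste(counts$a, counts$c)
top <- counts[counts$n == ave(counts$n, grp, FUN = max), c("a", "b", "c")]
top
```

Both b = 3 and b = 4 survive for the tied group, so the output can have more rows than there are groups.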

R number of grouped observations equal to the mode (by group) over time

We could group by 'group_name' and summarise across the remaining columns (everything()). For each column, apply the Mode function to the non-zero values (.[. != 0]), compare the result with every element of the column (==) to get a logical vector, and take its sum. This gives, per group and per column, the number of observations equal to the mode.

library(dplyr)
df1 %>%
  group_by(group_name) %>%
  summarise(across(everything(), ~ sum(Mode(.[. != 0]) == ., na.rm = TRUE)))
# A tibble: 3 x 5
#  group_name    t1   t10   t50  t100
#  <chr>      <int> <int> <int> <int>
#1 s1             2     2     2     2
#2 s2             0     1     2     2
#3 s3             0     2     1     2

Or using data.table

library(data.table)
setDT(df1)[, lapply(.SD, function(x) sum(Mode(x[x != 0]) == x, na.rm = TRUE)),
           by = group_name]

where

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
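
As a concrete illustration of the expression applied to each column, here it is on a single invented vector in base R (x and m are made-up names and values).

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Invented column for one group; the zeros are to be ignored.
x <- c(0, 0, 0, 4, 4, 7)

m <- Mode(x[x != 0])        # mode of the non-zero values: 4
sum(m == x, na.rm = TRUE)   # 2 elements equal that mode

# Without the filter, 0 would be the mode and the count would be 3.
sum(Mode(x) == x, na.rm = TRUE)
```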

If we need a single count across all the 't' columns, reshape to 'long' format (pivot_longer), filter out the 0 values, group by 'group_name', and summarise with the number of values equal to the Mode

library(tidyr)
df1 %>%
  pivot_longer(cols = starts_with('t')) %>%
  filter(value != 0) %>%
  group_by(group_name) %>%
  summarise(n_Mode = sum(Mode(value) == value))

How to find the statistical mode?

One more solution, which works for both numeric & character/factor data:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

On my dinky little machine, that can generate & find the mode of a 10M-integer vector in about half a second.

If your data set might have multiple modes, the above solution takes the same approach as which.max, and returns the first-appearing value of the set of modes. To return all modes, use this variant (from @digEmAll in the comments):

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
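
A quick check of the two functions on an invented character vector with a two-way tie:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

# Invented input: "b" and "a" both occur twice, "b" appears first.
x <- c("b", "a", "a", "b", "c")

Mode(x)   # "b": the first of the tied values to appear
Modes(x)  # "b" "a"
```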

Rearrange rows and calculate mode in R by creating a new variable

A dplyr approach: join the data to a per-ID summary of itself that keeps only the most common CODCOM value. With ties, slice(1) keeps whichever tied value count() lists first.

library(dplyr)
df1 %>%
  left_join(
    df1 %>%
      group_by(ID) %>%
      count(mode = CODCOM, sort = TRUE) %>%
      slice(1),
    by = "ID"
  )

       ID CODCOM mode n
1   10000     12   12 1
2  101010     14   14 1
3  201020     11   11 2
4  201020     11   11 2
5  201020     12   11 2
6  324032     43   43 3
7  324032     43   43 3
8  324032     43   43 3
9  405044     51   51 1
10 323032     21   21 1
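
If only the mode column is needed (not the count n), one alternative is to skip the join and apply a Mode helper per ID with base R's ave(). This is a different technique from the join above, sketched here on the ID and CODCOM values shown in the output.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# ID and CODCOM as they appear in the printed result.
df1 <- data.frame(
  ID     = c(10000, 101010, 201020, 201020, 201020,
             324032, 324032, 324032, 405044, 323032),
  CODCOM = c(12, 14, 11, 11, 12, 43, 43, 43, 51, 21)
)

# Mode of CODCOM within each ID, repeated on every row of that ID.
df1$mode <- ave(df1$CODCOM, df1$ID, FUN = Mode)
df1$mode  # 12 14 11 11 11 43 43 43 51 21
```

On ties, this Mode helper returns the value that appears first within the ID.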

R Data.Table Mode Imputation First Record By Group

We can use the following Mode function

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

and then loop over the columns of interest, compute the mode of the non-missing values within each 'Group', and use it to replace the NA values on rows where 'Time' is 1

library(data.table)
nm1 <- c("Test", "Score", "P")
setDT(data)[, (nm1) := lapply(.SD, function(x)
  # mode of the observed values only, so that NA itself is never chosen
  replace(x, is.na(x) & Time == 1, Mode(x[!is.na(x)]))),
  by = .(Group), .SDcols = nm1]
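
One thing to watch with mode imputation: unique() and match() treat NA as a value like any other, so a mode helper applied to a mostly-missing vector can return NA. A minimal base R sketch on one invented group (x, time and filled are made-up names and values):

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Invented group: three missing values and one observed value.
x    <- c(NA, 5, NA, NA)
time <- 1:4

Mode(x)             # NA: the missing value is the most frequent one
Mode(x[!is.na(x)])  # 5

# Fill only the first record (time == 1) with the mode of the observed values.
filled <- replace(x, is.na(x) & time == 1, Mode(x[!is.na(x)]))
filled  # 5 5 NA NA
```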

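For the second case (rows where 'Time' is greater than 1), carry the last observation forward within each 'Group' using na.locf0 from zoo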

library(zoo)
nm2 <- c("Test", "Score")
data[Time > 1, (nm2) := lapply(.SD, na.locf0), .SDcols = nm2, by = Group]

