Mode in R by Groups

Mode in R by groups

One approach:

> myfun <- function(x) { ux <- unique(x); ux[which.max(tabulate(match(x, ux)))] }
> DT[ , moda := myfun(number), by = age]
> DT
   age          v number moda
1:  12 -0.9740026    122  122
2:  12  0.6893727    125  122
3:   3 -0.9558391      5    5
4:   3 -1.2317071      5    5
5:  12 -0.9568919    122  122
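
When writing a helper like myfun by hand, one thing to watch is that table() reports counts in sorted order of the values, whereas unique() keeps the order of first appearance, so a position taken from one should not be used to index the other. The sketch below uses a toy vector and hypothetical names (naive, safe_mode) to show the mismatch and a match()/tabulate() form that avoids it.

```r
x <- c(125, 122, 122)

unique(x)          # order of first appearance: 125 122
names(table(x))    # sorted order: "122" "125"

# Position 1 in table(x) is 122, but position 1 in unique(x) is 125
naive <- unique(x)[which.max(table(x))]     # 125, not the mode

# Counting against unique(x) itself keeps the positions aligned
safe_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
aligned <- safe_mode(x)                     # 122
```

The two forms agree whenever the values happen to appear in sorted order, which is why the problem can go unnoticed on small examples.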

How to get the mode of a group in summarize in R

You need to make a couple of changes to your code for mlv to work.

The method (mfv) has to be in quotes ('mfv'). The missing quotes are what cause your error.
After you do that, since mlv returns a list (at least in the modeest version used here), you have to pass a single value to summarise(). Assuming that you want the mode ('M'), pick that element from the list.

Try:

library(dplyr)
library(modeest)

dataSummary <- dataObs %>%
  group_by(ParNonPar, CPTCode) %>%
  summarise(mean = mean(net_paid),
            median = median(net_paid),
            mode = mlv(net_paid, method = 'mfv')[['M']],  # 'M' is the mode element of the list
            total = sum(net_paid))

to get:

> dataSummary
Source: local data frame [3 x 6]
Groups: ParNonPar

  ParNonPar CPTCode     mean median     mode   total
1         N     104 639.7111 893.00 622.7333 5757.40
2         Y     100   0.0000   0.00   0.0000    0.00
3         Y     103 740.2800 740.28 740.2800  740.28

Hope that helps you move forward.

How to find mean/median/mode based on distinctive groups in R?

You can do this best with dplyr but first you will have to write a function for the mode:

getmode <- function(v) {
  uniqv <- unique(v[!is.na(v)])
  uniqv[which.max(table(match(v, uniqv)))]
}

Now you can group_by the grouping variable Country and use summarise to calculate the statistics:

library(dplyr)
df %>%
  group_by(Country) %>%
  summarise(Mean = mean(Happiness),
            Median = median(Happiness),
            Mode = getmode(Happiness))

Result:

# A tibble: 4 x 4
  Country  Mean Median  Mode
* <chr>   <dbl>  <dbl> <int>
1 A         2.5    2.5     2
2 B         2      2       2
3 C         3      3       3
4 D         3.5    3.5     5

Data:

set.seed(12)
df <- data.frame(
  Country = sample(LETTERS[1:4], 10, replace = TRUE),
  Happiness = sample(1:5, 10, replace = TRUE)
)
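
One detail of getmode worth illustrating is that it drops NA before looking for the most frequent value, whereas a helper built on plain unique(x) counts NA like any other value. The following is a minimal sketch on a made-up vector v; Mode_with_na is a hypothetical name used only for the comparison.

```r
getmode <- function(v) {
  uniqv <- unique(v[!is.na(v)])              # candidate values, NA excluded
  uniqv[which.max(table(match(v, uniqv)))]   # table() ignores the NA matches
}

# Comparison helper (hypothetical name): NA is treated as a value
Mode_with_na <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

v <- c(NA, NA, NA, 4, 4, 1)

ignoring_na <- getmode(v)        # 4
counting_na <- Mode_with_na(v)   # NA, because NA occurs three times
```

Which behaviour is preferable depends on whether missing values should be allowed to win the vote for a group.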

Most frequent value (mode) by group

Building on David's comments, the solution is the following:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

library(dplyr)
df %>% group_by(a) %>% mutate(c=Mode(b))

Note that in the case of a tie, as when df$a is 3, Mode returns the first of the tied values to appear, so the mode for b is 1.
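
The tie behaviour can be checked without any packages. Here is a minimal base R sketch that uses ave() as a stand-in for group_by() plus mutate(); the data frame below is made up for illustration and is not the data from the question.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical data: group 3 has a tie between 1 and 4
df <- data.frame(a = c(1, 1, 1, 3, 3),
                 b = c(2, 2, 5, 1, 4))

# ave() applies Mode within each level of a and recycles the result per row
df$c <- ave(df$b, df$a, FUN = Mode)
df$c   # 2 2 2 1 1
```

Because which.max() returns the first maximum, the tied group gets the value that appears first in the data (1), not the smallest or the largest by rule.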

Most common value (mode) by group in R

You can do it like this:

library(dplyr)

df %>%
  count(a, b, c) %>%
  group_by(a, c) %>%
  filter(n == max(n)) %>%
  select(a, b, c)

Result:

# A tibble: 8 x 3
# Groups:   a, c [6]
  a         b c    
  <fct> <dbl> <fct>
1 a         2 Feb  
2 a         1 Feb  
3 a         2 Jan  
4 a         3 Mar  
5 b         3 Mar  
6 b         1 Jan  
7 b         2 Feb  
8 b         3 Feb
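
Because filter(n == max(n)) keeps every tied value, some groups contribute more than one row. If exactly one row per group is wanted, one option is slice_max() with with_ties = FALSE. This is a sketch on made-up data, assuming dplyr 1.0.0 or later; the data frame and the names all_ties and one_each are hypothetical.

```r
library(dplyr)

# Hypothetical data: group (b, Jan) has a tie between b = 3 and b = 1
df <- data.frame(a = c("a", "a", "a", "b", "b"),
                 c = c("Feb", "Feb", "Feb", "Jan", "Jan"),
                 b = c(1, 2, 2, 3, 1))

counts <- df %>% count(a, b, c)

# Keeps all tied values, so the tied group yields two rows
all_ties <- counts %>%
  group_by(a, c) %>%
  filter(n == max(n)) %>%
  ungroup()

# Keeps a single row per group; which tied row survives follows row order
one_each <- counts %>%
  group_by(a, c) %>%
  slice_max(order_by = n, n = 1, with_ties = FALSE) %>%
  ungroup()
```

With this toy data, all_ties has three rows and one_each has two, one for each (a, c) combination.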

R number of grouped observations equal to the mode (by group) over time

We can group by 'group_name' and summarise across the remaining columns (everything()). For each column, apply the Mode function to the non-zero values (.[. != 0]), compare the result with every element of the column (==), and sum the resulting logical vector. This gives, for each group and column, the number of observations equal to the mode:

library(dplyr)
df1 %>%
    group_by(group_name) %>%
    summarise(across(everything(), ~ sum(Mode(.[. !=0]) == ., na.rm = TRUE)))
# A tibble: 3 x 5
#  group_name    t1   t10   t50  t100
#  <chr>      <int> <int> <int> <int>
#1 s1             2     2     2     2
#2 s2             0     1     2     2
#3 s3             0     2     1     2

Or using data.table:

library(data.table)
setDT(df1)[, lapply(.SD, function(x) sum(Mode(x[x != 0]) == x, na.rm = TRUE)),
             by = group_name]

where Mode is defined as

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
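
To make the per-column step concrete, here is a minimal sketch of the same expression applied to a single made-up vector x, standing in for one column within one group.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

x <- c(0, 2, 2, 3, 0)                  # hypothetical values for one group

m <- Mode(x[x != 0])                   # mode of the non-zero values: 2
n_equal <- sum(m == x, na.rm = TRUE)   # how many observations equal it: 2
```

Note that the zeros are excluded only when finding the mode; the comparison itself runs over the full vector.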

If we need a single count across all the 't' columns, reshape to 'long' format (pivot_longer), filter out the 0 values, group by 'group_name', and summarise with the number of values equal to the 'Mode':

library(tidyr)
df1 %>% 
  pivot_longer(cols = starts_with('t')) %>%
  filter(value != 0) %>% 
  group_by(group_name) %>% 
  summarise(n_Mode = sum(Mode(value) == value))

How to find the statistical mode?

One more solution, which works for both numeric & character/factor data:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

On my dinky little machine, that can generate & find the mode of a 10M-integer vector in about half a second.

If your data set might have multiple modes, the above solution takes the same approach as which.max, and returns the first-appearing value of the set of modes. To return all modes, use this variant (from @digEmAll in the comments):

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
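
A short check of how the two functions differ on tied data, using a made-up character vector:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

x <- c("b", "a", "a", "b", "c")   # "b" and "a" both occur twice

first_mode <- Mode(x)    # "b", the tied value that appears first
all_modes  <- Modes(x)   # "b" "a", in order of first appearance
```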

Rearrange rows and calculate mode in R by creating a new variable

A dplyr approach where I join the data to a version of itself with just the most-common CODCOM value (or first appearing with ties).

library(dplyr)
df1 %>%
  left_join(
    df1 %>%
      group_by(ID) %>%
      count(mode = CODCOM, sort = TRUE) %>%
      slice(1),
    by = "ID"
  )

       ID CODCOM mode n
1   10000     12   12 1
2  101010     14   14 1
3  201020     11   11 2
4  201020     11   11 2
5  201020     12   11 2
6  324032     43   43 3
7  324032     43   43 3
8  324032     43   43 3
9  405044     51   51 1
10 323032     21   21 1

R Data.Table Mode Imputation First Record By Group

We can use the following Mode function

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Then loop over the columns of interest, compute the 'Mode' by 'Group', and use it to replace the values that are NA where 'Time' is 1:

library(data.table)
nm1 <- c("Test", "Score", "P")
setDT(data)[ , (nm1) := lapply(.SD, function(x) 
    replace(x, is.na(x) & Time == 1, Mode(x))), by = .(Group), .SDcols = nm1]
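
To see what the replace() call does inside each group, here is a minimal base R sketch on one hypothetical column (x, y and Time hold made-up values). It also shows an edge case worth checking: Mode counts NA like any other value, so when NA is the most frequent entry the imputed value is itself NA. Dropping NA before calling Mode is one possible workaround, offered here as an assumption and not as part of the original answer.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Time <- c(1, 2, 3, 4)

# Usual case: the first record is missing and 3 is the most frequent value
x <- c(NA, 3, 3, 5)
filled <- replace(x, is.na(x) & Time == 1, Mode(x))   # 3 3 3 5

# Edge case: NA is the most frequent entry
y <- c(NA, NA, NA, 5)
mode_with_na    <- Mode(y)               # NA
mode_without_na <- Mode(y[!is.na(y)])    # 5
```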

For the second case, it would be

library(zoo)
nm2 <- c("Test", "Score")
data[Time  > 1,  (nm2) := lapply(.SD, na.locf0), .SDcols = nm2, by = Group]
