Mode in R by groups
One approach:
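> # match() against unique(x) keeps the counts in the same order as unique(x)
> myfun <- function(x) { ux <- unique(x); ux[which.max(tabulate(match(x, ux)))] }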
> DT[ , moda := myfun(number), by = age]
> DT
age v number moda
1: 12 -0.9740026 122 122
2: 12 0.6893727 125 122
3: 3 -0.9558391 5 5
4: 3 -1.2317071 5 5
5: 12 -0.9568919 122 122
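One thing worth checking in any mode helper of this kind: table() orders its counts by sorted value, while unique() keeps order of first appearance, so a position taken from one does not always index the other correctly. A minimal sketch with a made-up vector (the names naive and safe are just for illustration):

```r
x <- c(5, 5, 3)

unique(x)        # 5 3  (order of first appearance)
names(table(x))  # "3" "5"  (sorted)

# Position from table() used to index unique(): picks the wrong value here
naive <- unique(x)[which.max(table(x))]

# Counting against unique(x) itself keeps both in the same order
ux <- unique(x)
safe <- ux[which.max(tabulate(match(x, ux)))]

naive  # 3
safe   # 5
```

When the values already appear in ascending order, both versions agree, which is why the problem is easy to miss.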
How to get the mode of a group in summarize in R
You need to make a couple of changes to your code for mlv to work.
- the method (mfv) has to be within quotes ('mfv'). That is what is causing your error.
- After you do that, since mlv returns a list, you have to feed one value to summarise(). Assuming that you want the mode ('M'), you pick that element from the list.
Try:
library(dplyr)
library(modeest)

dataSummary <- dataObs %>%
  group_by(ParNonPar, CPTCode) %>%
  summarise(mean = mean(net_paid),
            median = median(net_paid),
            mode = mlv(net_paid, method = 'mfv')[['M']],
            total = sum(net_paid))
to get:
> dataSummary
Source: local data frame [3 x 6]
Groups: ParNonPar
  ParNonPar CPTCode     mean median     mode   total
1         N     104 639.7111 893.00 622.7333 5757.40
2         Y     100   0.0000   0.00   0.0000    0.00
3         Y     103 740.2800 740.28 740.2800  740.28
Hope that helps you move forward.
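If pulling in modeest is not an option, the same kind of per-group summary can be sketched in base R with a small most-frequent-value helper. This is only a sketch: most_frequent is a hypothetical helper (not part of modeest), the data are made up, and on ties it simply returns the first value seen.

```r
# Hypothetical helper: most frequent value, first one wins on ties
most_frequent <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Made-up data for illustration only
obs <- data.frame(
  grp      = c("N", "N", "N", "Y", "Y"),
  net_paid = c(10, 10, 40, 0, 0)
)

# One summary row per group, bound together into a data frame
summary_by_grp <- do.call(rbind, lapply(split(obs, obs$grp), function(d) {
  data.frame(
    grp    = d$grp[1],
    mean   = mean(d$net_paid),
    median = median(d$net_paid),
    mode   = most_frequent(d$net_paid),
    total  = sum(d$net_paid)
  )
}))

summary_by_grp
```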
How to find mean/median/mode based on distinctive groups in R?
You can do this best with dplyr, but first you will have to write a function for the mode:
getmode <- function(v) {
  uniqv <- unique(v[!is.na(v)])
  uniqv[which.max(table(match(v, uniqv)))]
}
Now you can group_by the grouping variable Country and use summarise to calculate the statistics:
library(dplyr)
df %>%
  group_by(Country) %>%
  summarise(Mean = mean(Happiness),
            Median = median(Happiness),
            Mode = getmode(Happiness))
Result:
# A tibble: 4 x 4
Country Mean Median Mode
* <chr> <dbl> <dbl> <int>
1 A 2.5 2.5 2
2 B 2 2 2
3 C 3 3 3
4 D 3.5 3.5 5
Data:
set.seed(12)
df <- data.frame(
  Country = sample(LETTERS[1:4], 10, replace = TRUE),
  Happiness = sample(1:5, 10, replace = TRUE)
)
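Note that getmode drops NA values before picking the mode, whereas mean and median return NA unless you pass na.rm = TRUE. A small sketch with a made-up vector to show the difference:

```r
getmode <- function(v) {
  uniqv <- unique(v[!is.na(v)])
  uniqv[which.max(table(match(v, uniqv)))]
}

# NA is the most frequent entry here, but getmode ignores it
v <- c(NA, NA, NA, 2, 2, 1)

getmode(v)                 # 2
mean(v)                    # NA
median(v, na.rm = TRUE)    # 2
```

So if the column may contain missing values, the mean and median calls in the summarise step probably want na.rm = TRUE as well.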
Most frequent value (mode) by group
Building on David's comments, your solution is the following:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
library(dplyr)
df %>% group_by(a) %>% mutate(c=Mode(b))
Notice, though, that for the tie when df$a is 3, the mode returned for b is 1, because which.max picks the first of the tied values.
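To see the tie behaviour in isolation, here is a base R sketch on made-up data (not the data from the question). Within a group, the value that appears first among the tied ones is what comes back:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Made-up data: group 3 has a tie between 1 and 2
df <- data.frame(
  a = c(1, 1, 3, 3, 3, 3),
  b = c(4, 4, 1, 2, 2, 1)
)

modes <- tapply(df$b, df$a, Mode)
modes            # group 1 -> 4, group 3 -> 1

# Same values, different order: now 2 appears first and wins the tie
Mode(c(2, 1, 1, 2))   # 2
```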
Most common value (mode) by group in R
You can do it like this:
library(dplyr)
df %>%
  count(a, b, c) %>%
  group_by(a, c) %>%
  filter(n == max(n)) %>%
  select(a, b, c)
Result:
# A tibble: 8 x 3
# Groups: a, c [6]
a b c
<fct> <dbl> <fct>
1 a 2 Feb
2 a 1 Feb
3 a 2 Jan
4 a 3 Mar
5 b 3 Mar
6 b 1 Jan
7 b 2 Feb
8 b 3 Feb
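Because the filter keeps every row whose count equals the group maximum, tied values all survive, which is why some groups show more than one row. The same count-then-filter idea can be sketched in base R. The data here are made up and use a single grouping column for brevity; note that going through table() turns b into character.

```r
# Made-up data: group "a" has a clear mode, group "b" has a tie
df <- data.frame(
  a = c("a", "a", "a", "b", "b"),
  b = c(1, 1, 2, 3, 4)
)

# Count each (a, b) combination and drop combinations that never occur
counts <- as.data.frame(table(a = df$a, b = df$b), stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]

# Keep the rows whose count equals the maximum count within their group
is_max <- counts$Freq == ave(counts$Freq, counts$a, FUN = max)
modes <- counts[is_max, c("a", "b")]

modes   # a/"1", b/"3", b/"4"
```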
R number of grouped observations equal to the mode (by group) over time
We could group by 'group_name' and summarise across the rest of the columns (everything()) by applying the Mode function on a subset of rows that excludes the 0 values (.[. != 0]), create a logical vector (==) against the elements of the column, and get the sum to find the frequency for each column by the grouping variable:
library(dplyr)
df1 %>%
  group_by(group_name) %>%
  summarise(across(everything(), ~ sum(Mode(.[. != 0]) == ., na.rm = TRUE)))
# A tibble: 3 x 5
# group_name t1 t10 t50 t100
# <chr> <int> <int> <int> <int>
#1 s1 2 2 2 2
#2 s2 0 1 2 2
#3 s3 0 2 1 2
Or using data.table
library(data.table)
setDT(df1)[, lapply(.SD, function(x) sum(Mode(x[x != 0]) == x, na.rm = TRUE)),
by = group_name]
where
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
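To make the expression inside summarise easier to follow, here it is applied to a single made-up vector, step by step (the variable names are just for illustration):

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

x <- c(0, 3, 3, 5, 0)

# Mode of the non-zero values only
nonzero_mode <- Mode(x[x != 0])

# How many elements of the full vector equal that mode
n_equal <- sum(nonzero_mode == x, na.rm = TRUE)

nonzero_mode   # 3
n_equal        # 2
```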
If we need to calculate across the 't' columns, reshape to 'long' format (pivot_longer), filter out the 0 values, group by 'group_name', and summarise with the frequency of 'Mode' values:
library(tidyr)
df1 %>%
  pivot_longer(cols = starts_with('t')) %>%
  filter(value != 0) %>%
  group_by(group_name) %>%
  summarise(n_Mode = sum(Mode(value) == value))
How to find the statistical mode?
One more solution, which works for both numeric & character/factor data:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
On my dinky little machine, that can generate & find the mode of a 10M-integer vector in about half a second.
If your data set might have multiple modes, the above solution takes the same approach as which.max, and returns the first-appearing value of the set of modes. To return all modes, use this variant (from @digEmAll in the comments):
Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
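A quick check of both helpers on made-up inputs, including a factor:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

x <- c(1, 1, 2, 2, 3)
Mode(x)    # 1    (first of the tied values)
Modes(x)   # 1 2  (all tied values)

# Works on factors too; the result is still a factor
f <- factor(c("x", "y", "y"))
as.character(Mode(f))   # "y"
```

Since Modes can return more than one value, it may not fit in places that expect exactly one value per group.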
Rearrange rows and calculate mode in R by creating a new variable
A dplyr approach where I join the data to a version of itself with just the most-common CODCOM value (or first appearing with ties).
library(dplyr)
df1 %>%
  left_join(
    df1 %>%
      group_by(ID) %>%
      count(mode = CODCOM, sort = TRUE) %>%
      slice(1),
    by = "ID"
  )
ID CODCOM mode n
1 10000 12 12 1
2 101010 14 14 1
3 201020 11 11 2
4 201020 11 11 2
5 201020 12 11 2
6 324032 43 43 3
7 324032 43 43 3
8 324032 43 43 3
9 405044 51 51 1
10 323032 21 21 1
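If only the mode column is needed (without the count n), a base R alternative is ave(), which returns a per-group value repeated for every row of the group. This is a sketch under assumptions: it uses the Mode helper shown elsewhere on this page and a few rows copied from the output above, not the full original data.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# A few rows taken from the output above, for illustration
df1 <- data.frame(
  ID     = c(10000, 201020, 201020, 201020, 324032),
  CODCOM = c(12, 11, 11, 12, 43)
)

# FUN must be passed by name, since unnamed arguments are treated as grouping variables
df1$mode <- ave(df1$CODCOM, df1$ID, FUN = Mode)

df1$mode   # 12 11 11 11 43
```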
R Data.Table Mode Imputation First Record By Group
We can use the Mode function from here:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
and then loop over the columns of interest to calculate the 'Mode' by 'Group' and replace where there are NA and the 'Time' is 1:
library(data.table)
nm1 <- c("Test", "Score", "P")
setDT(data)[ , (nm1) := lapply(.SD, function(x)
replace(x, is.na(x) & Time == 1, Mode(x))), by = .(Group), .SDcols = nm1]
For the second case, it would be
library(zoo)
nm2 <- c("Test", "Score")
data[Time > 1, (nm2) := lapply(.SD, na.locf0), .SDcols = nm2, by = Group]
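One caveat: Mode as written counts NA like any other value, so in a group where NA is the most frequent entry the imputed value would itself be NA. A minimal base R sketch of the first step with an NA-dropping variant; mode_non_na is a hypothetical helper and the data are made up:

```r
# Hypothetical variant of Mode that ignores NA before counting
mode_non_na <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Made-up data: in Group 2, NA is the most frequent Score
data <- data.frame(
  Group = c(1, 1, 1, 1, 2, 2, 2),
  Time  = c(1, 2, 3, 4, 1, 2, 3),
  Score = c(NA, 7, 7, 9, NA, NA, 4)
)

# Per-group mode of the non-missing values, repeated for each row
fill <- ave(data$Score, data$Group, FUN = mode_non_na)

# Replace only the missing values at Time == 1
to_fill <- is.na(data$Score) & data$Time == 1
data$Score[to_fill] <- fill[to_fill]

data$Score   # 7 7 7 9 4 NA 4
```

The remaining NA at Time 2 is left for the second step (carrying the last observation forward).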