How to find the statistical mode?
One more solution, which works for both numeric & character/factor data:
Mode <- function(x) {
  ux <- unique(x)                          # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]    # the first value with the highest count
}
On my dinky little machine, that can generate & find the mode of a 10M-integer vector in about half a second.
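To make each step of the unique/match/tabulate idiom explicit, here is a minimal Python sketch of the same logic. It is an illustration, not a port of R internals. It assumes hashable, non-empty input, and the function name `mode` is hypothetical.

```python
def mode(xs):
    """Return the first-appearing most frequent value of xs (like which.max)."""
    # unique(x): distinct values in order of first appearance
    uniques = list(dict.fromkeys(xs))
    # match(x, ux): position of each value within uniques
    position = {v: i for i, v in enumerate(uniques)}
    # tabulate(...): count how often each position occurs
    counts = [0] * len(uniques)
    for v in xs:
        counts[position[v]] += 1
    # which.max: max() returns the first index holding the maximum count
    best = max(range(len(counts)), key=counts.__getitem__)
    return uniques[best]
```

For example, `mode([1, 2, 2, 3, 3])` returns `2`, because 2 reaches the top count before 3 does in first-appearance order.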
If your data set might have multiple modes, the above solution takes the same approach as which.max and returns the first-appearing value of the set of modes. To return all modes, use this variant (from @digEmAll in the comments):
Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))   # occurrence count of each unique value
  ux[tab == max(tab)]             # every value tied for the highest count
}
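The same "keep every value tied for the top count" idea can be sketched in Python with `collections.Counter`, which preserves first-insertion order, so the modes come back in order of first appearance just like `ux[tab == max(tab)]`. This is a sketch assuming hashable, non-empty input; the name `modes` is hypothetical.

```python
from collections import Counter

def modes(xs):
    """Return every value tied for the highest count, in order of first appearance."""
    counts = Counter(xs)  # a dict subclass, so keys keep first-insertion order
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]
```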
Find the statistical mode(s) of a dataset in PowerShell
Use a combination of Group-Object, Sort-Object, and a do ... while loop:
# Sample dataset.
$dataset = 1, 2, 2, 3, 4, 4, 5
# Group the same numbers and sort the groups by member count, highest counts first.
$groups = $dataset | Group-Object | Sort-Object Count -Descending
# Output only the numbers represented by those groups that have
# the highest member count.
$i = 0
do { $groups[$i].Group[0] } while ($groups[++$i].Count -eq $groups[0].Count)
The above yields 2 and 4, which are the two modes (values occurring most frequently, twice each in this case), sorted in ascending order (because Group-Object sorts by the grouping criterion and Sort-Object's sorting algorithm is stable).
Note: While this solution is conceptually straightforward, performance with large datasets may be a concern; see the bottom section for an optimization that is possible for certain inputs.
Explanation:
Group-Object groups all inputs by equality.
Sort-Object -Descending sorts the resulting groups by member count in descending order (most frequently occurring inputs first).
The do ... while statement loops over the sorted groups and outputs the input represented by each, as long as the group-member count (and therefore the occurrence count, or frequency) equals the highest count, which is the first group's member count.
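As a language-neutral analogy (Python, not PowerShell), the three steps can be sketched as: count each value, sort by value, stably re-sort by count descending, then take groups while their count equals the first group's count. The initial sort by value stands in for Group-Object's ordering as described above; the name `modes_sorted` is hypothetical and non-empty input is assumed.

```python
from collections import Counter
from itertools import takewhile

def modes_sorted(dataset):
    # Group-Object analogue: one (value, count) pair per distinct value, ordered by value.
    groups = sorted(Counter(dataset).items())
    # Sort-Object Count -Descending analogue: Python's sort is stable even with
    # reverse=True, so values with equal counts stay in ascending order.
    groups.sort(key=lambda kv: kv[1], reverse=True)
    top = groups[0][1]
    # do ... while analogue: emit values while the count matches the first group's count.
    return [value for value, count in takewhile(lambda kv: kv[1] == top, groups)]
```

With the sample dataset `1, 2, 2, 3, 4, 4, 5`, this returns `[2, 4]`, matching the PowerShell output.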
Better-performing solution, with strings and numbers:
If the input elements are uniformly simple numbers or strings (as opposed to complex objects), an optimization is possible:
Group-Object's -NoElement switch suppresses collecting the individual inputs in each group.
Each group's .Name property reflects the grouping value, but does so as a string, so it must be converted back to its original data type.
# Sample dataset.
# Must be composed of all numbers or strings.
$dataset = 1, 2, 2, 3, 4, 4, 5
# Determine the data type of the elements of the dataset via its first element.
# All elements are assumed to be of the same type.
$type = $dataset[0].GetType()
# Group the same numbers and sort the groups by member count, highest counts first.
$groups = $dataset | Group-Object -NoElement | Sort-Object Count -Descending
# Output only the numbers represented by those groups that have
# the highest member count.
# -as $type converts the .Name string value back to the original type.
$i = 0
do { $groups[$i].Name -as $type } while ($groups[++$i].Count -eq $groups[0].Count)
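To show why the round trip through a string key is needed, here is a Python sketch of the same idea. Counts are keyed by the string form of each value (like .Name), and the result is converted back using the type of the first element (like `-as $type`). This assumes a non-empty dataset whose elements all share one simple type that can be rebuilt from its string form (int, float, str). Unlike the PowerShell version, it returns modes in first-appearance order rather than sorting them. The name `modes_from_names` is hypothetical.

```python
from collections import Counter

def modes_from_names(dataset):
    # Like -NoElement: keep only a string key and a count per group.
    counts = Counter(str(x) for x in dataset)
    # Like $dataset[0].GetType(): assume every element has the first element's type.
    to_original = type(dataset[0])
    top = max(counts.values())
    # Like -as $type: convert each winning string key back to the original type.
    return [to_original(name) for name, c in counts.items() if c == top]
```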
Finding the statistical mode of a vector: when there is more than one mode, return the last mode
With collapse, the only option you have is to sort the data beforehand, e.g.:
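library(collapse)
my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
data.frame(v = my_vec, g = gl(2, 5)) %>%
  roworder(g) %>%                         # sort by group
  tfm(t = data.table::rowid(g)) %>%       # running row number within each group
  roworder(g, -t) %>%                     # reverse the row order within each group
  gby(g) %>%
  smr(last = fmode(v, ties = "first"))    # first mode of the reversed group = last mode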
The reason rev doesn't work is that collapse grouping doesn't split the data. It only determines which group each row belongs to, and then computes statistics on all groups simultaneously using running algorithms in C++ (i.e., the grouped computation is done by fmode itself). So in your code, rev is actually executed before the grouping and reverses the entire vector. In this case, a native data.table implementation calling fmode.default directly (to avoid method-dispatch overhead) would probably be the fastest solution. I may consider adding a "last" option for ties if I find time for that.
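The key point above is that the reversal must happen within each group, not over the whole vector. Here is a Python sketch of that ordering trick using the same example values. It illustrates the idea only and is not collapse's C++ implementation; the function names `first_mode` and `last_mode_by_group` are hypothetical.

```python
from collections import Counter, defaultdict

def first_mode(values):
    """Most frequent value; ties go to the value that appears first."""
    counts = Counter(values)  # insertion order = order of first appearance
    top = max(counts.values())
    return next(v for v, c in counts.items() if c == top)

def last_mode_by_group(values, groups):
    """Per group, reverse that group's values and take the first mode (= the last mode)."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    return {g: first_mode(list(reversed(vs))) for g, vs in by_group.items()}

my_vec = [1, 1, 3, 4, 5, 5, -6, -6, 2, 2]
g = [1] * 5 + [2] * 5  # same grouping as gl(2, 5)
```

In group 2 (`5, -6, -6, 2, 2`), both -6 and 2 occur twice. After the within-group reversal, 2 comes first, so `last_mode_by_group(my_vec, g)` gives `{1: 1, 2: 2}`.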
How to find the statistical mode of each ID
The warnings that you are getting suggest two things:
You have not specified which method to use, so the default method, 'shorth', is used.
There is a tie in the selection of the mode value.
Alternatively, why not use the Mode function from here:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
To apply it by group, you can use it with dplyr as follows:
library(dplyr)
data %>% group_by(id) %>% mutate(edema2 = Mode(edema))
# id trt status stage spiders sex hepato edema ascites edema2
# <int> <int> <int> <int> <int> <fct> <int> <dbl> <int> <dbl>
#1 2 1 0 3 1 f 1 0 0 0
#2 2 1 0 3 1 f 1 0 0 0
#3 2 1 0 3 1 f 1 0 0 0
#4 3 1 2 4 0 m 0 0.5 0 0.5
#5 3 1 2 4 1 m 1 0 0 0.5
#6 3 1 2 4 0 m 0 0.5 0 0.5
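What group_by() + mutate() does here is compute one mode per id and write it back onto every row of that id. A Python sketch of that pattern follows, using the id and edema values from the output above. The function names are hypothetical, and ties go to the first-appearing value as in Mode.

```python
from collections import Counter, defaultdict

def first_mode(values):
    """Most frequent value; ties go to the value that appears first."""
    counts = Counter(values)
    top = max(counts.values())
    return next(v for v, c in counts.items() if c == top)

def mode_by_group(ids, values):
    """Return one mode per row, computed within that row's id group."""
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    group_mode = {i: first_mode(vs) for i, vs in groups.items()}
    # Broadcast each group's mode back to its rows, like mutate() after group_by().
    return [group_mode[i] for i in ids]

ids = [2, 2, 2, 3, 3, 3]
edema = [0, 0, 0, 0.5, 0, 0.5]
```

`mode_by_group(ids, edema)` returns `[0, 0, 0, 0.5, 0.5, 0.5]`, matching the edema2 column above.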
R: how to find the mode of a vector
This post provides an elegant function to determine the mode, so all you need to do is apply it to each column of your data frame.
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
apply(d, 2, Mode)   # MARGIN = 2: apply Mode to each column of d
Yields:
MEMORY1 MEMORY2 MEMORY3 MEMORY4 MEMORY5 MEMORY6 MEMORY7 MEMORY8
5.5 5.5 4.5 1.5 4.5 5.5 4.5 5.5
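The column-wise pattern (one mode per column) can be sketched in Python over a dict of lists standing in for a data frame. The data frame `d` is not shown in the question, so the table in the example is hypothetical, as is the name `column_modes`.

```python
from collections import Counter

def column_modes(table):
    """Apply a first-appearing mode to every column of a dict-of-lists table."""
    def mode(values):
        counts = Counter(values)
        top = max(counts.values())
        return next(v for v, c in counts.items() if c == top)
    return {name: mode(col) for name, col in table.items()}

# Hypothetical two-column table, not the question's data.
table = {"MEMORY1": [5.5, 5.5, 4.5], "MEMORY2": [1.5, 4.5, 4.5]}
```

Here `column_modes(table)` gives `{"MEMORY1": 5.5, "MEMORY2": 4.5}`.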
How to find statistical mode in Fortran
But how do I do that? Or is there an intrinsic function in Fortran to count the number of occurrences of input values and find the value with the highest occurrence?
No, there is not. You'll have to calculate the mode by hand.
The following code should work (on a sorted array):
! VAL(1:N) must already be sorted, so equal values form consecutive runs.
FMODE = VAL(1)
COUNT = 1
CURRENTCOUNT = 1
DO I = 2, N
  ! We are going through the loop looking for values == VAL(I-1)...
  IF (VAL(I) == VAL(I-1)) THEN
    ! We spotted another VAL(I-1), so increment the count.
    CURRENTCOUNT = CURRENTCOUNT + 1
  ELSE
    ! There are no more VAL(I-1)
    IF (CURRENTCOUNT > COUNT) THEN
      ! There were more elements of value VAL(I-1) than of value FMODE
      COUNT = CURRENTCOUNT
      FMODE = VAL(I-1)
    END IF
    ! Next we are looking for values == VAL(I), so far we have spotted one...
    CURRENTCOUNT = 1
  END IF
END DO
IF (CURRENTCOUNT > COUNT) THEN
  ! This means there are more elements of value VAL(N) than of value FMODE.
  COUNT = CURRENTCOUNT
  FMODE = VAL(N)
END IF
Explanation:
We keep the best-so-far mode in the FMODE variable, and the count of FMODE in the COUNT variable. As we step through the array, we count how many consecutive elements are equal to the one we are looking at now, in the CURRENTCOUNT variable.
If the next item we look at is equal to the previous one, we simply increment CURRENTCOUNT. If it's different, we need to reset CURRENTCOUNT, because we will now count the duplicates of the next element.
Before we reset CURRENTCOUNT, we check whether it's bigger than the previous best result, and if it is, we overwrite the previous best result (the FMODE and COUNT variables) with the new best result (VAL(I-1) and CURRENTCOUNT) before we continue.
This reset does not happen for the final run of equal values, so I inserted another check after the loop in case the most frequent element happens to be the last element of the array. In that case, we overwrite FMODE just as we would have done in the loop.
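The run-length scan described above can be sketched in Python to check the logic, including the extra check after the loop. It assumes a non-empty, already-sorted sequence, and it keeps the first-found mode on ties because it uses a strict `>` like the Fortran code. The name `sorted_mode` is hypothetical.

```python
def sorted_mode(vals):
    """Mode of a non-empty sorted sequence via a single run-length scan."""
    fmode, count = vals[0], 1   # best-so-far value and its count
    current_count = 1           # length of the run currently being counted
    for i in range(1, len(vals)):
        if vals[i] == vals[i - 1]:
            current_count += 1
        else:
            # The run of vals[i-1] just ended; keep it if it beats the best so far.
            if current_count > count:
                fmode, count = vals[i - 1], current_count
            current_count = 1
    # The final run never reaches the else branch, so check it after the loop.
    if current_count > count:
        fmode, count = vals[-1], current_count
    return fmode
```

Without the post-loop check, `sorted_mode([1, 2, 2, 3, 3, 3])` would wrongly return 2 instead of 3.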
How to calculate the statistical mode in Processing / Arduino
I ported the code from your linked post to Processing, but it's limited to int arrays, and, because each value is used as an index into a counting array, to non-negative values.
I hope that helps.
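void setup()
{
  int[] numbers = {1, 2, 3, 2, 1, 1, 1, 3, 4, 5, 2};
  println(mode(numbers));  // prints 1
}

// Returns the most frequent value; on ties, the value that first reaches the top count wins.
// Values are used as array indices, so they must be non-negative.
int mode(int[] array) {
  // Size the counting array by the largest value, not by the array length,
  // otherwise values >= array.length would be out of bounds.
  int[] modeMap = new int[max(array) + 1];
  int maxEl = array[0];
  int maxCount = 1;
  for (int i = 0; i < array.length; i++) {
    int el = array[i];
    modeMap[el]++;  // new int arrays start at 0, so a plain increment suffices
    if (modeMap[el] > maxCount) {
      maxEl = el;
      maxCount = modeMap[el];
    }
  }
  return maxEl;
}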
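If the non-negative restriction is a problem, one option is to swap the index array for a hash map. Here is a Python sketch of that variant, with the same single pass and the same strict `>` tie rule as the Processing code. This is a suggested alternative, not part of the linked post, and the name `mode_any_ints` is hypothetical.

```python
def mode_any_ints(array):
    """Single-pass mode using a dict, so negative or large values are fine."""
    counts = {}
    max_el, max_count = array[0], 1
    for el in array:
        counts[el] = counts.get(el, 0) + 1
        # A value replaces the current best only when its count strictly exceeds it.
        if counts[el] > max_count:
            max_el, max_count = el, counts[el]
    return max_el
```

The trade-off is memory and speed: a dense array is cheaper when values are small non-negative integers, while a dict handles any hashable value.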