How to Find the Statistical Mode

How to find the statistical mode?

One more solution, which works for both numeric & character/factor data:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

On my dinky little machine, that can generate & find the mode of a 10M-integer vector in about half a second.

If your data set might have multiple modes, the above solution takes the same approach as which.max, and returns the first-appearing value of the set of modes. To return all modes, use this variant (from @digEmAll in the comments):

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
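
For readers who want to see the same counting idea outside R, here is a minimal Python sketch of both variants. The names mode_first and modes_all are hypothetical, and the sketch assumes a non-empty input of hashable values; it illustrates the approach and is not a port of the R internals.

```python
from collections import Counter


def mode_first(values):
    """Return the most frequent value; ties go to the value that appears first."""
    # Counter keeps keys in first-insertion order (Python 3.7+).
    counts = Counter(values)
    # max() returns the first key with the highest count, like which.max in R.
    return max(counts, key=counts.get)


def modes_all(values):
    """Return every value that shares the highest count, in first-appearance order."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]


sample = [3, 1, 1, 3, 2]
print(mode_first(sample))  # 3
print(modes_all(sample))   # [3, 1]
```

Both functions raise a ValueError on empty input, so a caller that may see empty data would need to guard for that.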

Find the statistical mode(s) of a dataset in PowerShell

Use a combination of Group-Object, Sort-Object, and a do ... while loop:

# Sample dataset.
$dataset = 1, 2, 2, 3, 4, 4, 5

# Group the same numbers and sort the groups by member count, highest counts first.
$groups = $dataset | Group-Object | Sort-Object Count -Descending

# Output only the numbers represented by those groups that have
# the highest member count.
$i = 0
do { $groups[$i].Group[0] } while ($groups[++$i].Count -eq $groups[0].Count)

The above yields 2 and 4, which are the two modes (values occurring most frequently, twice each in this case), sorted in ascending order (because Group-Object sorts by the grouping criterion and Sort-Object's sorting algorithm is stable).

Note: While this solution is conceptually straightforward, performance with large datasets may be a concern; see the bottom section for an optimization that is possible for certain inputs.

Explanation:

  • Group-Object groups all inputs by equality.

  • Sort-Object -Descending sorts the resulting groups by member count in descending fashion (most frequently occurring inputs first).

  • The do ... while statement loops over the sorted groups and outputs the input represented by each as long as the group-member and therefore occurrence count (frequency) is the highest, as implied by the first group's member count.
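
The three steps above (group, sort by count, take groups while the count matches the top count) can be sketched in Python as follows. This is only an illustration under stated assumptions: Counter stands in for Group-Object, the function name modes_sorted is hypothetical, and the input is assumed to be non-empty with mutually comparable values.

```python
from collections import Counter
from itertools import takewhile


def modes_sorted(dataset):
    """Group equal values, sort groups by count (highest first), keep the top ties."""
    # Sort by value first so that ties come out in ascending order.
    groups = sorted(Counter(dataset).items())
    # list.sort() is stable, also with reverse=True, so equal counts keep that order.
    groups.sort(key=lambda group: group[1], reverse=True)
    top_count = groups[0][1]
    # Keep taking groups for as long as their count equals the highest count.
    return [value for value, count in takewhile(lambda g: g[1] == top_count, groups)]


print(modes_sorted([1, 2, 2, 3, 4, 4, 5]))  # [2, 4]
```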


Better-performing solution, with strings and numbers:

If the input elements are uniformly simple numbers or strings (as opposed to complex objects), an optimization is possible:

  • Group-Object's -NoElement suppresses collecting the individual inputs in each group.

  • Each group's .Name property reflects the grouping value, but does so as a string, so it must be converted back to its original data type.

# Sample dataset.
# Must be composed of all numbers or strings.
$dataset = 1, 2, 2, 3, 4, 4, 5

# Determine the data type of the elements of the dataset via its first element.
# All elements are assumed to be of the same type.
$type = $dataset[0].GetType()

# Group the same numbers and sort the groups by member count, highest counts first.
$groups = $dataset | Group-Object -NoElement | Sort-Object Count -Descending

# Output only the numbers represented by those groups that have
# the highest member count.
# -as $type converts the .Name string value back to the original type.
$i = 0
do { $groups[$i].Name -as $type } while ($groups[++$i].Count -eq $groups[0].Count)

Finding the statistical mode of a vector: When having more than single mode — return the last mode

The only option you have with collapse is sorting the data beforehand, e.g.

library(collapse)
my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)
data.frame(v = my_vec, g = gl(2, 5)) %>%
  roworder(g) %>%
  tfm(t = data.table::rowid(g)) %>%
  roworder(g, -t) %>%
  gby(g) %>%
  smr(last = fmode(v, ties = "first"))

The reason rev doesn't work is that collapse grouping doesn't split the data; it only determines which group each row belongs to, and then computes statistics on all groups simultaneously using running algorithms in C++ (e.g. the grouped computation is done by fmode itself). So in your code rev is actually executed before the grouping and reverses the entire vector. In this case, a native data.table implementation calling fmode.default directly (to optimize on method dispatch) would probably be the fastest solution. I can think about adding a "last" mode if I find time for that.
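
To make the ordering issue concrete, here is a small Python sketch of a "last mode" that reverses the values inside each group rather than across the whole vector. It is not how collapse computes anything; the function name last_mode_by_group is hypothetical, and the group ids simply mirror gl(2, 5) from the example above.

```python
from collections import Counter, defaultdict


def last_mode_by_group(values, group_ids):
    """Per group, return the most frequent value; ties go to the value seen last."""
    # Split the values by group first, keeping their original order.
    by_group = defaultdict(list)
    for value, group_id in zip(values, group_ids):
        by_group[group_id].append(value)

    result = {}
    for group_id, members in by_group.items():
        # Reverse within the group, then take the first-appearing mode.
        counts = Counter(reversed(members))
        result[group_id] = max(counts, key=counts.get)
    return result


my_vec = [1, 1, 3, 4, 5, 5, -6, -6, 2, 2]
group_ids = [1] * 5 + [2] * 5
print(last_mode_by_group(my_vec, group_ids))  # {1: 1, 2: 2}
```

In the second group both -6 and 2 occur twice, and 2 is returned because it is the tied value that appears last.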

How to find the statistical mode of each ID

The warnings you are getting suggest two things.

  1. You have not specified a method, so the default method 'shorth' is used.

  2. There is a tie in the selection of the mode value.

Alternatively, why not use the Mode function from here:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

To apply it by group, you can use it with dplyr:

library(dplyr)
data %>% group_by(id) %>% mutate(edema2 = Mode(edema))

#      id   trt status stage spiders sex   hepato edema ascites edema2
#   <int> <int>  <int> <int>   <int> <fct>  <int> <dbl>   <int>  <dbl>
# 1     2     1      0     3       1 f          1   0         0    0
# 2     2     1      0     3       1 f          1   0         0    0
# 3     2     1      0     3       1 f          1   0         0    0
# 4     3     1      2     4       0 m          0   0.5       0    0.5
# 5     3     1      2     4       1 m          1   0         0    0.5
# 6     3     1      2     4       0 m          0   0.5       0    0.5

R: how to find the mode of a vector

This post provides an elegant function to determine the mode, so all you need to do is apply it to each column of your data frame.

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

apply(d, 2, Mode)

Yields:

MEMORY1 MEMORY2 MEMORY3 MEMORY4 MEMORY5 MEMORY6 MEMORY7 MEMORY8
    5.5     5.5     4.5     1.5     4.5     5.5     4.5     5.5

How to find statistical mode in Fortran

But how to do that? Or is there any intrinsic function in Fortran to calculate the number of occurrences of the input values and the value with the highest occurrence?

No, there is not. You'll have to calculate the mode by hand.

The following code should work (on a sorted array):

FMODE = VAL(1)
COUNT = 1
CURRENTCOUNT = 1
DO I = 2, N
    ! We are going through the loop looking for values == VAL(I-1)...
    IF (VAL(I) == VAL(I-1)) THEN
        ! We spotted another VAL(I-1), so increment the count.
        CURRENTCOUNT = CURRENTCOUNT + 1
    ELSE
        ! There are no more VAL(I-1)
        IF (CURRENTCOUNT > COUNT) THEN
            ! There were more elements of value VAL(I-1) than of value FMODE
            COUNT = CURRENTCOUNT
            FMODE = VAL(I-1)
        END IF
        ! Next we are looking for values == VAL(I), so far we have spotted one...
        CURRENTCOUNT = 1
    END IF
END DO
IF (CURRENTCOUNT > COUNT) THEN
    ! This means there are more elements of value VAL(N) than of value FMODE.
    FMODE = VAL(N)
END IF

Explanation:

We keep the best-so-far mode in the FMODE variable, and the count of the FMODE in the COUNT variable. As we step through the array we count the number of hits that are equal to what we are looking at now, in the CURRENTCOUNT variable.

If the next item we look at is equal to the previous, we simply increment the CURRENTCOUNT. If it's different, then we need to reset the CURRENTCOUNT, because we will now count the number of duplications of the next element.

Before we reset the CURRENTCOUNT we check if it's bigger than the previous best result, and if it is, we overwrite the previous best result (the FMODE and COUNT variables) with the new best results (whatever is at VAL(I-1) and CURRENTCOUNT), before we continue.

This check doesn't happen for the last run of equal values, because the loop ends before another ELSE branch is reached, so I inserted another check at the end in case the most frequent element happens to be the final element of the array. In that case we overwrite FMODE, like we would have done in the loop.
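
The same single-pass scan can be sketched in Python for comparison. This is a minimal illustration that assumes the input is already sorted and non-empty; the name mode_of_sorted is hypothetical.

```python
def mode_of_sorted(values):
    """Return the mode of an already sorted, non-empty sequence in one pass."""
    best_value, best_count = values[0], 1
    current_count = 1
    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            # Still inside the same run of equal values.
            current_count += 1
        else:
            # The run of values[i - 1] has ended; keep it if it beats the best so far.
            if current_count > best_count:
                best_value, best_count = values[i - 1], current_count
            current_count = 1
    # The loop never closes the final run, so check it here.
    if current_count > best_count:
        best_value = values[-1]
    return best_value


print(mode_of_sorted(sorted([4, 1, 4, 2, 4, 2])))  # 4
```

Because the comparison is strict, a tie is resolved in favour of the run that comes first, which in sorted input is the smallest tied value.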

How to calculate the statistical mode in Processing / Arduino

I ported the code from your linked post to Processing, but it's limited to int arrays.
I hope that helps.

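import java.util.HashMap;

void setup()
{
  int[] numbers = {1, 2, 3, 2, 1, 1, 1, 3, 4, 5, 2};
  println(mode(numbers));
}


// Returns the most frequent value; on a tie, the value that reaches the
// highest count first. Assumes a non-empty array.
int mode(int[] array) {
  // Count occurrences per value. A map avoids indexing an array by the value
  // itself, which fails for negative values or values >= array.length.
  HashMap<Integer, Integer> modeMap = new HashMap<Integer, Integer>();
  int maxEl = array[0];
  int maxCount = 1;

  for (int i = 0; i < array.length; i++) {
    int el = array[i];
    int count = modeMap.containsKey(el) ? modeMap.get(el) + 1 : 1;
    modeMap.put(el, count);

    if (count > maxCount) {
      maxEl = el;
      maxCount = count;
    }
  }
  return maxEl;
}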

