## How to find the statistical mode?

One more solution, which works for both numeric & character/factor data:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
```

On my dinky little machine, that can generate & find the mode of a 10M-integer vector in about half a second.
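For readers coming from other languages, here is a rough Python analogue of the `Mode` function above. It is a sketch, not part of the original answer, and `mode_first` is a hypothetical name. `dict.fromkeys` plays the role of `unique`, a position lookup plus a count list plays the role of `tabulate(match(x, ux))`, and `max` over indices, like `which.max`, returns the first position holding the highest count. It assumes the values are hashable.

```python
def mode_first(x):
    """Return the first-appearing most frequent value of a non-empty sequence."""
    # Distinct values in order of first appearance, like R's unique().
    ux = list(dict.fromkeys(x))
    # Position of each distinct value, like match(x, ux).
    pos = {v: i for i, v in enumerate(ux)}
    # Occurrence count per position, like tabulate(match(x, ux)).
    counts = [0] * len(ux)
    for v in x:
        counts[pos[v]] += 1
    # Like which.max(): max() returns the first index with the largest count.
    best = max(range(len(ux)), key=counts.__getitem__)
    return ux[best]


print(mode_first([3, 1, 1, 3, 2]))  # 3 (tied with 1, but appears first)
```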

If your data set might have multiple modes, the above solution takes the same approach as `which.max`, and returns the *first-appearing* value of the set of modes. To return *all* modes, use this variant (from @digEmAll in the comments):

```r
Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
```
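The same "keep every value tied for the top count" idea can be sketched in Python with `collections.Counter`, which (being a `dict`) keeps its keys in order of first appearance, so the result follows the same order as `ux` in the R version. The name `modes_all` is hypothetical; this is an illustration, not the answer's code.

```python
from collections import Counter


def modes_all(x):
    """Return every value tied for the highest count, in order of first appearance."""
    # Counts per distinct value, like tabulate(match(x, ux)); keys keep first-appearance order.
    tab = Counter(x)
    top = max(tab.values())
    # Like ux[tab == max(tab)]: keep all values whose count equals the maximum.
    return [v for v, n in tab.items() if n == top]


print(modes_all([1, 2, 2, 3, 4, 4, 5]))  # [2, 4]
```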

## Find the statistical mode(s) of a dataset in PowerShell

Use a combination of `Group-Object`, `Sort-Object`, and a `do ... while` loop:

```powershell
# Sample dataset.
$dataset = 1, 2, 2, 3, 4, 4, 5

# Group the same numbers and sort the groups by member count, highest counts first.
$groups = $dataset | Group-Object | Sort-Object Count -Descending

# Output only the numbers represented by those groups that have
# the highest member count.
$i = 0
do { $groups[$i].Group[0] } while ($groups[++$i].Count -eq $groups[0].Count)
```

The above yields `2` and `4`, which are the two modes (values occurring most frequently, twice each in this case), sorted in ascending order (because `Group-Object` sorts by the grouping criterion and `Sort-Object`'s sorting algorithm is stable).

Note: While this solution is conceptually straightforward, performance with large datasets may be a concern; see the bottom section for an optimization that is possible for certain inputs.

**Explanation:**

* `Group-Object` groups all inputs by equality.
* `Sort-Object -Descending` sorts the resulting groups by member count in descending fashion (most frequently occurring inputs first).
* The `do ... while` statement loops over the sorted groups and outputs the input represented by each, as long as the group-member count, and therefore the occurrence count (frequency), is the highest, as implied by the first group's member count.
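The three steps can be mirrored in Python as a sketch, following the answer's description: groups ordered by value first, then a stable sort by count (Python's `list.sort` stays stable even with `reverse=True`), then emitting values while the count matches the first group's. `modes_grouped` is a hypothetical name and this is not PowerShell's implementation.

```python
from collections import Counter


def modes_grouped(dataset):
    """Mimic Group-Object | Sort-Object Count -Descending, then keep the top groups."""
    # Group equal values; order the groups by value, as the answer says Group-Object does.
    groups = sorted(Counter(dataset).items())
    # Stable sort by count, highest first; equal counts keep their ascending value order.
    groups.sort(key=lambda g: g[1], reverse=True)
    # Like the do ... while loop: emit values while the count equals the top count.
    result = []
    for value, count in groups:
        if count != groups[0][1]:
            break
        result.append(value)
    return result


print(modes_grouped([1, 2, 2, 3, 4, 4, 5]))  # [2, 4]
```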

**Better-performing solution, with strings and numbers:**

If the input elements are uniformly *simple numbers or strings* (as opposed to complex objects), an optimization is possible:

* `Group-Object`'s `-NoElement` suppresses collecting the individual inputs in each group.
* Each group's `.Name` property reflects the grouping value, but does so as a *string*, so it must be converted back to its original data type.

```powershell
# Sample dataset.
# Must be composed of all numbers or strings.
$dataset = 1, 2, 2, 3, 4, 4, 5

# Determine the data type of the elements of the dataset via its first element.
# All elements are assumed to be of the same type.
$type = $dataset[0].GetType()

# Group the same numbers and sort the groups by member count, highest counts first.
$groups = $dataset | Group-Object -NoElement | Sort-Object Count -Descending

# Output only the numbers represented by those groups that have
# the highest member count.
# -as $type converts the .Name string value back to the original type.
$i = 0
do { $groups[$i].Name -as $type } while ($groups[++$i].Count -eq $groups[0].Count)
```

## Finding the statistical mode of a vector: When having more than a single mode, return the last mode

The only option you have with *collapse* is to sort the data beforehand, e.g.:

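```r
library(collapse)
library(magrittr)  # provides the %>% pipe

my_vec <- c(1, 1, 3, 4, 5, 5, -6, -6, 2, 2)

data.frame(v = my_vec, g = gl(2, 5)) %>%
  roworder(g) %>%
  # running row index within each group
  tfm(t = data.table::rowid(g)) %>%
  # reverse the row order within each group
  roworder(g, -t) %>%
  gby(g) %>%
  smr(last = fmode(v, ties = "first"))
```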

The reason `rev` doesn't work is that collapse grouping doesn't split the data. It only determines which group each row belongs to, and then computes statistics on all groups simultaneously using running algorithms in C++ (e.g. the grouped computation is done by `fmode` itself). So in your code `rev` is actually executed before the grouping and reverses the entire vector. In this case, a native data.table implementation calling `fmode.default` directly (to save on method dispatch) would probably be the fastest solution. I can think about adding a `"last"` mode if I find time for that.
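The key point, reversing rows *within* each group rather than reversing the whole vector, can be sketched in Python. Taking the first-appearing mode of each reversed group selects, among tied modes, the one whose last occurrence comes latest. `last_mode_by_group` is a hypothetical helper, not a collapse or data.table function.

```python
from collections import Counter
from itertools import groupby


def last_mode_by_group(values, groups):
    """Per group, return the tied mode whose last occurrence comes latest."""
    result = {}
    # groupby() needs rows ordered by group; sorted() is stable, so in-group order is kept.
    rows = sorted(zip(groups, values), key=lambda r: r[0])
    for g, members in groupby(rows, key=lambda r: r[0]):
        vals = [v for _, v in members]
        # Reverse only this group's values, then take the first-appearing mode,
        # analogous to fmode(v, ties = "first") after roworder(g, -t).
        counts = Counter(reversed(vals))
        top = max(counts.values())
        result[g] = next(v for v in counts if counts[v] == top)
    return result


my_vec = [1, 1, 3, 4, 5, 5, -6, -6, 2, 2]
g = [1] * 5 + [2] * 5
print(last_mode_by_group(my_vec, g))  # {1: 1, 2: 2}
```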

## How to find the statistical mode of each ID

The warnings you are getting suggest two things:

1. You have not specified which `method` to use, so the default method `'shorth'` is used.
2. There is a tie in the selection of the mode value.

Alternatively, why not use the `Mode` function from here:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
```

To apply it by group, you can use it with `dplyr`:

```r
library(dplyr)

data %>% group_by(id) %>% mutate(edema2 = Mode(edema))

# id trt status stage spiders sex hepato edema ascites edema2
# <int> <int> <int> <int> <int> <fct> <int> <dbl> <int> <dbl>
#1 2 1 0 3 1 f 1 0 0 0
#2 2 1 0 3 1 f 1 0 0 0
#3 2 1 0 3 1 f 1 0 0 0
#4 3 1 2 4 0 m 0 0.5 0 0.5
#5 3 1 2 4 1 m 1 0 0 0.5
#6 3 1 2 4 0 m 0 0.5 0 0.5
```
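What `group_by(id) %>% mutate(...)` does here, computing one mode per group and writing it back onto every row of that group, can be sketched in plain Python. `mode_by_group` is a hypothetical name; ties go to the first-encountered value because `Counter.most_common` orders equal counts by first appearance, matching `which.max` in `Mode`.

```python
from collections import Counter, defaultdict


def mode_by_group(ids, values):
    """Return a per-row list holding the mode of each row's group (group_by + mutate)."""
    # Collect each group's values in row order.
    by_id = defaultdict(list)
    for i, v in zip(ids, values):
        by_id[i].append(v)
    # First-appearing mode per group.
    group_mode = {i: Counter(vs).most_common(1)[0][0] for i, vs in by_id.items()}
    # Broadcast each group's mode back onto every row, as mutate() does.
    return [group_mode[i] for i in ids]


print(mode_by_group([2, 2, 2, 3, 3, 3], [0, 0, 0, 0.5, 0, 0.5]))  # [0, 0, 0, 0.5, 0.5, 0.5]
```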

## R: how to find the mode of a vector

This post provides an elegant function to determine the mode, so all you need to do is apply it to your data frame.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

apply(d, 2, Mode)
```

Yields:

```
MEMORY1 MEMORY2 MEMORY3 MEMORY4 MEMORY5 MEMORY6 MEMORY7 MEMORY8
    5.5     5.5     4.5     1.5     4.5     5.5     4.5     5.5
```
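`apply(d, 2, Mode)` simply runs `Mode` once per column. A Python sketch of the same column-wise idea follows; since the original data frame `d` is not shown, the table below is hypothetical and `column_modes` is an illustrative name.

```python
from collections import Counter


def column_modes(table):
    """Apply a first-appearing mode to each column, like apply(d, 2, Mode) in R."""
    return {name: Counter(col).most_common(1)[0][0] for name, col in table.items()}


# Hypothetical data; the original data frame d is not shown in the question.
d = {"MEMORY1": [5.5, 5.5, 3.0], "MEMORY2": [1.5, 4.5, 4.5]}
print(column_modes(d))  # {'MEMORY1': 5.5, 'MEMORY2': 4.5}
```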

## How to find statistical mode in Fortran

> But, how to do that? Or is there any intrinsic function in Fortran to calculate the number of occurrences of input values and the value with the highest occurrence?

No, there is not. You'll have to calculate the mode by hand.

The following code should work (on a sorted array):

```fortran
! Assumes VAL(1:N) is sorted, N >= 1, and FMODE, COUNT, CURRENTCOUNT and I are declared.
FMODE = VAL(1)
COUNT = 1
CURRENTCOUNT = 1
DO I = 2, N
   ! We are going through the loop looking for values == VAL(I-1)...
   IF (VAL(I) == VAL(I-1)) THEN
      ! We spotted another VAL(I-1), so increment the count.
      CURRENTCOUNT = CURRENTCOUNT + 1
   ELSE
      ! There are no more VAL(I-1)
      IF (CURRENTCOUNT > COUNT) THEN
         ! There were more elements of value VAL(I-1) than of value FMODE
         COUNT = CURRENTCOUNT
         FMODE = VAL(I-1)
      END IF
      ! Next we are looking for values == VAL(I), so far we have spotted one...
      CURRENTCOUNT = 1
   END IF
END DO
IF (CURRENTCOUNT > COUNT) THEN
   ! This means there are more elements of value VAL(N) than of value FMODE.
   FMODE = VAL(N)
END IF
```

Explanation:

We keep the best-so-far mode in the `FMODE` variable, and the count of the `FMODE` in the `COUNT` variable. As we step through the array, we count the number of hits that are equal to what we are looking at now in the `CURRENTCOUNT` variable.

If the next item we look at is equal to the previous one, we simply increment `CURRENTCOUNT`. If it's different, then we need to reset `CURRENTCOUNT`, because we will now count the number of duplicates of the next element.

Before we reset `CURRENTCOUNT`, we check whether it's bigger than the previous best result, and if it is, we overwrite the previous best result (the `FMODE` and `COUNT` variables) with the new best result (whatever is at `VAL(I-1)` and `CURRENTCOUNT`) before we continue.

This reset doesn't happen at the end of the loop, so I inserted another check after it in case the most frequent value happens to be the last value in the array. In that case we overwrite `FMODE`, like we would have done in the loop.
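As a cross-check of the logic just described, here is a line-by-line Python transliteration of the Fortran loop (0-based indices instead of 1-based). It is a sketch under the same assumption that the input is sorted and non-empty; `mode_sorted` is a hypothetical name. Like the original, a strict `>` means the earliest of several equally long runs wins.

```python
def mode_sorted(val):
    """Mode of a non-empty sorted list via one pass over runs of equal values."""
    fmode = val[0]
    count = 1          # count of the best-so-far mode (COUNT)
    current_count = 1  # length of the run being scanned (CURRENTCOUNT)
    for i in range(1, len(val)):
        if val[i] == val[i - 1]:
            current_count += 1
        else:
            # The run of val[i-1] just ended: keep it if strictly longer than the best.
            if current_count > count:
                count = current_count
                fmode = val[i - 1]
            current_count = 1
    # The final run never reaches the else branch, so check it after the loop.
    if current_count > count:
        fmode = val[-1]
    return fmode


print(mode_sorted([1, 2, 2, 3, 3, 3, 4]))  # 3
```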

## How to calculate the statistical mode in Processing / Arduino

I ported the code from your linked post to Processing, but it's limited to **int** arrays.

I hope that helps.

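```processing
void setup()
{
  int[] numbers = {1, 2, 3, 2, 1, 1, 1, 3, 4, 5, 2};
  println(mode(numbers));  // prints 1
}

// Returns the most frequent value; on ties, the value that first reaches the top count wins.
// Values are used as indices into modeMap, so they must be non-negative.
int mode(int[] array) {
  // Size the count table by the largest value, not by the array length.
  int[] modeMap = new int[max(array) + 1];
  int maxEl = array[0];
  int maxCount = 1;
  for (int i = 0; i < array.length; i++) {
    int el = array[i];
    modeMap[el]++;
    if (modeMap[el] > maxCount) {
      maxEl = el;
      maxCount = modeMap[el];
    }
  }
  return maxEl;
}
```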
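Because the Processing version uses the values themselves as array indices, it cannot handle negative numbers. A hash-map count removes that restriction; the Python sketch below keeps the same tie rule (the value that first reaches the top count wins). `mode_any_int` is a hypothetical name, and on Arduino a map type may not be available, so this is meant only to illustrate the idea.

```python
from collections import Counter


def mode_any_int(array):
    """Most frequent value; on ties, the value that first reaches the top count wins."""
    counts = Counter()
    max_el, max_count = array[0], 1
    for el in array:
        # Counting by key instead of by index, so negative or large values are fine.
        counts[el] += 1
        if counts[el] > max_count:
            max_el, max_count = el, counts[el]
    return max_el


print(mode_any_int([1, 2, 3, 2, 1, 1, 1, 3, 4, 5, 2]))  # 1
```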
