How to find the most frequently occurring element in a matrix in R
Set up some test data.
> (image = matrix(sample(1:10, 100, replace = TRUE), nrow = 10))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 4 4 2 7 2 2 3 8 2 5
[2,] 7 3 2 6 6 5 7 8 1 3
[3,] 7 5 7 9 4 9 4 8 2 7
[4,] 5 3 4 2 1 5 9 10 9 5
[5,] 9 10 7 2 7 4 9 1 1 9
[6,] 2 3 5 1 2 8 1 5 9 4
[7,] 5 4 10 5 9 10 1 6 1 10
[8,] 6 3 9 7 1 1 9 2 1 7
[9,] 5 9 4 8 9 9 5 10 5 4
[10,] 10 1 4 7 3 2 3 5 4 5
Do it manually.
> table(image)
image
1 2 3 4 5 6 7 8 9 10
12 12 8 12 15 4 11 5 14 7
Here we can see that the value 5 appeared most often (15 times). To get the same results programmatically:
> which.max(table(image))
5
5
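which.max() returns the position of the largest count within the table, labeled with the value it belongs to. In the output above both numbers are 5 only because the value 5 happens to be the fifth entry of the table. A minimal sketch on a small, made-up matrix (the names m, counts, pos and mode_value are illustrative) shows the difference and how to recover the value itself:

```r
# Hypothetical fixed matrix, so the result does not depend on sample()
m <- matrix(c(7, 2, 7, 1, 7, 2, 3, 7, 1), nrow = 3)

counts <- table(m)        # frequency of each distinct value; entries sorted by value
pos <- which.max(counts)  # named integer: name "7", value 4 (position within counts)

# The most frequent value itself, converted back from the character name
mode_value <- as.numeric(names(pos))
mode_value
```

The conversion with as.numeric() is needed because table() stores the distinct values as character names.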
Most frequent element per column
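It can be done in base R like this (for a matrix named example, this returns the two most frequent values in each column, most frequent first):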
sapply(1:ncol(example), function(x) rev(tail(names(sort(table(example[,x]))), 2)))
And if you want the frequencies instead, just drop the names() call:
sapply(1:ncol(example), function(x) rev(tail(sort(table(example[,x])), 2)))
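The sapply() calls above keep the top two values per column. If only the single most frequent value per column is needed, apply() over the columns (MARGIN = 2) is a shorter sketch; the matrix below is made up for illustration. On a tie, the value that sorts first wins, because which.max() returns the first maximum of the sorted table.

```r
# Hypothetical matrix; each line below is one column, since matrix() fills column by column
example <- matrix(c(1, 1, 2,
                    3, 4, 4,
                    5, 5, 5), nrow = 3)

# For each column, tabulate its values and keep the name of the largest count
col_modes <- apply(example, 2, function(col) names(which.max(table(col))))
col_modes
```

The result is a character vector; wrap it in as.numeric() if numbers are needed.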
Find the n most common values in a vector
I'm sure this is a duplicate, but the answer is simple:
sort(table(variable),decreasing=TRUE)[1:3]
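Indexing with [1:3] pads the result with NA when there are fewer than three distinct values; head() returns at most n entries instead. A small sketch with made-up vectors:

```r
# Hypothetical input vector
variable <- c("a", "b", "a", "c", "b", "a", "d")

# Three most common values, most common first
top3 <- head(sort(table(variable), decreasing = TRUE), 3)
top3

# With only two distinct values, head() simply returns two entries
short <- c("a", "a", "b")
head(sort(table(short), decreasing = TRUE), 3)
```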
Identify most frequent row from matrix
> (mat <- matrix(c(rep(c(0,1),3),rep(c(1,0),2)),5, byrow=TRUE))
[,1] [,2]
[1,] 0 1
[2,] 0 1
[3,] 0 1
[4,] 1 0
[5,] 1 0
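To get the row:
while (anyDuplicated(mat) > 0) {
  # keep only the rows that repeat an earlier row
  mat <- mat[duplicated(mat), ]
  # a single remaining row is dropped to a plain vector, so stop there
  if (!is.matrix(mat)) break
}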
Result:
> mat
[1] 0 1
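The loop above narrows the matrix down by repeatedly keeping duplicated rows. A different approach, sketched here, turns each row into a string key with paste() and tabulates the keys. The comma-separated key format is an assumption of this sketch; because paste() converts numbers to text, it suits integer-like data better than arbitrary doubles. On a tie, the key that sorts first wins.

```r
mat <- matrix(c(rep(c(0, 1), 3), rep(c(1, 0), 2)), 5, byrow = TRUE)

# One string per row, e.g. "0,1"
keys <- apply(mat, 1, paste, collapse = ",")

# Most frequent key, then the first row that produced it
top_key <- names(which.max(table(keys)))
most_frequent_row <- mat[match(top_key, keys), ]
most_frequent_row
```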
Find the most frequent value by row
Something like:
apply(df,1,function(x) names(which.max(table(x))))
[1] "red" "yellow" "green"
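In case of a tie, which.max() takes the first maximum. Because table() sorts its entries, the tied value that sorts first wins; in row 1 of the example below, "red" and "yellow" both appear twice and "red" is returned. From the which.max help page:
Determines the location, i.e., index of the (first) minimum or maximum of a numeric vector.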
Example:
var4 <- c("yellow","green","yellow")
df <- data.frame(cbind(id, var1, var2, var3, var4))
> df
id var1 var2 var3 var4
1 1 red red yellow yellow
2 2 yellow yellow orange green
3 3 green green green yellow
apply(df,1,function(x) names(which.max(table(x))))
[1] "red" "yellow" "green"
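If ties should instead go to the value that appears first in each row, one option is a small helper built on unique(), match() and tabulate(). The helper name mode_first_seen is hypothetical, and the data frame is rebuilt here from the printed output above so the sketch runs on its own.

```r
# Most frequent element of x; ties go to the value that appears first in x
mode_first_seen <- function(x) {
  ux <- unique(x)                        # distinct values, in order of appearance
  ux[which.max(tabulate(match(x, ux)))]  # count each one, take the first maximum
}

x <- c("yellow", "red", "red", "yellow")
names(which.max(table(x)))  # tie broken by sort order: "red"
mode_first_seen(x)          # tie broken by appearance: "yellow"

# The example data frame, built directly
df <- data.frame(id = 1:3,
                 var1 = c("red", "yellow", "green"),
                 var2 = c("red", "yellow", "green"),
                 var3 = c("yellow", "orange", "green"),
                 var4 = c("yellow", "green", "yellow"))
apply(df, 1, mode_first_seen)
```

On this data frame both approaches agree, since "red" also appears before "yellow" in row 1.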
Most frequent element in matrix
Your algorithm does not work for a 1D array, but once fixed, it can be used unchanged for a 2D matrix by passing a pointer to the matrix's first element and its total number of elements:
#include <stdio.h>

/* Return the most frequent value in arr[0..vel-1] (vel must be > 0).
   Ties are resolved in favor of the smallest value. */
double find_most_frequent(const double *arr, size_t vel) {
    size_t i, j, maxcount = 0, mostfreq = 0;
    for (i = 0; i < vel; i++) {
        /* count arr[i] plus all of its later occurrences */
        size_t count = 1;
        for (j = i + 1; j < vel; j++) {
            if (arr[i] == arr[j]) {
                count++;
            }
        }
        /* compare only after the inner loop has finished counting */
        if (maxcount < count || (maxcount == count && arr[mostfreq] > arr[i])) {
            maxcount = count;
            mostfreq = i;
        }
    }
    return arr[mostfreq];
}

#define ROWS 3
#define COLS 4

int main(void) {
    double mat[ROWS][COLS];
    double arr[] = { 1.3, 4.2, 1.3, 5, 4.2, 6.8, 3.7 };
    /* read ROWS * COLS numbers from standard input */
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            if (scanf("%lf", &mat[i][j]) != 1)
                return 1;
        }
    }
    printf("Most frequent in array is: %g\n", find_most_frequent(arr, sizeof(arr) / sizeof(arr[0])));
    /* the rows of a 2D array are contiguous, so it can be scanned as one flat array */
    printf("Most frequent in matrix is: %g\n", find_most_frequent(&mat[0][0], sizeof(mat) / sizeof(mat[0][0])));
    return 0;
}
How to calculate the most frequently occurring terms/words in a document collection/corpus using R?
You can use
sorted.sums[sorted.sums > 5][1:4]
But if at least four values are greater than 5, sorted.sums[1:4] alone should work as well, assuming sorted.sums is sorted in decreasing order.
To get the words, you can use names():
names(sorted.sums[sorted.sums > 5][1:4])
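The code that builds sorted.sums is not shown here; the sketch below assumes it holds per-term totals, such as the column sums of a document-term matrix, sorted in decreasing order. The small matrix is made up. It shows why the [1:4] version needs at least four qualifying terms, and how head() avoids the NA padding.

```r
# Hypothetical document-term matrix: 3 documents (rows) x 4 terms (columns)
dtm <- matrix(c(3, 1, 2, 0,
                0, 4, 2, 1,
                2, 1, 5, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("doc", 1:3), c("data", "model", "r", "plot")))

# Total count of each term across all documents, largest first: r 9, model 6, data 5, plot 2
sorted.sums <- sort(colSums(dtm), decreasing = TRUE)

# Only "r" and "model" exceed 5, so [1:4] pads the names with two NAs
names(sorted.sums[sorted.sums > 5][1:4])

# head() returns at most 4 terms and never pads with NA
top_terms <- names(head(sorted.sums[sorted.sums > 5], 4))
top_terms
```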
Find the most frequent value in a column and take a subset of that
You can use subset():
indx <- tail(names(sort(table(df1$Value))),1)
subset(df1, Value==indx)
Or, using dplyr:
library(dplyr)
df1 %>%
group_by(Value) %>%
mutate(N=n()) %>%
ungroup() %>%
filter(N==max(N))
Or, using data.table:
library(data.table)
setDT(df1)[, N:=.N, Value][N==max(N)][, N:=NULL]
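The dplyr and data.table versions keep every row whose value reaches the maximum count, while the tail(names(sort(...)), 1) version keeps only one value when several are tied. A base R sketch that keeps all tied values, on made-up data in which 2 and 3 both occur twice:

```r
# Hypothetical data with a tie: 2 and 3 both occur twice
df1 <- data.frame(ID = 1:6, Value = c(2, 3, 2, 5, 3, 1))

counts <- table(df1$Value)
top_values <- names(counts)[counts == max(counts)]  # every value with the maximum count
subset(df1, Value %in% top_values)
```

%in% compares the numeric Value column with the character names from table() after converting both to character, which is reliable for integer-like values.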