How to Generate Bin Frequency Table in R

How to generate bin frequency table in R?

Regarding @akrun's solution, here is something useful from the documentation (?cut), just in case:

Note

Instead of table(cut(x, br)), hist(x, br, plot = FALSE) is more efficient and less memory hungry.

So, with a lot of data, I would rather opt for:

br = seq(0,1,by=0.1)

ranges = paste(head(br,-1), br[-1], sep=" - ")
freq = hist(x, breaks=br, include.lowest=TRUE, plot=FALSE)

data.frame(range = ranges, frequency = freq$counts)

#        range frequency
#1     0 - 0.1         2
#2   0.1 - 0.2         1
#3   0.2 - 0.3         3
#4   0.3 - 0.4         5
#5   0.4 - 0.5         4
#6   0.5 - 0.6         1
#7   0.6 - 0.7         2
#8   0.7 - 0.8         0
#10    0.9 - 1         1
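
As a quick check that the two approaches give the same counts, here is a minimal sketch on simulated data. The vector x and the seed are made up for illustration (the original x is not shown). Note that cut() needs include.lowest = TRUE to match the default behaviour of hist().

```r
# Simulated data (an assumption): random values placed at the midpoints of a
# 0.01 grid, so that no value sits exactly on a bin edge
set.seed(1)
x <- floor(runif(1000) * 100) / 100 + 0.005
br <- seq(0, 1, by = 0.1)

# cut() leaves the lowest break open by default, hist() closes it,
# so include.lowest = TRUE is set here to make the two comparable
counts_cut  <- as.vector(table(cut(x, br, include.lowest = TRUE)))
counts_hist <- hist(x, breaks = br, plot = FALSE)$counts

# Compare the per-bin counts from both approaches
all(counts_cut == counts_hist)
```

Both calls return one count per bin; hist() simply skips building the intermediate factor that cut() creates.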

How to bin the summarised frequency table with dplyr

The simplest approach is to use cut with seq to create a group for every 100 units of distance, then sum the counts within each group.

library(dplyr)

df %>%
  group_by(group = cut(distance, breaks = seq(0, max(distance), 100))) %>%
  summarise(n = sum(n))

# group n
# <fct> <int>
# 1 (0,100] 1633
# 2 (100,200] 21344
# 3 (200,300] 28310
# 4 (300,400] 7748
# 5 (400,500] 21292
# 6 (500,600] 26815
# 7 (600,700] 7846
# 8 (700,800] 48904
# 9 (800,900] 7574
#10 (900,1e+03] 18205
# ... with 17 more rows

This can be translated to base R with aggregate:

aggregate(n ~ distance,
          transform(df, distance = cut(distance, breaks = seq(0, max(distance), 100))),
          sum)
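
One caveat with seq(0, max(distance), 100): if the maximum is not a multiple of 100, the largest values fall beyond the last break and become NA, and a distance of exactly 0 is excluded unless include.lowest = TRUE is set. The sketch below shows one way to handle both, on a small made-up data frame (the values are hypothetical). It also uses dig.lab so that labels such as (900,1e+03] print as (900,1000].

```r
# Hypothetical summarised data: one row per distance with its count n
df <- data.frame(distance = c(50, 150, 180, 320, 950),
                 n        = c(3, 2, 5, 1, 4))

# Round the top break up to the next multiple of 100 so max(distance) is covered
top <- ceiling(max(df$distance) / 100) * 100
df$bin <- cut(df$distance, breaks = seq(0, top, by = 100),
              include.lowest = TRUE, dig.lab = 4)

# Sum the counts per bin; bins with no rows are not listed by aggregate()
res <- aggregate(n ~ bin, data = df, FUN = sum)
res
```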

How to create bin frequency table where bin size varies by group

Here is a possible solution:

# example data
set.seed(42)
ID <- as.factor(c(rep("A",20),rep("B",22)))
date <- as.factor(c(rep("C",12),rep("D",8),rep("E",10),rep("F",12)))
group <- as.factor(c(rep("G",6),rep("H",6),rep("G",8),rep("G",6),rep("H",4),rep("G",6),rep("H",6)))
val <- round(rnorm(42,20,10),0)

df <- data.frame(ID,date,group,val)

# using the function you provided
f = function(br, df) {aggregate(val~ID+date+group,df,FUN=function(x) table(cut(x, br)))}

library(tidyverse)

# create a look up table
# (specify the breaks for each group)
look_up = tibble(group_id = c("G","H"),   # tibble() replaces the deprecated data_frame()
                 br = list(c(0,10,30,100), c(0,10,50,100)))

df_upd = df %>%
  group_by(group_id = group) %>%           # duplicate group column and group by it
  nest() %>%                               # nest data
  left_join(look_up, by="group_id") %>%    # join look up table to get corresponding breaks
  mutate(d = map2(br, data, ~f(.x, .y)))   # apply function

# see results
df_upd$d

# [[1]]
# ID date group val.(0,10] val.(10,30] val.(30,100]
# 1 A C G 0 5 1
# 2 A D G 1 4 1
# 3 B E G 1 3 2
# 4 B F G 1 5 0
#
# [[2]]
# ID date group val.(0,10] val.(10,50] val.(50,100]
# 1 A C H 0 6 0
# 2 B E H 1 3 0
# 3 B F H 0 5 0

I've decided to use the function you provided, which includes the breaks in the column names. For this reason, when you have different breaks for different groups, the output cannot be combined into one data frame, as the column names would conflict.

The only way to get everything in one data frame is if you change your function to produce a more "tidy" output:

library(tidyverse)

# updated function
f = function(br, df) {
  df %>%
    mutate(g = cut(val, br)) %>%
    na.omit() %>%
    count(g, ID, date, group) %>%
    complete(g, nesting(ID, date, group), fill=list(n=0)) }

# same lookup table
look_up = tibble(group_id = c("G","H"),   # tibble() replaces the deprecated data_frame()
                 br = list(c(0,10,30,100), c(0,10,50,100)))

# apply your function
df %>%
  group_by(group_id = group) %>%
  nest() %>%
  left_join(look_up, by="group_id") %>%
  mutate(d = map2(br, data, ~f(.x, .y))) %>%
  unnest(d) %>%
  select(-group_id) %>%
  arrange(group, date, ID) # for visualisation purposes only

# # A tibble: 21 x 5
# g ID date group n
# <chr> <fct> <fct> <fct> <dbl>
# 1 (0,10] A C G 0
# 2 (10,30] A C G 5
# 3 (30,100] A C G 1
# 4 (0,10] A D G 1
# 5 (10,30] A D G 4
# 6 (30,100] A D G 1
# 7 (0,10] B E G 1
# 8 (10,30] B E G 3
# 9 (30,100] B E G 2
# 10 (0,10] B F G 1
# # ... with 11 more rows
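
For comparison, the same idea (a separate set of breaks per group, collected into one long table) can be sketched in base R without nesting. This is only an illustration on made-up data: the small data frame and the name breaks_by_group are hypothetical, and the breaks are the ones from the lookup table above.

```r
# Hypothetical data: values with a group label
df <- data.frame(group = c("G", "G", "G", "H", "H", "H"),
                 val   = c(5, 20, 40, 5, 20, 60))

# Breaks per group, keyed by group name
breaks_by_group <- list(G = c(0, 10, 30, 100),
                        H = c(0, 10, 50, 100))

# Bin each group's values with that group's own breaks;
# labels are stored as character so different break sets can share one column
df$bin <- NA_character_
for (g in names(breaks_by_group)) {
  rows <- df$group == g
  df$bin[rows] <- as.character(cut(df$val[rows], breaks_by_group[[g]]))
}

# Long-format counts, one row per observed group/bin combination
counts <- aggregate(n ~ group + bin, data = transform(df, n = 1), FUN = sum)
counts
```

Storing the bin label as text is what avoids the column name conflict: the breaks end up in the rows rather than in the column names.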

How to generate a frequency table in R with cumulative frequency and relative frequency

You're close! There are a few functions that will make this easy for you, namely cumsum() and prop.table(). Here's how I'd probably put this together. I make some random data, but the point is the same:

#Fake data
x <- sample(10:20, 44, TRUE)
#Your code
factorx <- factor(cut(x, breaks=nclass.Sturges(x)))
#Tabulate and turn into data.frame
xout <- as.data.frame(table(factorx))
#Add cumFreq and proportions
xout <- transform(xout, cumFreq = cumsum(Freq), relative = prop.table(Freq))
#-----
factorx Freq cumFreq relative
1 (9.99,11.4] 11 11 0.25000000
2 (11.4,12.9] 3 14 0.06818182
3 (12.9,14.3] 11 25 0.25000000
4 (14.3,15.7] 2 27 0.04545455
5 (15.7,17.1] 6 33 0.13636364
6 (17.1,18.6] 3 36 0.06818182
7 (18.6,20] 8 44 0.18181818
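
Because the data above is random, the numbers change on every run. A small deterministic sketch of the same steps, with made-up values and explicit breaks (both are assumptions for illustration), also adds a cumulative relative frequency column:

```r
# Hypothetical data with explicit breaks so the result is reproducible
x <- c(10, 11, 11, 12, 13, 15, 15, 16, 18, 20)
bins <- cut(x, breaks = c(9, 12, 16, 20))

# Tabulate, then add cumulative and relative frequencies
xout <- as.data.frame(table(bins))
xout <- transform(xout,
                  cumFreq  = cumsum(Freq),
                  relative = prop.table(Freq),
                  cumRel   = cumsum(prop.table(Freq)))
xout
```

Two quick sanity checks on any such table: the last cumFreq should equal length(x), and the relative column should sum to 1.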

How to create a count table in R?

First, you need to use cut to bin the lengths. Then you can use complete to fill the missing counts with 0. Then, group_by species, station and bin and use summarize to add the counts per group. Last, use pivot_wider to make the bins column labels.

Note 1: The result differs from your expected output, but I think you have a typo.

Note 2: I don't know if the grouping and summing is necessary. In your example it's not, but logically I would include it.

library(tidyverse)

set.seed(10)
df <- data.frame(
  species = c(rep("A",4), rep("B",4)),
  station = rep(1:2, 4),
  length = round(rnorm(8, 15, 2)),
  count = round(rnorm(8, 5, 2))
)
df

#---------------------
df %>%
  mutate(length = cut(length,
                      breaks = seq(10.5, 20.5, by = 2),
                      labels = c("L11_12", "L13_14", "L15_16", "L17_18", "L19_20"))) %>%
  complete(species, station, length, fill = list(count = 0)) %>%
  group_by(species, station, length) %>%
  summarize(count = sum(count)) %>%
  pivot_wider(names_from = length, values_from = count)

#---------------------
# A tibble: 4 x 7
# Groups: species, station [4]
species station L11_12 L13_14 L15_16 L17_18 L19_20
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 1 7 0 2 0 0
2 A 2 0 7 4 0 0
3 B 1 0 6 5 0 0
4 B 2 0 5 7 0 0
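
If the tidyverse is not available, a similar wide count table can be sketched in base R with cut and xtabs. The data below is made up for illustration. xtabs sums the count column per cell and keeps unused bin levels, so empty bins appear as 0 without a separate fill step.

```r
# Hypothetical data: counts per species, station and measured length
df <- data.frame(species = c("A", "A", "B", "B"),
                 station = c(1, 2, 1, 2),
                 length  = c(11, 14, 15, 13),
                 count   = c(7, 3, 2, 5))

# Bin the lengths into 2-unit classes with readable labels
df$bin <- cut(df$length, breaks = seq(10.5, 20.5, by = 2),
              labels = c("L11_12", "L13_14", "L15_16", "L17_18", "L19_20"))

# Sum counts per species/station/bin; bins with no rows are reported as 0
tab <- xtabs(count ~ species + station + bin, data = df)

# Flatten to a wide layout: one row per species/station, one column per bin
ftable(tab)
```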

Using R - frequency counts with variable binwidths and factors

The following snippet should do what you want:

I loaded your sample into df.

library("dplyr")
df %>%
  group_by(sample.type, leaf.side, canopy, treatment) %>%
  dplyr::select(Feret) %>%
  # bins is the vector of breaks from the question
  do(data.frame(table(cut(.$Feret, breaks=bins, include.lowest=TRUE))))

I refer you to the dplyr documentation. In short, x %>% f is f(x) and x %>% f(a) is f(x, a).

Note that dplyr::select is just select, but I have had namespace issues so many times that I now always specify the package.

table(cut(df$Feret, breaks=bins)) is just a nicer way to do what you did with hist. With cut, you create a factor variable (remember to add include.lowest=TRUE if your values can reach the lower bound), and with table, you count the frequency of each level.

This gives:

   sample.type leaf.side canopy treatment        Var1 Freq
1 flower upper top green (0.01,0.03] 0
2 flower upper top green (0.03,0.1] 6
3 flower upper top green (0.1,0.3] 1
4 flower upper top green (0.3,1] 0
5 flower upper top green (1,3] 1
6 flower upper top green (3,10] 3
7 flower upper top white (0.01,0.03] 4
8 flower upper top white (0.03,0.1] 4
9 flower upper top white (0.1,0.3] 0
10 flower upper top white (0.3,1] 0
11 flower upper top white (1,3] 0
12 flower upper top white (3,10] 3
13 leaf lower bottom white (0.01,0.03] 5
14 leaf lower bottom white (0.03,0.1] 4
15 leaf lower bottom white (0.1,0.3] 1
16 leaf lower bottom white (0.3,1] 1
17 leaf lower bottom white (1,3] 0
18 leaf lower bottom white (3,10] 0
19 leaf lower top grey (0.01,0.03] 10
20 leaf lower top grey (0.03,0.1] 1
21 leaf lower top grey (0.1,0.3] 0
22 leaf lower top grey (0.3,1] 0
23 leaf lower top grey (1,3] 0
24 leaf lower top grey (3,10] 0
25 leaf upper bottom white (0.01,0.03] 4
26 leaf upper bottom white (0.03,0.1] 6
27 leaf upper bottom white (0.1,0.3] 1
28 leaf upper bottom white (0.3,1] 0
29 leaf upper bottom white (1,3] 0
30 leaf upper bottom white (3,10] 0
31 leaf upper top blue (0.01,0.03] 10
32 leaf upper top blue (0.03,0.1] 0
33 leaf upper top blue (0.1,0.3] 0
34 leaf upper top blue (0.3,1] 0
35 leaf upper top blue (1,3] 1
36 leaf upper top blue (3,10] 0

(Actually, it doesn't print like this since this is a tbl, but you can use print.data.frame to print a tbl the old way.)

From here it should be straightforward to extract the info you want.
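
To illustrate the remark about include.lowest: by default cut leaves the lowest break open, so a value exactly equal to it becomes NA and is silently dropped by table. A minimal sketch, with made-up breaks and measurements (both hypothetical):

```r
bins  <- c(0.01, 0.03, 0.1)    # hypothetical breaks
feret <- c(0.01, 0.02, 0.05)   # hypothetical values; the first equals the lowest break

# Default: intervals are (a, b], so 0.01 falls outside the first bin
default_bins <- cut(feret, breaks = bins)

# include.lowest = TRUE closes the first interval on the left
closed_bins <- cut(feret, breaks = bins, include.lowest = TRUE)

table(default_bins, useNA = "ifany")
table(closed_bins)
```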


