Simple frequency tables using data.table

data.table provides a couple of special symbols that can be used within the j expression. Notably:

  • .N gives the number of rows in each group.

See ?data.table, under the Details section for by:

Advanced: When grouping by by or by i, symbols .SD, .BY and .N may be used in the j expression, defined as follows.

....

.N is an integer, length 1, containing the number of rows in the group.

For example:

dt[, .N, by = Species]

      Species  N
1:     setosa 50
2: versicolor 50
3:  virginica 50
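The output above matches the built-in iris dataset; a minimal setup sketch, assuming dt is iris converted to a data.table:

```r
library(data.table)

# Convert the built-in iris data.frame to a data.table
dt <- as.data.table(iris)

# .N counts the rows in each Species group
dt[, .N, by = Species]
```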

R frequency tables using a sequence over a collection

Use factor with levels in table.

table(factor(x, levels = 3:15))

# 3 4 5 6 7 8 9 10 11 12 13 14 15
# 1 1 1 0 2 0 3 4 7 10 14 3 2

Or, for the general case:

table(factor(x, levels = min(x):max(x)))
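As a quick illustration with a small made-up vector (not the data behind the output above):

```r
# Hypothetical input vector
x <- c(3, 5, 5, 7, 4)

# Levels absent from x (here, 6) get a zero count
table(factor(x, levels = min(x):max(x)))
# 3 4 5 6 7
# 1 1 2 0 1
```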

Create frequency tables using group_by function

Get the data in long format, count the occurrences of each value for every unique value of Indicator, and cast the data back to wide format.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = starts_with('Answer')) %>%
  count(Indicator, value) %>%
  pivot_wider(names_from = value, values_from = n, values_fill = 0)

#  Indicator Correct Wrong Partial
#      <int>   <int> <int>   <int>
#1         0       3     3       0
#2         1       1     4       1

You can make the code a little shorter using janitor::tabyl.

df %>%
  pivot_longer(cols = starts_with('Answer')) %>%
  janitor::tabyl(Indicator, value)

data

df <- structure(list(Indicator = c(0L, 1L, 1L, 0L), Answer_1 = c("Correct", 
"Partial", "Wrong", "Correct"), Answer_2 = c("Wrong", "Correct",
"Wrong", "Correct"), Answer_3 = c("Wrong", "Wrong", "Wrong",
"Wrong")), class = "data.frame", row.names = c(NA, -4L))

Frequency table when there are multiple columns representing one value (R)

You can use the tidyverse package to transform the data into long format and then summarise the desired stats.

library(tidyverse)

df |>
  # Transform all columns (except ID) into long format
  pivot_longer(cols = -ID,
               names_pattern = "([A-Za-z]+)",
               names_to = "variable") |>
  # Drop NA entries
  drop_na(value) |>
  # Group by variable
  group_by(variable) |>
  # Count occurrences of each value
  count(value) |>
  # Calculate percentage as n / sum of n within each variable
  mutate(perc = 100 * n / sum(n))

# A tibble: 10 x 4
# Groups:   variable [3]
#    variable value        n  perc
#    <chr>    <chr>    <int> <dbl>
#  1 color    blue         3  27.3
#  2 color    green        2  18.2
#  3 color    red          2  18.2
#  4 color    yellow       4  36.4
#  5 shape    circle       5  50
#  6 shape    square       2  20
#  7 shape    triangle     3  30
#  8 size     large        2  33.3
#  9 size     medium       2  33.3
# 10 size     small        2  33.3
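The question's data is not shown; as a hypothetical example of the expected wide input (an ID column plus several columns per attribute, with NAs for missing entries), the pipeline behaves like this:

```r
library(tidyverse)

# Hypothetical wide data (the real df is not shown in the question)
df <- tibble(
  ID     = 1:3,
  color1 = c("red",    "blue",   NA),
  color2 = c("blue",   NA,       "red"),
  shape1 = c("circle", "square", "circle")
)

df |>
  pivot_longer(cols = -ID,
               names_pattern = "([A-Za-z]+)",
               names_to = "variable") |>
  drop_na(value) |>
  group_by(variable) |>
  count(value) |>
  mutate(perc = 100 * n / sum(n))
```

With this input, color splits 50/50 between blue and red, while shape is two circles to one square.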

I'm trying to make a frequency table where Var1 uses only one value and Var3 does not appear in the table but filters the data in the table

If we assume that your dataset is as you presented it:

myDF <- data.frame(ISCED=c(12, 12, 12, 13, 15, 15), EMTAK=c(233, 245, 233, 233, 433, 245), PK_T=c(1, 0, NA, 1, 1, 0))

Then, using the dplyr package, you can do:

install.packages("dplyr") # if not already installed
library(dplyr)

newDF <- myDF %>%
  na.omit() %>%                          # remove rows containing NA values
  filter(PK_T == 1) %>%                  # keep only rows where PK_T is 1
  filter(ISCED == 12 | ISCED == 15) %>%  # keep only the desired ISCED values
  count(EMTAK, name = "Freq")            # one row per EMTAK value with its frequency

Giving:

  EMTAK Freq
1   233    1
2   433    1

If you want other combinations, adjust the arguments inside filter().
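Since the result is just a count per EMTAK value, a base-R-only sketch using table() (same myDF as above) gives the equivalent counts:

```r
myDF <- data.frame(ISCED = c(12, 12, 12, 13, 15, 15),
                   EMTAK = c(233, 245, 233, 233, 433, 245),
                   PK_T  = c(1, 0, NA, 1, 1, 0))

# Drop NA rows, subset by the filter conditions, then tabulate EMTAK
with(na.omit(myDF), table(EMTAK[PK_T == 1 & (ISCED == 12 | ISCED == 15)]))
#
# 233 433
#   1   1
```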


