Simple frequency tables using data.table

data.table provides a couple of special symbols that can be used within the j expression. Notably:

  • .N gives the number of rows in each group.

See ?data.table, under the Details section for by:

Advanced: When grouping by by or by i, symbols .SD, .BY and .N may be used in the j expression, defined as follows.

....

.N is an integer, length 1, containing the number of rows in the group.

For example:

dt[, .N, by = Species]

      Species  N
1:     setosa 50
2: versicolor 50
3:  virginica 50
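The output above matches the built-in iris dataset; a minimal setup sketch, assuming dt is iris converted to a data.table:

```r
library(data.table)

# Convert the built-in iris data.frame to a data.table
dt <- as.data.table(iris)

# .N counts the rows in each Species group
dt[, .N, by = Species]
```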

R frequency tables using a sequence over a collection

Use factor with levels in table.

table(factor(x, levels = 3:15))

# 3 4 5 6 7 8 9 10 11 12 13 14 15
# 1 1 1 0 2 0 3 4 7 10 14 3 2

Or, for the general case:

table(factor(x, levels = min(x):max(x)))
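As a quick illustration with a small made-up vector (not the data behind the output above):

```r
# Hypothetical input vector
x <- c(3, 5, 5, 7, 4)

# Levels absent from x (here, 6) get a zero count
table(factor(x, levels = min(x):max(x)))
# 3 4 5 6 7
# 1 1 2 0 1
```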

Create frequency tables using group_by function

Get the data in long format, count the occurrences of each value for every unique value of Indicator, and cast the data back to wide format.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = starts_with('Answer')) %>%
  count(Indicator, value) %>%
  pivot_wider(names_from = value, values_from = n, values_fill = 0)

#  Indicator Correct Wrong Partial
#      <int>   <int> <int>   <int>
#1         0       3     3       0
#2         1       1     4       1

You can make the code a little shorter using janitor::tabyl.

df %>%
  pivot_longer(cols = starts_with('Answer')) %>%
  janitor::tabyl(Indicator, value)

data

df <- structure(list(Indicator = c(0L, 1L, 1L, 0L), Answer_1 = c("Correct", 
"Partial", "Wrong", "Correct"), Answer_2 = c("Wrong", "Correct",
"Wrong", "Correct"), Answer_3 = c("Wrong", "Wrong", "Wrong",
"Wrong")), class = "data.frame", row.names = c(NA, -4L))

Frequency table when there are multiple columns representing one value (R)

You can use the tidyverse package to transform the data into long format and then summarise the desired stats.

library(tidyverse)

df |>
  # Transform all columns (except ID) into long format
  pivot_longer(cols = -ID,
               names_pattern = "([A-Za-z]+)",
               names_to = "variable") |>
  # Drop NA entries
  drop_na(value) |>
  # Group by variable
  group_by(variable) |>
  # Count occurrences of each value
  count(value) |>
  # Calculate percentage as n / sum of n within each variable
  mutate(perc = 100 * n / sum(n))

# A tibble: 10 x 4
# Groups:   variable [3]
#    variable value        n  perc
#    <chr>    <chr>    <int> <dbl>
#  1 color    blue         3  27.3
#  2 color    green        2  18.2
#  3 color    red          2  18.2
#  4 color    yellow       4  36.4
#  5 shape    circle       5  50
#  6 shape    square       2  20
#  7 shape    triangle     3  30
#  8 size     large        2  33.3
#  9 size     medium       2  33.3
# 10 size     small        2  33.3
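The question's data is not shown; as a hypothetical example of the expected wide input (an ID column plus several columns per attribute, with NAs for missing entries), the pipeline behaves like this:

```r
library(tidyverse)

# Hypothetical wide data (the real df is not shown in the question)
df <- tibble(
  ID     = 1:3,
  color1 = c("red",    "blue",   NA),
  color2 = c("blue",   NA,       "red"),
  shape1 = c("circle", "square", "circle")
)

df |>
  pivot_longer(cols = -ID,
               names_pattern = "([A-Za-z]+)",
               names_to = "variable") |>
  drop_na(value) |>
  group_by(variable) |>
  count(value) |>
  mutate(perc = 100 * n / sum(n))
```

With this input, color splits 50/50 between blue and red, while shape is two circles to one square.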

I'm trying to make a frequency table where Var1 uses only one value and Var3 does not appear in the table but filters the data in the table

If we assume that your dataset is as you presented it:

myDF <- data.frame(ISCED=c(12, 12, 12, 13, 15, 15), EMTAK=c(233, 245, 233, 233, 433, 245), PK_T=c(1, 0, NA, 1, 1, 0))

Then, using the dplyr package, you can do:

install.packages("dplyr") # if not already installed
library(dplyr)

newDF <- myDF %>%
  na.omit() %>%                          # remove rows containing NA values
  filter(PK_T == 1) %>%                  # keep only rows where PK_T is 1
  filter(ISCED == 12 | ISCED == 15) %>%  # keep only the desired ISCED values
  count(EMTAK, name = "Freq")            # one row per EMTAK value with its frequency

Giving:

  EMTAK Freq
1   233    1
2   433    1

If you want other combinations, adjust the arguments inside filter().
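Since the result is just a count per EMTAK value, a base-R-only sketch using table() (same myDF as above) gives the equivalent counts:

```r
myDF <- data.frame(ISCED = c(12, 12, 12, 13, 15, 15),
                   EMTAK = c(233, 245, 233, 233, 433, 245),
                   PK_T  = c(1, 0, NA, 1, 1, 0))

# Drop NA rows, subset by the filter conditions, then tabulate EMTAK
with(na.omit(myDF), table(EMTAK[PK_T == 1 & (ISCED == 12 | ISCED == 15)]))
#
# 233 433
#   1   1
```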


