Simple frequency tables using data.table
data.table has a couple of special symbols that can be used within the j expression. Notably, .N will give you the number of rows in each group. See ?data.table, under the details for by:
Advanced: When grouping by by or by i, symbols .SD, .BY and .N may be used in the j expression, defined as follows:
.N is an integer, length 1, containing the number of rows in the group.
For example:
library(data.table)
dt <- as.data.table(iris)
dt[, .N, by = Species]
Species N
1: setosa 50
2: versicolor 50
3: virginica 50
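Since .N is an ordinary value inside j, it can also be combined with other expressions, for example to add each group's share of rows. A small sketch, again using iris converted to a data.table:

```r
library(data.table)
dt <- as.data.table(iris)

# .N is the group's row count; nrow(dt) is the whole table's row count
res <- dt[, .(N = .N, prop = .N / nrow(dt)), by = Species]
res
```

Each of the three species contributes 50 of the 150 rows, so every prop is 1/3.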
R frequency tables using a sequence over a collection
Use factor() with levels in table():
table(factor(x, levels = 3:15))
# 3 4 5 6 7 8 9 10 11 12 13 14 15
# 1 1 1 0 2 0 3 4 7 10 14 3 2
Or, for the general case:
table(factor(x, levels = min(x):max(x)))
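The output above depends on the original poster's x, which isn't shown. A self-contained sketch with an assumed vector illustrates the point: levels that never occur still get an explicit zero count instead of being dropped.

```r
# Hypothetical data; any integer vector works
x <- c(4, 6, 6, 9)

# Plain table(x) would omit 3, 5, 7 and 8 entirely;
# supplying the levels forces a cell for every value in the range
tab <- table(factor(x, levels = 3:9))
tab
# 3 4 5 6 7 8 9
# 0 1 0 2 0 0 1
```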
Create frequency tables using group_by function
Get the data in long format, count the occurrences of each value for every unique value of Indicator, and cast the data back to wide format.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with('Answer')) %>%
count(Indicator, value) %>%
pivot_wider(names_from = value, values_from = n, values_fill = 0)
# Indicator Correct Wrong Partial
# <int> <int> <int> <int>
#1 0 3 3 0
#2 1 1 4 1
You can make the code a little shorter using janitor::tabyl:
df %>%
pivot_longer(cols = starts_with('Answer')) %>%
janitor::tabyl(Indicator, value)
Data
df <- structure(list(Indicator = c(0L, 1L, 1L, 0L), Answer_1 = c("Correct",
"Partial", "Wrong", "Correct"), Answer_2 = c("Wrong", "Correct",
"Wrong", "Correct"), Answer_3 = c("Wrong", "Wrong", "Wrong",
"Wrong")), class = "data.frame", row.names = c(NA, -4L))
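For comparison, the same cross-tabulation can be produced in base R without any reshaping, using the df from the data block above: unlist() stacks the three Answer columns into one vector, and rep() repeats the Indicator once per Answer column so the two vectors line up.

```r
df <- structure(list(Indicator = c(0L, 1L, 1L, 0L),
                     Answer_1 = c("Correct", "Partial", "Wrong", "Correct"),
                     Answer_2 = c("Wrong", "Correct", "Wrong", "Correct"),
                     Answer_3 = c("Wrong", "Wrong", "Wrong", "Wrong")),
                class = "data.frame", row.names = c(NA, -4L))

# unlist() concatenates Answer_1, Answer_2, Answer_3 column by column,
# so the Indicator vector must be repeated once per Answer column
tab <- table(rep(df$Indicator, 3), unlist(df[-1]))
tab
```

The counts agree with the pivot_wider output above: Indicator 0 has Correct 3, Wrong 3; Indicator 1 has Correct 1, Partial 1, Wrong 4 (columns appear in alphabetical order here).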
Frequency table when there are multiple columns representing one value (R)
You can use the tidyverse package to transform the data into a long format and then just summarise the desired stats.
library(tidyverse)
df |>
# Transform all columns except ID into a long format
pivot_longer(cols = -ID,
names_pattern = "([A-Za-z]+)",
names_to = c("variable")) |>
# Drop NA entries
drop_na(value) |>
# Group by variable
group_by(variable) |>
# Count
count(value) |>
# Calculate percentage as n / sum of n by variable
mutate(perc = 100* n / sum(n))
# A tibble: 10 x 4
# Groups: variable [3]
# variable value n perc
# <chr> <chr> <int> <dbl>
# 1 color blue 3 27.3
# 2 color green 2 18.2
# 3 color red 2 18.2
# 4 color yellow 4 36.4
# 5 shape circle 5 50
# 6 shape square 2 20
# 7 shape triangle 3 30
# 8 size large 2 33.3
# 9 size medium 2 33.3
#10 size small 2 33.3
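The original df isn't shown, so here is a minimal runnable sketch with made-up data in the same shape: several numbered columns (color1, color2, shape1) all representing one underlying variable.

```r
library(tidyverse)

# Hypothetical data standing in for the question's df
df <- tibble(
  ID = 1:3,
  color1 = c("red", "blue", "red"),
  color2 = c("blue", NA, "green"),
  shape1 = c("circle", "square", "circle")
)

result <- df |>
  # names_pattern strips the trailing digit, so color1/color2 both become "color"
  pivot_longer(cols = -ID,
               names_pattern = "([A-Za-z]+)",
               names_to = "variable") |>
  drop_na(value) |>
  group_by(variable) |>
  count(value) |>
  # perc is computed within each variable because the grouping is kept
  mutate(perc = 100 * n / sum(n))
result
```

With this data, "red" and "blue" each make up 40% of the five non-NA color entries, and "circle" is two thirds of the shape entries.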
I'm trying to make a frequency table where Var1 only uses one value and Var3 does not appear in the table but filters data in the table
If we assume that your dataset is as you presented it:
myDF <- data.frame(ISCED=c(12, 12, 12, 13, 15, 15), EMTAK=c(233, 245, 233, 233, 433, 245), PK_T=c(1, 0, NA, 1, 1, 0))
Then, using only the dplyr package and table() from base R, you can do:
install.packages("dplyr")
library(dplyr)
newDF <- myDF %>%
  na.omit() %>%                          # remove the row containing NA
  filter(PK_T == 1) %>%
  filter(ISCED == 12 | ISCED == 15) %>%
  select(EMTAK) %>%                      # drop the column ISCED
  table() %>%                            # count occurrences of each EMTAK value
  as.data.frame(responseName = "Freq")   # one row per EMTAK value, with its frequency
Giving:
EMTAK Freq
1 233 1
2 433 1
If you want other combinations, adjust the arguments inside filter().
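A slightly shorter equivalent is to let dplyr do the counting itself with count(), skipping table() entirely; a sketch using the same myDF:

```r
library(dplyr)

myDF <- data.frame(ISCED = c(12, 12, 12, 13, 15, 15),
                   EMTAK = c(233, 245, 233, 233, 433, 245),
                   PK_T  = c(1, 0, NA, 1, 1, 0))

# filter() drops the NA row automatically, since NA == 1 evaluates to NA
newDF <- myDF %>%
  filter(PK_T == 1, ISCED %in% c(12, 15)) %>%
  count(EMTAK, name = "Freq")
newDF
```

This yields the same two rows as above: EMTAK 233 and 433, each with Freq 1.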