Create Frequency Tables for Multiple Factor Columns in R

You were nearly there; one small change to your function fixes it. The x in function(x) ... needs to be passed through to the table() call:

levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
sapply(x.sample, function(x) table(factor(x, levels=levs, ordered=TRUE)))

A little re-jig of the code might make it a bit easier to read too:

sapply(lapply(x.sample,factor,levels=levs,ordered=TRUE), table)

#                 Q9_A Q9_B Q9_C
# Not Impt at all    3    4    4
# Somewhat Impt      0    0    0
# Neutral            0    0    0
# Impt               1    0    0
# Very Impt          6    6    6
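
If you want to try this without the original data, here is a minimal sketch. The x.sample data frame below is made up for illustration (the data from the question is not reproduced here); only levs comes from the answer.

```r
# Hypothetical survey data; the real x.sample from the question is not shown.
levs <- c("Not Impt at all", "Somewhat Impt", "Neutral", "Impt", "Very Impt")
x.sample <- data.frame(
  Q9_A = c("Very Impt", "Impt", "Not Impt at all", "Very Impt"),
  Q9_B = c("Very Impt", "Very Impt", "Not Impt at all", "Neutral"),
  stringsAsFactors = FALSE
)

# Fixing the levels gives every column a table of the same length,
# so sapply() can simplify the results to a matrix with one row per level.
freq <- sapply(x.sample, function(x) table(factor(x, levels = levs)))
freq
```

Levels that never occur in a column still get a row with a count of 0. Without the fixed levels, the per-column tables could differ in length and sapply() would return a list instead of a matrix.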

Producing multiple frequency tables at once in R

You could write a function that acts on each column based on its class. Here, we calculate the mean if the column is numeric; otherwise we count the occurrences of each unique value in the column.

library(dplyr)

purrr::map(names(df)[-1], function(x) {
  if (is.numeric(df[[x]])) df %>% summarise(mean = mean(.data[[x]]))
  else df %>% count(.data[[x]])
})

#[[1]]
#  mean
#1 40.5

#[[2]]
#  Car n
#1 Rel 1
#2 Yat 2
#3 Zum 3

#[[3]]
#   Side n
#1  Left 3
#2 Right 3
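
The same idea can be sketched in base R if you would rather not load dplyr and purrr. The data frame below is hypothetical: its column names and values are assumptions chosen to resemble the output above, and summarise_col is a helper name I made up.

```r
# Hypothetical data; not the df from the question.
df <- data.frame(
  id   = 1:6,
  Age  = c(30, 35, 40, 41, 47, 50),
  Car  = c("Rel", "Yat", "Yat", "Zum", "Zum", "Zum"),
  Side = c("Left", "Right", "Left", "Right", "Left", "Right")
)

# Mean for numeric columns, frequency table for everything else.
summarise_col <- function(x) {
  if (is.numeric(x)) mean(x) else table(x)
}

# Drop the first (id) column, as the answer does with names(df)[-1].
res <- lapply(df[-1], summarise_col)
res
```

Because lapply() runs over the columns themselves, the result is a list named after the columns, which can make it easier to pick out a single table afterwards (for example res$Car).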

How to Create Multiple Frequency Tables with Percentages Across Factor Variables using Purrr::map

You can use an anonymous function or a formula to get your first option to work. Here's the formula option.

library(dplyr)
library(purrr)

happy %>%
  select_if(is.factor) %>%
  map(~ round(prop.table(table(.x)), 2))
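
For comparison, here is a rough base R sketch of the same first option. The happy_demo data frame is a made-up stand-in, since the happy data from the question is not reproduced here.

```r
# Hypothetical stand-in for the `happy` data.
happy_demo <- data.frame(
  happy   = factor(c("very happy", "pretty happy", "pretty happy", NA)),
  marital = factor(c("married", "married", "divorced", "married")),
  age     = c(30, 41, 52, 63)
)

# Keep only the factor columns, then build a rounded proportion table for each.
# table() drops NA values by default, so proportions are over non-missing rows.
props <- lapply(Filter(is.factor, happy_demo),
                function(x) round(prop.table(table(x)), 2))
props
```

The numeric age column is skipped, and each remaining column gets its own table of proportions that sum to roughly 1 after rounding.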

In your second option, removing the NA values and then removing the count variable prior to spreading helps. The order in the result has changed, however.

library(tidyr)  # gather() and spread()

TABLE = happy %>%
  select_if(is.factor) %>%
  gather() %>%
  filter(!is.na(value)) %>%
  group_by(key, value) %>%
  summarise(count = n()) %>%
  mutate(perc = round(count/sum(count), 2), count = NULL)

TABLE %>%
  split(.$key) %>%
  map(~ spread(.x, value, perc))

Report frequency for multiple variables in a dataframe in R

We can write a nested pair of functions to map count to multiple variables and row-bind the results, using a little tidy evaluation:

library(tidyverse)

count_multi <- function(.data, ...) {
  count_var <- function(var, .data) {
    .data %>%
      # coerce to character to allow multiple var types
      count(Value = as.character({{ var }})) %>%
      mutate(
        Variable = as.character(ensym(var)),
        .before = everything()
      )
  }
  map_dfr(enquos(...), count_var, .data = .data)
}

mtcars %>%
count_multi(cyl, gear)

Output:

  Variable Value  n
1      cyl     4 11
2      cyl     6  7
3      cyl     8 14
4     gear     3 15
5     gear     4 12
6     gear     5  5

I believe you can use kableExtra::pack_rows() to create subheaders for each Variable in markdown.
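
If you would rather avoid tidy evaluation, the same long-format result can be sketched in base R. The helper name count_multi_base and its character-vector vars argument are my own assumptions, not part of the answer above.

```r
# Hypothetical helper: counts each listed column and stacks the results.
count_multi_base <- function(data, vars) {
  pieces <- lapply(vars, function(v) {
    tab <- table(data[[v]])
    data.frame(
      Variable = v,
      Value = names(tab),      # table names are already character
      n = as.vector(tab),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

res <- count_multi_base(mtcars, c("cyl", "gear"))
res
```

Column names are passed as strings here rather than bare names, which is a little less convenient to type but easier to use when the names are stored in a variable.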

How do you make a multiple variable frequency table in R when not all values are present in all columns?

I'll go ahead and answer, though I still object to the lack of criteria. If we think of "tidy" as the opposite of "messy", then we should first tidy the input data into a long format. Then we can do a two-way table:

library(tidyr)
df %>%
  gather %>%
  with(table(value, key))
#      key
# value aa bb cc
#     7  1  1  2
#     8  2  1  2
#     9  1  2  0

Thanks to Markus for a base R version:

table(stack(df))
#       ind
# values aa bb cc
#      7  1  1  2
#      8  2  1  2
#      9  1  2  0
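
To make the base R version reproducible, here is a small sketch. The data frame is hypothetical: I picked values that are consistent with the counts shown above, but the original df from the question is not reproduced here.

```r
# Hypothetical data; note that 9 never appears in column cc.
df <- data.frame(
  aa = c(7, 8, 8, 9),
  bb = c(7, 8, 9, 9),
  cc = c(7, 7, 8, 8)
)

# stack() reshapes to two columns (values, ind); table() then cross-tabulates them.
tab <- table(stack(df))
tab
```

Values that are missing from a column still appear in the table with a count of 0, because the row set is the union of the values found across all columns.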

Freq table for multiple variables in R

You can try this:

List <- list()
for (i in 2:dim(df1)[2]) {
  List[[i - 1]] <- table(df1$cat, df1[, i])
}

[[1]]

    0 1
  1 3 1
  2 3 2
  3 3 2
  4 2 2

[[2]]

    0 1
  1 1 3
  2 3 2
  3 2 3
  4 3 1

[[3]]

    0 1
  1 3 1
  2 3 2
  3 2 3
  4 2 2
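
The loop can also be written with lapply(), which keeps the column names on the result. This is only a sketch: df1 below is made-up data, and it assumes, as the loop does, that cat is the first column and every other column should be tabulated against it.

```r
# Hypothetical data: a grouping column followed by two 0/1 columns.
df1 <- data.frame(
  cat = rep(1:4, each = 5),
  x1  = rep(c(0, 1), times = 10),
  x2  = rep(c(0, 0, 0, 1), times = 5)
)

# One two-way table per non-cat column, named after that column.
tabs <- lapply(df1[-1], function(col) table(cat = df1$cat, value = col))
tabs
```

Each element can then be pulled out by name, for example tabs$x2, instead of by position.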

Use R to create a large multiple column frequency table

EDIT: This is a revised answer offered after discussing the problem with the original poster. An older answer that does not solve the problem at hand is retained below for posterity.

This answer is neither short nor concise, and I do hope there is a cleaner way. But the following will work:

## generate example data
set.seed(1)
death<-runif(1000)<=.75
ICU<-runif(1000)<=.63
serum<-runif(1000)<=.80
urine<-runif(1000)<=.77
brain<-runif(1000)<=.92
kidney<-runif(1000)<=.22
df<-as.data.frame(cbind((1:1000),death,ICU,serum,urine,brain,kidney))

## load up our data manipulation workhorses
library(reshape2)
library(plyr)

## save typing by saving row and column var names
row.vars <- c("serum", "urine", "brain", "kidney")
col.vars <- c("death", "ICU")

## melt data so the row variables (serum, urine, ...) are stacked in one column
dat.m <- melt(df, measure.vars = row.vars)

## keep only the rows where the row variable equals 1
dat.m <- dat.m[dat.m$value == 1, ]

## for each row variable, count the 1's in death and ICU
tab <- ddply(dat.m, "variable", function(DF) {
  colwise(function(x) length(x[x==1]))(DF[col.vars])
})

## count the 1's in each row and column variable
row.nums <- sapply(df[row.vars], function(x) length(x[x==1]))
col.nums <- sapply(df[col.vars], function(x) length(x[x==1]))

## paste row and column counts into row and column names
rownames(tab) <- paste(tab$variable, " (N=", row.nums, ")", sep="")
tab$variable <- NULL
colnames(tab) <- paste(names(tab), " (N=", col.nums, ")", sep="")

## calculate cell proportions and paste them in one column at a time
tab[[1]] <- paste(tab[[1]],
                  " (",
                  round(100*(tab[[1]]/col.nums[[1]]), digits=2),
                  "%)",
                  sep="")
tab[[2]] <- paste(tab[[2]],
                  " (",
                  round(100*(tab[[2]]/col.nums[[2]]), digits=2),
                  "%)",
                  sep="")

Now we can inspect the result:

## behold the fruits of our labor
tab
               death (N=752)  ICU (N=632)
serum (N=806)   602 (80.05%) 511 (80.85%)
urine (N=739)   556 (73.94%)  462 (73.1%)
brain (N=910)   684 (90.96%) 576 (91.14%)
kidney (N=190)  141 (18.75%) 128 (20.25%)
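
One possibly cleaner route, offered as a sketch rather than a drop-in replacement: when every variable is coded 0/1, crossprod() gives the co-occurrence counts directly, with no reshaping. The data below is generated in the same spirit as the example above but is smaller and stored as a plain data frame of integers, so it does not reproduce the numbers shown; tab2 and pct are names I made up.

```r
# Example data; values are random, so the exact counts are not shown here.
set.seed(1)
n <- 200
df <- data.frame(
  death  = as.integer(runif(n) <= 0.75),
  ICU    = as.integer(runif(n) <= 0.63),
  serum  = as.integer(runif(n) <= 0.80),
  urine  = as.integer(runif(n) <= 0.77),
  brain  = as.integer(runif(n) <= 0.92),
  kidney = as.integer(runif(n) <= 0.22)
)
row.vars <- c("serum", "urine", "brain", "kidney")
col.vars <- c("death", "ICU")

# For 0/1 data, crossprod() counts the rows where both variables equal 1.
counts <- crossprod(as.matrix(df[row.vars]), as.matrix(df[col.vars]))

# Totals for the "(N=...)" labels and the column percentages.
row.nums <- colSums(df[row.vars])
col.nums <- colSums(df[col.vars])
pct <- round(100 * sweep(counts, 2, col.nums, "/"), 2)

# Combine counts and percentages into "count (pct%)" cells.
tab2 <- matrix(paste0(counts, " (", pct, "%)"),
               nrow = nrow(counts),
               dimnames = list(paste0(row.vars, " (N=", row.nums, ")"),
                               paste0(col.vars, " (N=", col.nums, ")")))
tab2
```

This relies on the 0/1 coding; with other codings the matrix product would no longer be a count, and the melt-based approach above is the safer choice.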

OLD ANSWER (does not solve problem at hand, but may be useful for related tasks)

This is one of those things that seems like it should be easy, but somehow isn't.

There is an existing question that addresses this once you have two columns ready to tabulate. That part is easy:

# function to generate example data
mkdat <- function() factor(sample(letters[1:4], 10, replace=TRUE), levels=letters[1:4])

# make example data
set.seed(10)
dat <- data.frame(id = 1:10, var1 = mkdat(), var2=mkdat(), var3=mkdat())

# use reshape2 package to reshape from wide to long form
library(reshape2)
dat.m <- melt(dat, id.vars="id")
dat.m$value <- factor(dat.m$value)

Now the cross tab of dat.m$variable and dat.m$value gives the correct cells. You can refer to the linked question above on how to proceed from there to get both counts and percents in a table, or you can use this method:

# tabulate
library(plyr)
tab <- ddply(dat.m, "variable",
             function(DF) {
               # get counts with table
               count <- table(DF$value)
               # convert counts to percent
               prop <- paste(prop.table(count)*100, "%", sep="")
               # combine count and percent
               cp <- paste(count, " (", prop, ")", sep="")
               # re-attach the names
               names(cp) <- levels(DF$value)
               return(cp)
             })

# get row n
tab.r <- table(dat.m$variable)
# get column n
tab.c <- table(dat.m$value)
# paste row n into the row names, then drop the variable column
# (do this first so the remaining columns line up with tab.c)
rownames(tab) <- paste(tab$variable, " (n = ", tab.r, ")", sep="")
tab$variable <- NULL
# paste column n into the column names
colnames(tab) <- paste(colnames(tab), " (n = ", tab.c, ")", sep="")

# works, but that was way too much effort.
print(tab)

It has to be admitted that this is a lot of work for a simple count-and-proportion table. I'll be delighted if someone comes along with a simpler way to do it.


