How to Generate a Frequency Table in R with Cumulative Frequency and Relative Frequency

You're close! A couple of functions make this easy: cumsum() and prop.table(). Here's how I'd probably put this together. I'm using some random data, but the approach is the same:

# Fake data
x <- sample(10:20, 44, TRUE)
# Your code
factorx <- factor(cut(x, breaks = nclass.Sturges(x)))
# Tabulate and turn into a data.frame
xout <- as.data.frame(table(factorx))
# Add cumulative frequencies and proportions
xout <- transform(xout, cumFreq = cumsum(Freq), relative = prop.table(Freq))
xout
#       factorx Freq cumFreq   relative
# 1 (9.99,11.4]   11      11 0.25000000
# 2 (11.4,12.9]    3      14 0.06818182
# 3 (12.9,14.3]   11      25 0.25000000
# 4 (14.3,15.7]    2      27 0.04545455
# 5 (15.7,17.1]    6      33 0.13636364
# 6 (17.1,18.6]    3      36 0.06818182
# 7   (18.6,20]    8      44 0.18181818
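Because sample() draws random values, the output above changes on every run. As a reproducible sketch, the version below assumes hypothetical fixed data and hand-picked class boundaries (not Sturges' rule), and adds a cumulative relative frequency column:

```r
# Hypothetical fixed data instead of sample(), so the result is reproducible
x <- c(10, 11, 11, 12, 14, 15, 15, 15, 18, 20)

# Hypothetical hand-picked breaks; cut() builds right-closed intervals (a, b]
bins <- cut(x, breaks = c(9, 12, 15, 18, 21))

# Tabulate the classes and turn the table into a data.frame
freq_tab <- as.data.frame(table(bins))

# Add cumulative frequencies and proportions, as in the answer above
freq_tab <- transform(freq_tab,
                      cumFreq = cumsum(Freq),
                      relative = prop.table(Freq))

# Cumulative relative frequency: running total of the proportions
freq_tab$cumRelative <- cumsum(freq_tab$relative)

print(freq_tab)
```

The last column is added in a separate step because transform() evaluates all of its arguments against the original data frame, so cumsum(relative) inside the same call would not find the newly created relative column.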

How to get table in R, including count, relative frequencies, and cumulative frequencies?

I don't agree with your claim that undergraduates won't be able to understand this. I don't want to turn this question into a debate about teaching strategies, or about whether you should be using R at all if you don't think it suits the level of your course.

You can give them this function, which they don't need to understand (in the same way they don't need to understand the one in STATA).

library(dplyr)

tab <- function(dataset, var) {
  dataset %>%
    # embrace var with {{ }} so the function can be called with any grouping column
    group_by({{var}}) %>%
    summarise(n = n()) %>%
    mutate(totalN = cumsum(n),
           percent = n / sum(n),
           cumpercent = cumsum(n / sum(n)))
}

Then, provided you have saved the function in a file called tab.R and run source("tab.R"), here's your one-liner:

tab(dataset, var1)
# # A tibble: 3 x 5
#   var1      n totalN percent cumpercent
#   <chr> <int>  <int>   <dbl>      <dbl>
# 1 1         1      1   0.333      0.333
# 2 2         1      2   0.333      0.667
# 3 3         1      3   0.333      1

You can also try tab(dataset, var2). Note that this function groups by only one variable, which is what your question asked for.
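If you would rather not depend on dplyr, a base-R version of the same idea might look like the sketch below. The name tab_base, the string-valued var argument, and the toy dataset are all hypothetical, chosen only to mirror the three-row output above:

```r
# Hypothetical base-R counterpart of tab(); takes the column name as a string
tab_base <- function(dataset, var) {
  counts <- table(dataset[[var]])          # frequency of each distinct value
  out <- data.frame(value = names(counts),
                    n = as.vector(counts)) # plain integer counts
  out$totalN <- cumsum(out$n)              # cumulative count
  out$percent <- out$n / sum(out$n)        # relative frequency (a proportion)
  out$cumpercent <- cumsum(out$percent)    # cumulative relative frequency
  out
}

# Hypothetical toy data shaped like the example output above
dataset <- data.frame(var1 = c("1", "2", "3"),
                      var2 = c("a", "a", "b"))

tab_base(dataset, "var1")
tab_base(dataset, "var2")
```

One difference to keep in mind: group_by() keeps NA as its own group, whereas table() drops NA values unless you pass useNA = "ifany".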

EDIT

"one needs to understand how to set the working directory (etc.)"

Not entirely true: if you are using RStudio, you can import a dataset from a folder manually, with a few clicks.
If you want to teach stats using R (which I think you definitely should), you should have at least one class covering the bare minimum (yes, that includes the working directory, how to call library(...), and basic functions). There are plenty of resources (books, YouTube tutorials) you can assign as homework or as part of the class so that students become familiar with these.
The argument that WHATEVER SOFTWARE IS EASIER is weak once we drop all assumptions: I would still need to know where to click in the specific version of whatever software...

Calculating absolute, relative, and cumulative frequencies in R

I'm not sure why you wish to use hist(x). Everything can be obtained using table:

# Absolute frequencies
table(x)
# x
# 0.69 0.88 1.02 1.09 1.18 1.19 1.32 1.42 1.53 1.69
# 1 1 1 1 1 1 1 1 1 1

# Relative frequencies
table(x) / length(x)
# x
# 0.69 0.88 1.02 1.09 1.18 1.19 1.32 1.42 1.53 1.69
# 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

# Cumulative frequencies
cumsum(table(x))
# 0.69 0.88 1.02 1.09 1.18 1.19 1.32 1.42 1.53 1.69
# 1 2 3 4 5 6 7 8 9 10

and the same for y. As for putting them together:

rbind(Absolute = table(x),
      Relative = table(x) / length(x),
      Cumulative = cumsum(table(x)))
#            0.69 0.88 1.02 1.09 1.18 1.19 1.32 1.42 1.53 1.69
# Absolute    1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
# Relative    0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1
# Cumulative  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0 10.0

The results are correct, if somewhat boring, because every value in your x occurs exactly once. With more data containing repeated values, the table will look more interesting.
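To see a more informative table, here is a sketch with hypothetical data in which some values repeat. It also adds a cumulative relative frequency row, computed as cumsum(table(x)) / length(x):

```r
# Hypothetical data in which several values repeat
x <- c(0.69, 0.88, 0.88, 1.02, 1.02, 1.02, 1.18, 1.18, 1.53, 1.69)

counts <- table(x)  # absolute frequency of each distinct value

# One row per kind of frequency; rbind() turns the mix into a numeric matrix
freq_matrix <- rbind(Absolute    = counts,
                     Relative    = counts / length(x),
                     Cumulative  = cumsum(counts),
                     CumRelative = cumsum(counts) / length(x))
print(freq_matrix)
```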

Frequency Distribution Table

I don't know your exact application, but it seems unnecessary to repeat the same row for every occurrence of an x value. If you don't need that, you can avoid the merge:

x <- c(1, 2, 3, 2, 4, 2, 5, 4, 6, 7, 8, 9)
Freq <- table(x)
relFreq <- prop.table(Freq)
Cumulative_Freq <- cumsum(Freq)
Cumulative_Relative_Freq <- cumsum(relFreq)
# as.vector() strips the table structure so each argument becomes one column
data.frame(xval = names(Freq),
           Freq = as.vector(Freq),
           relFreq = as.vector(relFreq),
           Cumulative_Freq = as.vector(Cumulative_Freq),
           Cumulative_Relative_Freq = as.vector(Cumulative_Relative_Freq))

Another way to accomplish the same thing:

library(plyr)
x <- c(1, 2, 3, 2, 4, 2, 5, 4, 6, 7, 8, 9)
z <- data.frame(table(x))
# plyr::mutate() evaluates its arguments in order, so relFreq can be reused
mutate(z, relFreq = prop.table(Freq), Cumulative_Freq = cumsum(Freq),
       Cumulative_Relative_Freq = cumsum(relFreq))
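If you do need each observation's frequency on its own row (the situation where a merge seemed necessary), one merge-free option is to index the table by each value's name. This is a sketch: per_obs is a hypothetical name, and matching on as.character(x) assumes the values print the same way as the table's names, which holds for simple whole numbers like these:

```r
x <- c(1, 2, 3, 2, 4, 2, 5, 4, 6, 7, 8, 9)
Freq <- table(x)

# Look up each observation's count by name instead of merging;
# as.character(x) must match names(Freq), which holds for these whole numbers
per_obs <- data.frame(x = x,
                      Freq = as.vector(Freq[as.character(x)]))
per_obs$relFreq <- per_obs$Freq / length(x)  # each row's relative frequency
per_obs
```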

Convert Data table to Frequency table correctly with weights

I get a more complex result (a list with three elements) from your first piece of code. Furthermore, there is a "% Total" column that appears to already contain what you are requesting. Perhaps you are using an out-of-date version of the package?
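The package in question isn't identified here, so as a package-free sketch only: base R's xtabs() sums a weight column within each category, and the cumulative and relative columns can then be derived as in the other answers. The data frame df and its columns x and w are hypothetical:

```r
# Hypothetical data: a category column x and a weight column w
df <- data.frame(x = c("a", "b", "a", "c", "b", "a"),
                 w = c(1.5, 2, 0.5, 1, 1, 2))

# Weighted frequency: sum of w within each level of x
wfreq <- xtabs(w ~ x, data = df)

out <- data.frame(x = names(wfreq), Freq = as.vector(wfreq))
out$cumFreq <- cumsum(out$Freq)
out$relative <- out$Freq / sum(out$Freq)  # weighted proportion
out$cumRelative <- cumsum(out$relative)
out
```

Multiplying relative by 100 gives a percentage column comparable to a "% Total" column.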


