How to generate a frequency table in R with cumulative frequency and relative frequency
You're close! There are a few functions that will make this easy for you, namely cumsum() and prop.table(). Here's how I'd probably put this together. I make some random data, but the point is the same:
#Fake data
x <- sample(10:20, 44, TRUE)
#Your code
factorx <- factor(cut(x, breaks=nclass.Sturges(x)))
#Tabulate and turn into data.frame
xout <- as.data.frame(table(factorx))
#Add cumFreq and proportions
xout <- transform(xout, cumFreq = cumsum(Freq), relative = prop.table(Freq))
#-----
      factorx Freq cumFreq   relative
1 (9.99,11.4]   11      11 0.25000000
2 (11.4,12.9]    3      14 0.06818182
3 (12.9,14.3]   11      25 0.25000000
4 (14.3,15.7]    2      27 0.04545455
5 (15.7,17.1]    6      33 0.13636364
6 (17.1,18.6]    3      36 0.06818182
7   (18.6,20]    8      44 0.18181818
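If you also want the cumulative relative frequency, the same steps can be wrapped in a small helper. This is only a sketch: the function name freq_table and the cumRelative column are my own choices, not anything built into R, and the seed is fixed purely so the fake data is reproducible.

```r
# Hypothetical helper (name and column names are illustrative only)
freq_table <- function(x, breaks = nclass.Sturges(x)) {
  # Bin the values, then tabulate the bins
  bins <- cut(x, breaks = breaks)
  out <- as.data.frame(table(bins))
  # Cumulative count, proportion, and cumulative proportion
  out$cumFreq <- cumsum(out$Freq)
  out$relative <- out$Freq / sum(out$Freq)
  out$cumRelative <- cumsum(out$relative)
  out
}

set.seed(42)  # assumption: any fixed seed, just for reproducibility
x <- sample(10:20, 44, TRUE)
freq_table(x)
```

The last row of cumFreq should equal length(x), and the last row of cumRelative should be 1, which is a quick sanity check on any table built this way.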
How to get table in R, including count, relative frequencies, and cumulative frequencies?
I don't agree with your claim that undergrads can't understand this. I don't want to turn this question into a debate about teaching strategies, or about whether you should be using R if you don't believe it suits the level of your course.
You can supply them with this function, which they don't have to understand (the same way they don't have to understand the one from STATA).
library(dplyr)
tab <- function(dataset, var){
  dataset %>%
    # embrace var to be able to call it with any grouping factor
    group_by({{var}}) %>%
    summarise(n = n()) %>%
    mutate(totalN = cumsum(n),
           percent = n / sum(n),
           cumpercent = cumsum(n / sum(n)))
}
Then (provided you source("tab.R")), here's your one liner:
tab(dataset, var1)
# A tibble: 3 x 5
  var1      n totalN percent cumpercent
  <chr> <int>  <int>   <dbl>      <dbl>
1 1         1      1   0.333      0.333
2 2         1      2   0.333      0.667
3 3         1      3   0.333      1
You can try tab(dataset, var2). Please note that this answer will only group by one factor (this was your question).
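If installing dplyr is a hurdle for the class, roughly the same table can be built in base R. This is a sketch rather than a drop-in replacement: tab_base is a made-up name, the column is passed as a string instead of a bare name, and the small dataset is invented for illustration.

```r
# Hypothetical base R counterpart of tab(); `var` is a column name as a string
tab_base <- function(dataset, var) {
  counts <- table(dataset[[var]])
  out <- data.frame(
    value = names(counts),
    n = as.vector(counts),
    stringsAsFactors = FALSE
  )
  out$totalN <- cumsum(out$n)            # running count
  out$percent <- out$n / sum(out$n)      # relative frequency
  out$cumpercent <- cumsum(out$percent)  # cumulative relative frequency
  out
}

# Made-up example data
dataset <- data.frame(var1 = c("a", "b", "b", "c", "c", "c"))
tab_base(dataset, "var1")
```

With this data, n comes out as 1, 2, 3 and totalN as 1, 3, 6, so the last cumpercent is 1.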
EDIT
"one needs to understand how to set the working directory (etc.)"
Not entirely true: if you are using RStudio, you can import a dataset manually by clicking through to it in a folder.
If you want to teach stats using R (which I think you definitely should), you should have at least one class on the minimal things (yes, that includes the working directory, how to call library(...), and basic functions). There are a huge number of resources (books, YouTube tutorials) you can assign as homework or as part of the class, so students become familiar with them.
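For what it's worth, those "minimal things" fit in a handful of lines. The sketch below uses a temporary folder and a made-up file name (example.csv) so it runs anywhere; in a real class the folder would be wherever the course data lives.

```r
# Where R currently looks for files
getwd()

# Write a small made-up dataset to a temporary folder (hypothetical file name)
data_dir <- tempdir()
csv_path <- file.path(data_dir, "example.csv")
write.csv(data.frame(var1 = c("a", "b", "b")), csv_path, row.names = FALSE)

# Option 1: read with the full path, no working directory needed
dataset <- read.csv(csv_path)

# Option 2: set the working directory, then use the bare file name
old_wd <- setwd(data_dir)
dataset2 <- read.csv("example.csv")
setwd(old_wd)  # restore the previous working directory

# Loading a package (assumes it was installed once with install.packages()):
# library(dplyr)
```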
The argument that WHATEVER SOFTWARE IS EASIER is weak: if we drop all assumptions, I would also need to know where to click in the specific version of whatever software...
Calculating absolute, relative, and cumulative frequencies in R
I'm not sure why you wish to use hist(x). Everything can be obtained using table:
# Absolute frequencies
table(x)
# x
# 0.69 0.88 1.02 1.09 1.18 1.19 1.32 1.42 1.53 1.69
#    1    1    1    1    1    1    1    1    1    1
# Relative frequencies
table(x) / length(x)
# x
# 0.69 0.88 1.02 1.09 1.18 1.19 1.32 1.42 1.53 1.69
#  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1
# Cumulative frequencies
cumsum(table(x))
# 0.69 0.88 1.02 1.09 1.18 1.19 1.32 1.42 1.53 1.69
#    1    2    3    4    5    6    7    8    9   10
and the same for y. To put them together:
rbind(Absolute = table(x),
      Relative = table(x) / length(x),
      Cumulative = cumsum(table(x)))
#            0.69 0.88 1.02 1.09 1.18 1.19 1.32 1.42 1.53 1.69
# Absolute    1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
# Relative    0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1
# Cumulative  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0 10.0
The results are correct, although indeed somewhat boring. If you have more data, with repetitions, it will look better.
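To see what that looks like with repetitions, here is the same construction on a small made-up vector (the values are invented for illustration, not taken from the question):

```r
# Made-up data with repeated values
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5)

abs_freq <- table(x)
freq <- rbind(Absolute = abs_freq,
              Relative = prop.table(abs_freq),
              Cumulative = cumsum(abs_freq))
freq
```

Here the absolute row is 1, 2, 3, 2, 2, the relative row is 0.1, 0.2, 0.3, 0.2, 0.2, and the cumulative row climbs 1, 3, 6, 8, 10.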
Frequency Distribution Table
I don't know your exact application, but it seems unnecessary to show the data multiple times for each repeated x value. If this is not needed, you can avoid the merge:
x <- c(1,2,3,2,4,2,5,4,6,7,8,9)
Freq <- table(x)
relFreq <- prop.table(Freq)
Cumulative_Freq <- cumsum(Freq)
Cumulative_Relative_Freq <- cumsum(relFreq)
# as.vector() drops the table class so each of these becomes a single column
data.frame(xval = names(Freq), Freq = as.vector(Freq), relFreq = as.vector(relFreq),
           Cumulative_Freq = Cumulative_Freq,
           Cumulative_Relative_Freq = Cumulative_Relative_Freq)
Another way to accomplish the same thing:
library(plyr)
x <- c(1,2,3,2,4,2,5,4,6,7,8,9)
z <- data.frame(table(x))
# mutate() evaluates the columns in order, so relFreq can be reused below
mutate(z, relFreq = prop.table(Freq), Cumulative_Freq = cumsum(Freq),
       Cumulative_Relative_Freq = cumsum(relFreq))
Convert Data table to Frequency table correctly with weights
I get a more complex result (a list with three elements) from the first code. Furthermore, there is a column, "% Total", which appears to already contain what you are requesting. Perhaps you are using an out-of-date version of the package?