Calculations proportions of a three way table
Is this what you are chasing?
The data:
test <- structure(c(42L, 151L, 313L, 69L, 22L, 46L, 47L, 24L, 17L, 36L, 108L, 16L), .Dim = c(2L, 2L, 3L), .Dimnames = list(c("0", "1" ), c("female", "male"), c("adult", "child", "unknown")), class = "table")
Get the column percentages (2
) within each sub-table (3
):
prop.table(test,c(2,3))
Results in:
, , adult
female male
0 0.2176166 0.8193717
1 0.7823834 0.1806283
, , child
female male
0 0.3235294 0.6619718
1 0.6764706 0.3380282
, , unknown
female male
0 0.3207547 0.8709677
1 0.6792453 0.1290323
How can I create a prop.table from a dataset with multiple variables?
With sapply
and tapply
you can do :
cols <- 4:10
t(sapply(df[cols], function(x) tapply(x, df$poverty_t, sum)/sum(x)))
# -1 0 1 2
#n_fem 0.3333333 0.3333333 0.2500000 0.08333333
#n_male 0.5000000 0.2000000 0.3000000 0.00000000
#n_Sec_Edu 0.0000000 0.3333333 0.6666667 0.00000000
#n_High_Edu 0.5000000 0.3750000 0.1250000 0.00000000
#n_emp 0.3333333 0.4444444 0.2222222 0.00000000
#n_noemp 0.3333333 0.0000000 0.3333333 0.33333333
#n_stud 0.4166667 0.1666667 0.3333333 0.08333333
Working with Three-Dimensional Frequency Tables in R
You are very close, but you need to sum over 2 margins. I'm re-arranging your example so the "vote" is at the end as in your original question:
> tab <- xtabs(~cyl+gear+I(mpg>20), mtcars)
> prop.table(tab, 1:2)
, , I(mpg > 20) = FALSE
gear
cyl 3 4 5
4 0.0 0.0 0.0
6 0.5 0.5 1.0
8 1.0 1.0
, , I(mpg > 20) = TRUE
gear
cyl 3 4 5
4 1.0 1.0 1.0
6 0.5 0.5 0.0
8 0.0 0.0
> prop.table(tab, 1:2)[ , , 2] # Proportion TRUE for each combo
gear
cyl 3 4 5
4 1.0 1.0 1
6 0.5 0.5 0
8 0.0 NaN 0
All 4 cylinder cars get over 20mpg and no 8 cylinder cars do. To get a data frame:
> as.data.frame.table(prop.table(tab, 1:2)[ , , 2])
cyl gear Freq
1 4 3 1.0
2 6 3 0.5
3 8 3 0.0
4 4 4 1.0
5 6 4 0.5
6 8 4 NaN
7 4 5 1.0
8 6 5 0.0
9 8 5 0.0
s3 is there a way to combine prop.table for character variables?
Maybe this is what you wanted. Values in each category sum up to 100%.
lis <- sapply( Student1995, function(x) t( sapply( x, table ) ) )
sapply( lis, function(x) colSums(prop.table(x)) )
$alcohol
Not Once.or.Twice.a.week Once.a.month
0.0 0.6 0.4
Once.a.week More.than.once.a.week
0.0 0.0
$drugs
Not Tried.once Occasional Regular
0.8 0.2 0.0 0.0
$smoke
Not Occasional Regular
0.6 0.2 0.2
$sport
Not.regular Regular
0.4 0.6
and the whole summary...
prop.table( table(as.vector( sapply( Student1995, unlist ))) )
Not Not regular Occasional
0.35 0.10 0.05
Once a month Once or Twice a week Regular
0.10 0.15 0.20
Tried once
0.05
Create proportion matrix for multivariate categorical data
Tried to put logic at appropriate points in code sequence.
props <- data.frame(Cat = sort(unique(c(data))) ) # Just the Cat column
#Now fill in the entries
# the entries will be obtained with table function
apply(data, 2, table) # run `table(.)` over the columns individually
[[1]]
0 1 2 # these are actually character valued names
3 4 3 # while these are the count values
[[2]]
2 3 6 8 10
2 4 1 2 1
[[3]]
2 3 4 5 6
2 3 1 3 1
Now iterate over that list to fill in values that match the Cat column:
props2 <- cbind(props, # using dfrm first argument returns dataframe object
lapply( apply(data, 2, table) , # irregular results are a list
function(col) { # first make a named vector of zeros
x <- setNames(rep(0,length(props$Cat)), props$Cat)
# could have skipped that step by using `tabulate`
# then fill with values using names as indices
x[names(col)] <- col # values to matching names
x}) )
props2
#-------------
Cat V1 V2 V3
0 0 3 0 0
1 1 4 0 0
2 2 3 2 2
3 3 0 4 3
4 4 0 0 1
5 5 0 0 3
6 6 0 1 1
8 8 0 2 0
10 10 0 1 0
#---
# now just "proportionalize" those counts
props2[2:4] <- prop.table(data.matrix(props2[2:4]), margin=2)
props2
#-------------
Cat V1 V2 V3
0 0 0.3 0.0 0.0
1 1 0.4 0.0 0.0
2 2 0.3 0.2 0.2
3 3 0.0 0.4 0.3
4 4 0.0 0.0 0.1
5 5 0.0 0.0 0.3
6 6 0.0 0.1 0.1
8 8 0.0 0.2 0.0
10 10 0.0 0.1 0.0
How to Calculate the percent difference in a three dimensional array using R apply()
You should always provide reproducible data. Printouts can leave out important details regarding the structure. Use dput(HEC1)
and copy the results into your question like this:
HEC1 <- structure(c(32L, 53L, 10L, 3L, 11L, 50L, 10L, 30L, 10L, 25L,
7L, 5L, 3L, 15L, 7L, 8L, 36L, 66L, 16L, 4L, 9L, 34L, 7L, 64L,
5L, 29L, 7L, 5L, 2L, 14L, 7L, 8L), .Dim = c(4L, 4L, 2L), .Dimnames = list(
c("Black", "Brown", "Red", "Blond"), c("Brown", "Blue", "Hazel",
"Green"), c("Male", "Female")))
Now we need to be clear on what you are computing. Your desired percentage was based on the total number of brown-eyed, black haired individuals, but this is misleading since the sample sizes of males and females is not the same:
margin.table(HEC1, 3)
Male Female
279 313
If instead we compute the tables separately and subtract them:
Male.pct <- prop.table(HEC1[, , 1]) * 100
round(Male.pct, 2)
# Brown Blue Hazel Green
# Black 11.47 3.94 3.58 1.08
# Brown 19.00 17.92 8.96 5.38
# Red 3.58 3.58 2.51 2.51
# Blond 1.08 10.75 1.79 2.87
Female.pct <- prop.table(HEC1[, , 2]) * 100
round(Female.pct, 2)
# Brown Blue Hazel Green
# Black 11.50 2.88 1.60 0.64
# Brown 21.09 10.86 9.27 4.47
# Red 5.11 2.24 2.24 2.24
# Blond 1.28 20.45 1.60 2.56
round(Male.pct - Female.pct, 2)
# Brown Blue Hazel Green
# Black -0.03 1.07 1.99 0.44
# Brown -2.09 7.06 -0.30 0.90
# Red -1.53 1.35 0.27 0.27
# Blond -0.20 -9.69 0.19 0.31
Your original apply
function computed the proportions a third way, over the entire array combining both tables. You should have used:
apply(prop.table(HEC1, 3), 1:2, diff) * 100
# Brown Blue Hazel Green
# Black 0.03206339 -1.067253 -1.9867853 -0.4362912
# Brown 2.08984621 -7.058527 0.3046022 -0.9035006
# Red 1.52759170 -1.347808 -0.2725388 -0.2725388
# Blond 0.20268645 9.694596 -0.1946706 -0.3114730
The diff
function subtracts the second value from the first (Female - Male) so the negative signs are reversed compared to Male - Female above.
R: How do I sort a prop.table by one of the two percent dimensions?
If we wanted to order
based on the 2nd row, subset the 2nd row (A1[2,]
by specifying the row index on i
), order
the vector and use that as column index to reorder in j
A2 <- A1[,order(A1[2,])]
A2
# 8 1 5 13 9 6 11 2 7 4 12 10 3
# -1 1.00 0.90 0.80 0.75 0.65 0.60 0.50 0.30 0.30 0.25 0.15 0.10 0.00
# 1 0.00 0.10 0.20 0.25 0.35 0.40 0.50 0.70 0.70 0.75 0.85 0.90 1.00
colSums(A2)
# 8 1 5 13 9 6 11 2 7 4 12 10 3
# 1 1 1 1 1 1 1 1 1 1 1 1 1
Related Topics
Removing/Replacing Brackets from R String Using Gsub
How to Get The R Shiny Downloadhandler Filename to Work
Importing an Excel File with Greek Characters into R in The Correct Encoding
Learning to Write Functions in R
Overlapped Density Plots in Ggplot2
Blockwise Sum of Matrix Elements
Download File from Internet via R Despite The Popup
How to Fix Degree Symbol Not Showing Correctly in R on Linux/Fedora 31
Recursive Function Using Dplyr
How to Simulate Bimodal Distribution
An Error in R: When I Try to Apply Outer Function: