How to Create a Prop.Table() for a Three Dimension Table

Calculations proportions of a three way table

Is this what you are chasing?

The data:

test <- structure(c(42L, 151L, 313L, 69L, 22L, 46L, 47L, 24L, 17L, 36L, 108L, 16L), .Dim = c(2L, 2L, 3L), .Dimnames = list(c("0", "1" ), c("female", "male"), c("adult", "child", "unknown")), class = "table") 

Get the column percentages (2) within each sub-table (3):

prop.table(test,c(2,3))

Results in:

, , adult

female male
0 0.2176166 0.8193717
1 0.7823834 0.1806283

, , child

female male
0 0.3235294 0.6619718
1 0.6764706 0.3380282

, , unknown

female male
0 0.3207547 0.8709677
1 0.6792453 0.1290323

How can I create a prop.table from a dataset with multiple variables?

With sapply and tapply you can do :

cols <- 4:10
t(sapply(df[cols], function(x) tapply(x, df$poverty_t, sum)/sum(x)))

# -1 0 1 2
#n_fem 0.3333333 0.3333333 0.2500000 0.08333333
#n_male 0.5000000 0.2000000 0.3000000 0.00000000
#n_Sec_Edu 0.0000000 0.3333333 0.6666667 0.00000000
#n_High_Edu 0.5000000 0.3750000 0.1250000 0.00000000
#n_emp 0.3333333 0.4444444 0.2222222 0.00000000
#n_noemp 0.3333333 0.0000000 0.3333333 0.33333333
#n_stud 0.4166667 0.1666667 0.3333333 0.08333333

Working with Three-Dimensional Frequency Tables in R

You are very close, but you need to sum over 2 margins. I'm re-arranging your example so the "vote" is at the end as in your original question:

> tab <- xtabs(~cyl+gear+I(mpg>20), mtcars)
> prop.table(tab, 1:2)
, , I(mpg > 20) = FALSE

gear
cyl 3 4 5
4 0.0 0.0 0.0
6 0.5 0.5 1.0
8 1.0 1.0

, , I(mpg > 20) = TRUE

gear
cyl 3 4 5
4 1.0 1.0 1.0
6 0.5 0.5 0.0
8 0.0 0.0

> prop.table(tab, 1:2)[ , , 2] # Proportion TRUE for each combo
gear
cyl 3 4 5
4 1.0 1.0 1
6 0.5 0.5 0
8 0.0 NaN 0

All 4 cylinder cars get over 20mpg and no 8 cylinder cars do. To get a data frame:

> as.data.frame.table(prop.table(tab, 1:2)[ , , 2])
cyl gear Freq
1 4 3 1.0
2 6 3 0.5
3 8 3 0.0
4 4 4 1.0
5 6 4 0.5
6 8 4 NaN
7 4 5 1.0
8 6 5 0.0
9 8 5 0.0

s3 is there a way to combine prop.table for character variables?

Maybe this is what you wanted. Values in each category sum up to 100%.

lis <- sapply( Student1995, function(x) t( sapply( x, table ) ) )

sapply( lis, function(x) colSums(prop.table(x)) )
$alcohol
Not Once.or.Twice.a.week Once.a.month
0.0 0.6 0.4
Once.a.week More.than.once.a.week
0.0 0.0

$drugs
Not Tried.once Occasional Regular
0.8 0.2 0.0 0.0

$smoke
Not Occasional Regular
0.6 0.2 0.2

$sport
Not.regular Regular
0.4 0.6

and the whole summary...

prop.table( table(as.vector( sapply( Student1995, unlist ))) )

Not Not regular Occasional
0.35 0.10 0.05
Once a month Once or Twice a week Regular
0.10 0.15 0.20
Tried once
0.05

Create proportion matrix for multivariate categorical data

Tried to put logic at appropriate points in code sequence.

props <- data.frame(Cat = sort(unique(c(data))) )  # Just the Cat column
#Now fill in the entries
# the entries will be obtained with table function
apply(data, 2, table) # run `table(.)` over the columns individually
[[1]]

0 1 2 # these are actually character valued names
3 4 3 # while these are the count values

[[2]]

2 3 6 8 10
2 4 1 2 1

[[3]]

2 3 4 5 6
2 3 1 3 1

Now iterate over that list to fill in values that match the Cat column:

 props2 <- cbind(props,  # using dfrm first argument returns dataframe object 
lapply( apply(data, 2, table) , # irregular results are a list
function(col) { # first make a named vector of zeros
x <- setNames(rep(0,length(props$Cat)), props$Cat)
# could have skipped that step by using `tabulate`
# then fill with values using names as indices
x[names(col)] <- col # values to matching names
x}) )
props2
#-------------
Cat V1 V2 V3
0 0 3 0 0
1 1 4 0 0
2 2 3 2 2
3 3 0 4 3
4 4 0 0 1
5 5 0 0 3
6 6 0 1 1
8 8 0 2 0
10 10 0 1 0
#---
# now just "proportionalize" those counts
props2[2:4] <- prop.table(data.matrix(props2[2:4]), margin=2)
props2
#-------------
Cat V1 V2 V3
0 0 0.3 0.0 0.0
1 1 0.4 0.0 0.0
2 2 0.3 0.2 0.2
3 3 0.0 0.4 0.3
4 4 0.0 0.0 0.1
5 5 0.0 0.0 0.3
6 6 0.0 0.1 0.1
8 8 0.0 0.2 0.0
10 10 0.0 0.1 0.0

How to Calculate the percent difference in a three dimensional array using R apply()

You should always provide reproducible data. Printouts can leave out important details regarding the structure. Use dput(HEC1) and copy the results into your question like this:

HEC1 <- structure(c(32L, 53L, 10L, 3L, 11L, 50L, 10L, 30L, 10L, 25L, 
7L, 5L, 3L, 15L, 7L, 8L, 36L, 66L, 16L, 4L, 9L, 34L, 7L, 64L,
5L, 29L, 7L, 5L, 2L, 14L, 7L, 8L), .Dim = c(4L, 4L, 2L), .Dimnames = list(
c("Black", "Brown", "Red", "Blond"), c("Brown", "Blue", "Hazel",
"Green"), c("Male", "Female")))

Now we need to be clear on what you are computing. Your desired percentage was based on the total number of brown-eyed, black haired individuals, but this is misleading since the sample sizes of males and females is not the same:

margin.table(HEC1, 3)
Male Female
279 313

If instead we compute the tables separately and subtract them:

Male.pct <- prop.table(HEC1[, , 1]) * 100
round(Male.pct, 2)
# Brown Blue Hazel Green
# Black 11.47 3.94 3.58 1.08
# Brown 19.00 17.92 8.96 5.38
# Red 3.58 3.58 2.51 2.51
# Blond 1.08 10.75 1.79 2.87

Female.pct <- prop.table(HEC1[, , 2]) * 100
round(Female.pct, 2)
# Brown Blue Hazel Green
# Black 11.50 2.88 1.60 0.64
# Brown 21.09 10.86 9.27 4.47
# Red 5.11 2.24 2.24 2.24
# Blond 1.28 20.45 1.60 2.56

round(Male.pct - Female.pct, 2)
# Brown Blue Hazel Green
# Black -0.03 1.07 1.99 0.44
# Brown -2.09 7.06 -0.30 0.90
# Red -1.53 1.35 0.27 0.27
# Blond -0.20 -9.69 0.19 0.31

Your original apply function computed the proportions a third way, over the entire array combining both tables. You should have used:

apply(prop.table(HEC1, 3), 1:2, diff) * 100
# Brown Blue Hazel Green
# Black 0.03206339 -1.067253 -1.9867853 -0.4362912
# Brown 2.08984621 -7.058527 0.3046022 -0.9035006
# Red 1.52759170 -1.347808 -0.2725388 -0.2725388
# Blond 0.20268645 9.694596 -0.1946706 -0.3114730

The diff function subtracts the second value from the first (Female - Male) so the negative signs are reversed compared to Male - Female above.

R: How do I sort a prop.table by one of the two percent dimensions?

If we wanted to order based on the 2nd row, subset the 2nd row (A1[2,] by specifying the row index on i), order the vector and use that as column index to reorder in j

A2 <- A1[,order(A1[2,])]
A2
# 8 1 5 13 9 6 11 2 7 4 12 10 3
# -1 1.00 0.90 0.80 0.75 0.65 0.60 0.50 0.30 0.30 0.25 0.15 0.10 0.00
# 1 0.00 0.10 0.20 0.25 0.35 0.40 0.50 0.70 0.70 0.75 0.85 0.90 1.00

colSums(A2)
# 8 1 5 13 9 6 11 2 7 4 12 10 3
# 1 1 1 1 1 1 1 1 1 1 1 1 1


Related Topics



Leave a reply



Submit