Getting Both Column Counts and Proportions in the Same Table in R


Here is one approach using tabular() from the tables package. You still need a second step, but it comes before the tabular() call, so the result is still a tabular object.

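library(tables)

set.seed(1)  # for reproducibility
n <- 100
x <- sample(letters[1:3], n, TRUE)
y <- sample(letters[1:3], n, TRUE)
d <- data.frame(x = factor(x), y = factor(y))
# z = 1 / (size of the row's x group), so summing z within a cell gives the row proportion
d$z <- 1 / ave(rep(1, n), d$x, FUN = sum)

(t1 <- tabular(x ~ y * Heading() * z * ((n = length) + (p = sum)), d))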
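If the tables package is not an option, a similar side-by-side layout of counts and proportions can be built in base R. This is a minimal sketch that assumes row proportions (y within x) are wanted, which is what the p column of the tabular() call computes; the names counts, props and combined are illustrative.

```r
# Base R sketch: counts and row proportions of y within x, side by side
set.seed(1)
n <- 100
d <- data.frame(x = sample(letters[1:3], n, TRUE),
                y = sample(letters[1:3], n, TRUE))

counts <- table(d$x, d$y)          # rows = x, columns = y
props  <- prop.table(counts, 1)    # each row sums to 1

# Interleave an "n" and a "p" column for every level of y
combined <- do.call(cbind, lapply(colnames(counts), function(lv) {
  out <- cbind(counts[, lv], props[, lv])
  colnames(out) <- paste(lv, c("n", "p"), sep = ".")
  out
}))
round(combined, 3)
```

The result is a plain numeric matrix rather than a tabular object, so it is easy to post-process but has no nested headings.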

Two by two table with count and percentage in R

library(dplyr)
library(tidyr)  # for spread()

# df has factor columns Gender and OnAntibiotic
df %>% group_by(Gender, OnAntibiotic) %>% mutate(n = n()) %>%
        group_by(OnAntibiotic) %>% distinct(OnAntibiotic, Gender, n) %>%
        mutate(Per = n / sum(n), np = paste0(n, " (", round(Per * 100, 2), " %)")) %>%
        select(-n, -Per) %>% spread(OnAntibiotic, np)

# A tibble: 2 x 3
  Gender No       Yes        
  <fct>  <chr>    <chr>      
1 Female 3 (60 %) 8 (57.14 %)
2 Male   2 (40 %) 6 (42.86 %)
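A shorter variant of the same idea, sketched with count() and tidyr's pivot_wider() (the newer replacement for spread()). The toy df is hypothetical, built only to reproduce the counts in the table above; tab is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Hypothetical data reproducing the counts in the table above
df <- data.frame(
  Gender       = factor(rep(c("Female", "Male", "Female", "Male"), times = c(3, 2, 8, 6))),
  OnAntibiotic = factor(rep(c("No", "No", "Yes", "Yes"), times = c(3, 2, 8, 6)))
)

tab <- df %>%
  count(Gender, OnAntibiotic) %>%      # one row per Gender x OnAntibiotic cell
  group_by(OnAntibiotic) %>%           # percentages within each OnAntibiotic column
  mutate(np = paste0(n, " (", round(n / sum(n) * 100, 2), " %)")) %>%
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = OnAntibiotic, values_from = np)
tab
```

count() replaces the group_by()/mutate(n())/distinct() steps, since it already returns one row per cell.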

Calculating the proportion of count variable per group in data.table in R

If you want each count as a percentage of its group total, you can do:

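library(data.table)
# mydata must be a data.table (convert it with setDT(mydata) if needed);
# prop is each count as a percentage of its (startYear, groupSize) total
mydata[, prop := count/sum(count) * 100, by = .(startYear, groupSize)]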

#       groupSize gender startYear count       prop
# 1: intermediate      F      2014  7546 55.9958445
# 2:        small      F      2014  3500 31.3395415
# 3: intermediate      M      2014  5930 44.0041555
# 4:        small      M      2014  7668 68.6604585
# 5:         huge      F      2014 18114 56.7125861
# 6:         huge      M      2014 13826 43.2874139
# 7:        large      F      2014 11943 54.2222828
# 8:        large      M      2014 10083 45.7777172
#....
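A self-contained version of the same call on a small, made-up mydata (the column names come from the output above; the values are invented) shows that prop sums to 100 within each (startYear, groupSize) group.

```r
library(data.table)

# Hypothetical toy data in the same shape as mydata
mydata <- data.table(
  groupSize = c("small", "small", "large", "large"),
  gender    = c("F", "M", "F", "M"),
  startYear = 2014,
  count     = c(30, 70, 25, 75)
)

# Percentage of each count within its (startYear, groupSize) group, added by reference
mydata[, prop := count / sum(count) * 100, by = .(startYear, groupSize)]
mydata
```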

Tidy way to convert numeric columns from counts to proportions

Rewrite it as follows:

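library(dplyr)

# Divide every numeric column by the row total of the numeric columns of the original df
df %>%
  mutate_if(is.numeric, ~ . / rowSums(select(df, where(is.numeric))))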

Output:

  id         x         y
1  A 0.3333333 0.6666667
2  B 0.3333333 0.6666667
3  C 0.3333333 0.6666667
4  D 0.3333333 0.6666667

Edit: If you want an answer that uses only dplyr and base R, and that can be piped more easily, here's another (hacky) solution:

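df %>%
  group_by(id) %>% 
  # cur_data() is deprecated as of dplyr 1.1.0 in favour of pick()
  mutate(sum = as.character(rowSums(select(cur_data(), where(is.numeric))))) %>%
  # sum is stored as character so summarise_if() skips it and divides only the numeric columns
  summarise_if(is.numeric, ~ . / as.numeric(sum))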

The usual dplyr ways of referring to the current data within a function (e.g. cur_data) don't seem to play nicely with rowSums in my original phrasing, so I took a slightly different approach here. There is likely a better way of doing this though, so I'm open to suggestions.
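One possible cleaner pattern (a sketch, not the only way) is to compute the row totals once, before the pipeline, and then divide with across(), which needs dplyr >= 1.0.0. The df below is hypothetical, chosen to be consistent with the output shown above; totals and props are illustrative names.

```r
library(dplyr)

# Hypothetical input consistent with the output shown above
df <- data.frame(id = c("A", "B", "C", "D"), x = c(1, 2, 3, 4), y = c(2, 4, 6, 8))

# Row totals over the numeric columns, computed once
totals <- rowSums(df[vapply(df, is.numeric, logical(1))])

# Divide every numeric column by its row total; id is left untouched
props <- df %>% mutate(across(where(is.numeric), ~ .x / totals))
props
```

Computing totals outside mutate() avoids the problem of the row sums changing while columns are being overwritten.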

Convert data frame of counts to proportions in R

Probably something along these lines:

df[, -1] <- lapply( df[ , -1], function(x) x/sum(x, na.rm=TRUE) )

If it were a numeric matrix, you could have used prop.table(mat, 2) to get column proportions (prop.table(mat) with no margin divides by the grand total). Here, however, you need to restrict the operation to the numeric columns by excluding the first one.

Furthermore, I think you need to exclude the "total" row (row 5):

> my.data[-5, -1] <- lapply(my.data[-5, -1], function(x) x / sum(x, na.rm = TRUE))
> my.data[-5, ]
    state      y1970      y1980      y1990      y2000
1  Alaska 0.02325581 0.03076923         NA 0.02941176
2    Iowa 0.05813953 0.10256410 0.21428571 0.16806723
3  Nevada 0.58139535 0.51282051 0.71428571 0.42016807
4    Ohio 0.29069767 0.30769231         NA 0.33613445
6 Wyoming 0.04651163 0.04615385 0.07142857 0.04621849
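Hard-coding row 5 breaks if the rows are reordered. A sketch that locates the total row by its label instead, assuming it is literally called "total" in the state column, on a small made-up version of my.data:

```r
# Hypothetical miniature of my.data, including a "total" row
my.data <- data.frame(
  state = c("Alaska", "Iowa", "total", "Nevada"),
  y1970 = c(2, 5, 57, 50),
  y1990 = c(NA, 6, 26, 20)
)

keep <- my.data$state != "total"   # every row except the total row
my.data[keep, -1] <- lapply(my.data[keep, -1],
                            function(x) x / sum(x, na.rm = TRUE))
my.data
```

The total row itself is left unchanged, and NA cells stay NA.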

-------------

Alternative approach: divide each column by its value in the "total" row (row 5):

> my.data[, -1] <- lapply(my.data[, -1], function(x) x / x[5])
> my.data
    state      y1970      y1980      y1990      y2000
1  Alaska 0.02325581 0.03076923         NA 0.02941176
2    Iowa 0.05813953 0.10256410 0.13953488 0.16806723
3  Nevada 0.58139535 0.51282051 0.46511628 0.42016807
4    Ohio 0.29069767 0.30769231         NA 0.33613445
5   total 1.00000000 1.00000000 1.00000000 1.00000000
6 Wyoming 0.04651163 0.04615385 0.04651163 0.04621849

For a very simple matrix, this shows what prop.table returns when there are missing values: first over the whole table (no margin), then by rows and by columns:

> prop.table( matrix( c( 1,2,NA, 3),2) )
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
> prop.table( matrix( c( 1,2,NA, 3),2), 1 )
     [,1] [,2]
[1,]   NA   NA
[2,]  0.4  0.6
> prop.table( matrix( c( 1,2,NA, 3),2), 2 )
          [,1] [,2]
[1,] 0.3333333   NA
[2,] 0.6666667   NA
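If you would rather ignore NAs than propagate them, one option (a sketch using base sweep() instead of prop.table()) is to divide by row or column sums computed with na.rm = TRUE:

```r
m <- matrix(c(1, 2, NA, 3), 2)

# Column proportions ignoring NAs: divide each column by its non-missing sum
col_props <- sweep(m, 2, colSums(m, na.rm = TRUE), "/")

# Row proportions, same idea with margin 1
row_props <- sweep(m, 1, rowSums(m, na.rm = TRUE), "/")
col_props
row_props
```

The NA cell itself stays NA, but it no longer turns the rest of its row or column into NA.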

How to Calculate Percentage Based On Other Row

This is the beginning of a solution:

library(dplyr)

Year <- rep(2000, 6)
State <- c(rep("VA", 4), rep("MA", 2))
Age <- c("<44", "44+", "44+", "<44", "<44", "44+")
Pop <- c(150, 350, 500, 200, 100, 100)

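df <- data.frame(State = State, Age = Age, Pop = Pop, Year = Year)

# Share of the population under 44 in each State/Year
df %>%
  filter(Age != "Total") %>%   # drop "Total" rows, if any
  group_by(Year, State) %>%
  summarize(Pop44 = sum(Pop[Age == "<44"]) / sum(Pop), .groups = "drop")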

The filter() step only matters if your data contain "Total" rows, which sum(Pop) would otherwise count twice. In general, it is better not to store totals as rows at all but to keep them in a separate column.
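To get the percentage for every age band rather than only "<44", one possible sketch is to collapse duplicate rows first and then divide by the State/Year total with a grouped mutate(). It rebuilds the example df from above; pct is an illustrative name.

```r
library(dplyr)

df <- data.frame(
  State = c(rep("VA", 4), rep("MA", 2)),
  Age   = c("<44", "44+", "44+", "<44", "<44", "44+"),
  Pop   = c(150, 350, 500, 200, 100, 100),
  Year  = 2000
)

pct <- df %>%
  group_by(Year, State, Age) %>%
  summarise(Pop = sum(Pop), .groups = "drop_last") %>%  # still grouped by Year, State
  mutate(Pct = 100 * Pop / sum(Pop)) %>%                # share of the Year/State total
  ungroup()
pct
```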

Calculating count and proportion of a certain value for a number of variables subsetted by other variables

You don't have to convert the columns to factors. In fact, data.table recommends avoiding factors wherever possible, which also tends to be faster. Still, for future reference, here is a much simpler way to convert several columns to factors:

sd_cols = c("Feature1", "Feature2", "Feature3")
DT[, c(sd_cols) := lapply(.SD, as.factor), .SDcols=sd_cols]

Now on to the solution. We need CJ() (a cross join) here because you also want the combinations that are absent from the data, so we generate all of them first.

uvals = c("no", "yes")
setkey(DT, Feature1, Feature2, Feature3)
DTn = DT[CJ(uvals, uvals, uvals), allow.cartesian=TRUE]

The allow.cartesian=TRUE is necessary because the join will result in more rows than max(nrow(x), nrow(i)) in a join x[i]. See ?data.table for more on allow.cartesian.

Now that we have all the combinations, we can group and aggregate them to get the results in the form you need.

ans = DTn[, { tmp1 = sum(Var1 == "yes", na.rm=TRUE);
             tmp2 = sum(Var2 == "yes", na.rm=TRUE);
           list(Var1.count = tmp1, 
                Var1.prop  = tmp1/.N, 
                Var2.count = tmp2,
                Var2.prop  = tmp2/.N)
           }, by=key(DT)]

#    Feature1 Feature2 Feature3 Var1.count Var1.prop Var2.count Var2.prop
# 1:       no       no       no          0 0.0000000          1         1
# 2:       no       no      yes          0 0.0000000          0         0
# 3:       no      yes       no          0 0.0000000          0         0
# 4:       no      yes      yes          1 1.0000000          1         1
# 5:      yes       no       no          0 0.0000000          0         0
# 6:      yes       no      yes          0 0.0000000          0         0
# 7:      yes      yes       no          0 0.0000000          0         0
# 8:      yes      yes      yes          2 0.6666667          3         1

If it really matters that absent combinations show NA instead of 0, the code above can be adjusted to do that.
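A small self-contained sketch of the CJ() approach, on made-up data with two features and one variable, that returns NA rather than 0 for combinations absent from DT. It assumes Var1 has no missing values of its own, because an absent combination is detected as a group with no non-NA Var1; n_obs and cnt are illustrative names.

```r
library(data.table)

# Hypothetical toy data: two yes/no features and one yes/no variable
DT <- data.table(
  Feature1 = c("yes", "yes", "no"),
  Feature2 = c("yes", "yes", "yes"),
  Var1     = c("yes", "no", "yes")
)

uvals <- c("no", "yes")
setkey(DT, Feature1, Feature2)

# Join on every combination; absent ones come back as a single row with Var1 = NA
DTn <- DT[CJ(uvals, uvals), allow.cartesian = TRUE]

ans <- DTn[, {
  n_obs <- sum(!is.na(Var1))                 # real rows in this combination
  cnt   <- sum(Var1 == "yes", na.rm = TRUE)
  list(Var1.count = cnt,
       Var1.prop  = if (n_obs == 0) NA_real_ else cnt / n_obs)
}, by = key(DT)]
ans
```

NA_real_ (not plain NA) keeps Var1.prop the same type in every group, which data.table expects when combining grouped results.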

Following the OP's follow-up question in the comments (and the edit to the question), starting from DTn:

vars = c("Var1", "Var2")
# group by DT's key columns (Feature1, Feature2, Feature3)
ans = DTn[, c(N=.N, lapply(.SD, function(x) sum(x=="yes", na.rm=TRUE))), 
               by=key(DT), .SDcols=vars]
N = ans$N
ans[, N := NULL]
ans[, c(paste(vars, "prop", sep=".")) := .SD/N, .SDcols=vars]
setnames(ans, vars, paste(vars, "count", sep="."))

ans
#    Feature1 Feature2 Feature3 Var1.count Var2.count Var1.prop Var2.prop
# 1:       no       no       no          0          1 0.0000000         1
# 2:       no       no      yes          0          0 0.0000000         0
# 3:       no      yes       no          0          0 0.0000000         0
# 4:       no      yes      yes          1          1 1.0000000         1
# 5:      yes       no       no          0          0 0.0000000         0
# 6:      yes       no      yes          0          0 0.0000000         0
# 7:      yes      yes       no          0          0 0.0000000         0
# 8:      yes      yes      yes          2          3 0.6666667         1


Get the row (or column)-wise tabularized counts (as in table()) of a matrix

We convert the matrix from wide to long format with melt() from the reshape2 package and then tabulate the value column against the column index (Var2):

library(reshape2)
table(melt(m)[3:2])
#      Var2
#value 1 2 3
#   a 1 1 3
#   b 3 1 0
#   c 0 2 0
#   d 0 0 1

If we need the proportion, we can use prop.table and change the margin accordingly.

prop.table(table(melt(m)[3:2]),1)

Another convenient function is mtabulate() from the qdapTools package:

library(qdapTools)
t(mtabulate(as.data.frame(m)))
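Without extra packages, a base R sketch is to pair each element with its column index from col() (or its row index from row()) and pass both to table(). Because c(m), col(m) and row(m) all run in column-major order, the pairs line up. The matrix m here is made up, since the original one is not shown.

```r
# Hypothetical character matrix; its columns are (a, b), (b, c), (a, c)
m <- matrix(c("a", "b", "b", "c", "a", "c"), nrow = 2)

# Pair each element with its column (or row) index
by_col <- table(value = c(m), col = c(col(m)))
by_row <- table(value = c(m), row = c(row(m)))

# Per-column proportions: each column sums to 1
col_props <- prop.table(by_col, 2)
by_col
by_row
```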
