How to Get Proportions and Counts of a Data Frame in R

How to get proportions and counts of a data frame in R

Try installing plyr and running:

library(plyr)
df <- data.frame(x1=c(1, 1, 0, 0, 1, 0),
                 label=c("a", "a", "b", "a", "c", "c"))
ddply(df, .(label), summarize, prop = mean(x1), count = length(x1))
#   label      prop count
# 1     a 0.6666667     3
# 2     b 0.0000000     1
# 3     c 0.5000000     2

which, under the hood, applies a split/apply/combine strategy similar to this base R code:

do.call(rbind, lapply(split(df, df$label), function(d)
  data.frame(prop = mean(d$x1), count = nrow(d))))
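For comparison, the same per-label proportion and count can be computed with tapply(). This is a minimal base R sketch using the example df from above. Because x1 is a 0/1 indicator, its mean is the proportion of 1s.

```r
df <- data.frame(x1 = c(1, 1, 0, 0, 1, 0),
                 label = c("a", "a", "b", "a", "c", "c"))

# tapply() splits x1 by label and applies the function to each piece
props  <- tapply(df$x1, df$label, mean)    # mean of a 0/1 vector = proportion of 1s
counts <- tapply(df$x1, df$label, length)  # number of rows in each group

res <- data.frame(label = names(props),
                  prop  = as.vector(props),
                  count = as.vector(counts))
res
```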

convert data frame of counts to proportions in R

Probably something along these lines:

df[, -1] <- lapply( df[ , -1], function(x) x/sum(x, na.rm=TRUE) )

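If it were a matrix, you could simply use prop.table(mat, 2) to get column proportions (without the margin argument, prop.table() divides by the grand total instead). With a data frame, however, you need to work only on the numeric columns, so the first (non-numeric) column is excluded.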
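A minimal sketch of that equivalence on a small hypothetical data frame (the `state`, `y1` and `y2` columns are made up for illustration): dividing each numeric column by its sum gives the same values as prop.table() with margin = 2 on the numeric part converted to a matrix.

```r
# Hypothetical counts: first column is an identifier, the rest are numeric
counts <- data.frame(state = c("A", "B", "C"),
                     y1 = c(1, 3, 6),
                     y2 = c(2, 2, 4))

# Data frame route: divide each numeric column by its own sum
p_df <- counts
p_df[, -1] <- lapply(counts[, -1], function(x) x / sum(x, na.rm = TRUE))

# Matrix route: margin = 2 means "proportions within each column"
mat   <- as.matrix(counts[, -1])
p_mat <- prop.table(mat, 2)

all.equal(unname(p_mat), unname(as.matrix(p_df[, -1])))
```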

You also need to exclude the "total" row (row 5); otherwise it is included in each column sum:

my.data[-5, -1] <- lapply( my.data[ -5 , -1], function(x){ x/sum(x, na.rm=TRUE)} )
my.data[ -5 , ]
    state      y1970      y1980      y1990      y2000
1  Alaska 0.02325581 0.03076923         NA 0.02941176
2    Iowa 0.05813953 0.10256410 0.21428571 0.16806723
3  Nevada 0.58139535 0.51282051 0.71428571 0.42016807
4    Ohio 0.29069767 0.30769231         NA 0.33613445
6 Wyoming 0.04651163 0.04615385 0.07142857 0.04621849
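Hardcoding row 5 breaks if the rows are reordered. Here is a minimal sketch that locates the total row by its label instead. It assumes the row is literally named "total" in the state column, and the small data frame is hypothetical.

```r
my.data <- data.frame(state = c("Alaska", "Iowa", "total", "Ohio"),
                      y1970 = c(1, 2, 6, 3),
                      y1990 = c(NA, 4, 5, 1))

keep <- my.data$state != "total"   # TRUE for every row except the total

# drop = FALSE keeps a data frame even if only one numeric column remains
my.data[keep, -1] <- lapply(my.data[keep, -1, drop = FALSE],
                            function(x) x / sum(x, na.rm = TRUE))
my.data[keep, ]
```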

-------------

Alternative approach, dividing each column by its value in the "total" row:

> my.data[,-1] <-lapply( my.data[  , -1], function(x){ x/x[5] } )
> my.data
    state      y1970      y1980      y1990      y2000
1  Alaska 0.02325581 0.03076923         NA 0.02941176
2    Iowa 0.05813953 0.10256410 0.13953488 0.16806723
3  Nevada 0.58139535 0.51282051 0.46511628 0.42016807
4    Ohio 0.29069767 0.30769231         NA 0.33613445
5   total 1.00000000 1.00000000 1.00000000 1.00000000
6 Wyoming 0.04651163 0.04615385 0.04651163 0.04621849
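The same idea works for the x[5] version: look up the position of the total row once, then divide each numeric column by its value in that row. This is a minimal sketch, again assuming the row is labelled "total" and using hypothetical numbers.

```r
my.data <- data.frame(state = c("Alaska", "Iowa", "total", "Ohio"),
                      y1970 = c(1, 2, 6, 3),
                      y1980 = c(2, 2, 8, 4))

tot <- which(my.data$state == "total")   # row index of the totals

my.data[, -1] <- lapply(my.data[, -1, drop = FALSE], function(x) x / x[tot])
my.data   # the total row becomes 1 in every numeric column
```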

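The following shows how prop.table() handles missing values on a very simple matrix. The first call uses no margin (proportions of the grand total), the second uses rows (margin = 1) and the third uses columns (margin = 2). Any NA makes the whole corresponding sum NA: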

> prop.table( matrix( c( 1,2,NA, 3),2) )
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
> prop.table( matrix( c( 1,2,NA, 3),2), 1 )
     [,1] [,2]
[1,]   NA   NA
[2,]  0.4  0.6
> prop.table( matrix( c( 1,2,NA, 3),2), 2 )
          [,1] [,2]
[1,] 0.3333333   NA
[2,] 0.6666667   NA
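If you want missing entries to be skipped rather than propagated, one option is to divide by sums computed with na.rm = TRUE using sweep(). Here is a minimal sketch on the same matrix. Note this is a swap-in for prop.table(), not what prop.table() itself does.

```r
m <- matrix(c(1, 2, NA, 3), 2)

# Column proportions: divide each column by its non-missing column sum
# (column 1 becomes 1/3, 2/3; column 2 becomes NA, 1)
col_p <- sweep(m, 2, colSums(m, na.rm = TRUE), "/")

# Row proportions: divide each row by its non-missing row sum
# (row 1 becomes 1, NA; row 2 becomes 0.4, 0.6)
row_p <- sweep(m, 1, rowSums(m, na.rm = TRUE), "/")
```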

Calculate proportions within groups in a data frame in R

We can group by 'treatment' and 'rep', then calculate the proportion ('prop') by dividing 'cells_alive' by the value of 'cells_alive' where 'Time' is 0:

library(dplyr)
x1 <- x %>% 
   group_by(treatment, rep) %>% 
   mutate(prop = cells_alive/cells_alive[Time == 0])

Output:

x1
# A tibble: 16 x 5
# Groups:   treatment, rep [4]
#   treatment   rep  Time cells_alive    prop
#       <dbl> <dbl> <dbl>       <dbl>   <dbl>
# 1         1     1     0         500 1      
# 2         1     1    30         470 0.94   
# 3         1     1    60         100 0.2    
# 4         1     1   180          20 0.04   
# 5         1     2     0         476 1      
# 6         1     2    30         310 0.651  
# 7         1     2    60          99 0.208  
# 8         1     2   180           2 0.00420
# 9         2     1     0         430 1      
#10         2     1    30         420 0.977  
#11         2     1    60         300 0.698  
#12         2     1   180         100 0.233  
#13         2     2     0         489 1      
#14         2     2    30         451 0.922  
#15         2     2    60         289 0.591  
#16         2     2   180           4 0.00818

Or with match(), which returns the position of the first Time == 0 row in each group:

x %>%
     group_by(treatment, rep) %>%
     mutate(prop = cells_alive/cells_alive[match(0, Time)])

If 'Time' is already sorted, so that the Time == 0 row comes first in each group:

x %>%
     group_by(treatment, rep) %>%
     mutate(prop = cells_alive/first(cells_alive))
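Without dplyr, a base R sketch can do the same thing with ave(): spread each group's Time == 0 value across the whole group, then divide. The data below is rebuilt from the printed tibble above, and the approach assumes exactly one Time == 0 row per treatment/rep group.

```r
x <- data.frame(
  treatment   = rep(c(1, 2), each = 8),
  rep         = rep(rep(c(1, 2), each = 4), times = 2),
  Time        = rep(c(0, 30, 60, 180), times = 4),
  cells_alive = c(500, 470, 100, 20, 476, 310, 99, 2,
                  430, 420, 300, 100, 489, 451, 289, 4)
)

# Keep cells_alive only where Time == 0, NA elsewhere
baseline <- ifelse(x$Time == 0, x$cells_alive, NA)

# Within each treatment/rep group, fill every row with that group's Time == 0 value
baseline <- ave(baseline, x$treatment, x$rep,
                FUN = function(v) max(v, na.rm = TRUE))

x$prop <- x$cells_alive / baseline
```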

How to calculate the proportion in R?

Since you want the ratio of the sum of the Immunised column to the sum of the Eligible column, you could do:

sum(df$Immunised)/sum(df$Eligible)
#[1] 0.770869
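Note that the ratio of the sums is not the same as the average of the per-row ratios when the rows have different sizes. Here is a small hypothetical example (the numbers are made up) showing the difference. The last line shows that the pooled ratio equals a mean of the per-row ratios weighted by Eligible.

```r
df <- data.frame(Immunised = c(90, 5),
                 Eligible  = c(100, 10))

# Pooled proportion: total immunised over total eligible (95 / 110)
sum(df$Immunised) / sum(df$Eligible)

# Unweighted mean of per-row proportions: (0.9 + 0.5) / 2 = 0.7
mean(df$Immunised / df$Eligible)

# Per-row proportions weighted by Eligible reproduce the pooled value
weighted.mean(df$Immunised / df$Eligible, df$Eligible)
```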

calculating the proportion of count variable per group in data.table in R

If you are looking for each count as a percentage of its startYear/groupSize total, you can do:

library(data.table)
mydata[, prop := count/sum(count) * 100, by = .(startYear, groupSize)]

#       groupSize gender startYear count       prop
# 1: intermediate      F      2014  7546 55.9958445
# 2:        small      F      2014  3500 31.3395415
# 3: intermediate      M      2014  5930 44.0041555
# 4:        small      M      2014  7668 68.6604585
# 5:         huge      F      2014 18114 56.7125861
# 6:         huge      M      2014 13826 43.2874139
# 7:        large      F      2014 11943 54.2222828
# 8:        large      M      2014 10083 45.7777172
#....
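For readers not using data.table, here is a base R sketch of the same grouped percentage using ave(). The rows are copied from the printed output above, and the grouping by startYear and groupSize mirrors the `by =` argument.

```r
mydata <- data.frame(
  groupSize = c("intermediate", "small", "intermediate", "small",
                "huge", "huge", "large", "large"),
  gender    = c("F", "F", "M", "M", "F", "M", "F", "M"),
  startYear = 2014,
  count     = c(7546, 3500, 5930, 7668, 18114, 13826, 11943, 10083)
)

# Each count as a percentage of its (startYear, groupSize) total
mydata$prop <- ave(mydata$count, mydata$startYear, mydata$groupSize,
                   FUN = function(v) v / sum(v) * 100)
```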

R percentage of counts from a data.frame

With dplyr/tidyr, one way to get the expected output is:

library(dplyr)
library(tidyr)
df %>%
    count(group, hight) %>% 
    group_by(group) %>%          # so sum(n) is the total within each group
    mutate(percent = n/sum(n)) %>% 
    select(-n) %>% 
    spread(hight, percent)
#     group     short      tall
#    <fctr>     <dbl>     <dbl>
#1      A 0.3333333 0.6666667
#2      B 0.6666667 0.3333333

Or, as @JoeRoe mentioned in the comments, newer versions of tidyr provide pivot_wider() as a replacement for spread():

 ...
 pivot_wider(names_from = hight, values_from = percent)

Data:

df <- data.frame(group, hight)
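In base R, prop.table() on a two-way table gives the same within-group shares, with margin = 1 making each row (group) sum to 1. The group and hight vectors here are hypothetical, chosen only to reproduce the proportions in the output above.

```r
group <- c("A", "A", "A", "B", "B", "B")
hight <- c("short", "tall", "tall", "short", "short", "tall")

# Cross-tabulate, then divide each row by its row total
tab <- prop.table(table(group, hight), margin = 1)
tab
```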

Tidy way to convert numeric columns from counts to proportions

Rephrase it as follows:

df %>%
  mutate_if(is.numeric, ~ . / rowSums(select(df, where(is.numeric))))

Output:

  id         x         y
1  A 0.3333333 0.6666667
2  B 0.3333333 0.6666667
3  C 0.3333333 0.6666667
4  D 0.3333333 0.6666667
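Here is a base R sketch of the same row-wise normalisation with no packages: select the numeric columns once and divide them by their row sums. The id/x/y values are hypothetical, chosen to match the output above.

```r
df <- data.frame(id = c("A", "B", "C", "D"),
                 x  = c(1, 2, 3, 4),
                 y  = c(2, 4, 6, 8))

num <- vapply(df, is.numeric, logical(1))   # which columns are numeric

# Dividing a data frame by a vector recycles it down each column,
# so row i of every numeric column is divided by rowSums(...)[i]
df[num] <- df[num] / rowSums(df[num])
df
```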

Edit: If you want an answer that doesn't use any additional packages besides dplyr and base, and that can be piped more easily, here's one other (hacky) solution:

df %>%
  group_by(id) %>% 
  mutate(sum = as.character(rowSums(select(cur_data(), where(is.numeric))))) %>%
  summarise_if(is.numeric, ~ . / as.numeric(sum))

The usual dplyr ways of referring to the current data within a function (e.g. cur_data) don't seem to play nicely with rowSums in my original phrasing, so I took a slightly different approach here. There is likely a better way of doing this though, so I'm open to suggestions.
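One possible pipe-friendly variant, assuming dplyr 1.1.0 or later (for pick()), computes the row total once as a helper column, divides the other numeric columns by it, and then drops it. The helper name row_total is arbitrary, and the id/x/y values are hypothetical.

```r
library(dplyr)

df <- data.frame(id = c("A", "B", "C", "D"),
                 x  = c(1, 2, 3, 4),
                 y  = c(2, 4, 6, 8))

res <- df %>%
  # pick() returns the selected columns as a data frame for rowSums()
  mutate(row_total = rowSums(pick(where(is.numeric)))) %>%
  # divide every numeric column except the helper by the row total
  mutate(across(where(is.numeric) & !row_total, ~ .x / row_total)) %>%
  select(-row_total)
res
```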
