How to Get Proportions and Counts of a Data Frame in R

How to get proportions and counts of a data frame in R

Try installing plyr and running:

library(plyr)
df <- data.frame(x1 = c(1, 1, 0, 0, 1, 0),
                 label = c("a", "a", "b", "a", "c", "c"))
ddply(df, .(label), summarize, prop = mean(x1), count = length(x1))
#   label      prop count
# 1     a 0.6666667     3
# 2     b 0.0000000     1
# 3     c 0.5000000     2

Under the hood, ddply applies a split/apply/combine approach, similar to this in base R (splitting on the label column):

do.call(rbind, lapply(split(df, df$label),
                      with, list(prop = mean(x1),
                                 count = length(x1))))
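
Because the one-liner above binds plain lists, its result is a list matrix rather than a data frame. As a rough sketch of the same split/apply/combine idea that returns an ordinary data frame, something like the following should work. The helper name summarise_group is made up for this illustration, and the data is the same df as in the plyr example.

```r
# Same example data as in the plyr example
df <- data.frame(x1 = c(1, 1, 0, 0, 1, 0),
                 label = c("a", "a", "b", "a", "c", "c"))

# Hypothetical helper: build one summary row for one group
summarise_group <- function(g) {
  data.frame(label = g$label[1],
             prop  = mean(g$x1),  # share of 1s in the group
             count = nrow(g))     # number of rows in the group
}

# split by label -> apply the helper -> combine the rows
res <- do.call(rbind, lapply(split(df, df$label), summarise_group))
rownames(res) <- NULL
res
```

The result has one row per label with the same prop and count values that ddply reports.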

Convert data frame of counts to proportions in R

Probably something along these lines:

df[, -1] <- lapply( df[ , -1], function(x) x/sum(x, na.rm=TRUE) )

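If the data were a numeric matrix, you could simply use prop.table(mat) for proportions of the grand total, or prop.table(mat, 2) for column-wise proportions. With a data frame, however, you need to restrict the calculation to the numeric columns (here, by excluding the first one).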

Furthermore I think you need to exclude the "total" row:

my.data[-5, -1] <- lapply( my.data[ -5 , -1], function(x){ x/sum(x, na.rm=TRUE)} )
my.data[ -5 , ]
    state      y1970      y1980      y1990      y2000
1  Alaska 0.02325581 0.03076923         NA 0.02941176
2    Iowa 0.05813953 0.10256410 0.21428571 0.16806723
3  Nevada 0.58139535 0.51282051 0.71428571 0.42016807
4    Ohio 0.29069767 0.30769231         NA 0.33613445
6 Wyoming 0.04651163 0.04615385 0.07142857 0.04621849

-------------

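Alternate approach, dividing each column by its value in the "total" row (row 5):

> my.data[,-1] <- lapply( my.data[ , -1], function(x){ x/x[5] } )
> my.data
    state      y1970      y1980      y1990      y2000
1  Alaska 0.02325581 0.03076923         NA 0.02941176
2    Iowa 0.05813953 0.10256410 0.13953488 0.16806723
3  Nevada 0.58139535 0.51282051 0.46511628 0.42016807
4    Ohio 0.29069767 0.30769231         NA 0.33613445
5   total 1.00000000 1.00000000 1.00000000 1.00000000
6 Wyoming 0.04651163 0.04615385 0.04651163 0.04621849

Note that the y1990 proportions differ between the two approaches: with na.rm=TRUE the denominator is the sum of the non-missing values only, whereas here it is the reported total.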

This shows what prop.table returns when a very simple matrix contains a missing value: first with no margin (proportions of the grand total), then by rows (margin 1), and then by columns (margin 2). Any total that includes an NA becomes NA:

> prop.table( matrix( c( 1,2,NA, 3),2) )
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
> prop.table( matrix( c( 1,2,NA, 3),2), 1 )
     [,1] [,2]
[1,]   NA   NA
[2,]  0.4  0.6
> prop.table( matrix( c( 1,2,NA, 3),2), 2 )
          [,1] [,2]
[1,] 0.3333333   NA
[2,] 0.6666667   NA
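
If the goal is column proportions that ignore missing cells instead of turning the whole column into NA, one possible sketch is to compute the totals with na.rm = TRUE and divide with sweep. This swaps prop.table for colSums plus sweep; the variable names are only illustrative.

```r
# Same small matrix as in the prop.table examples
m <- matrix(c(1, 2, NA, 3), 2)

# Column totals that skip missing values
col_totals <- colSums(m, na.rm = TRUE)

# Divide every column by its own total; cells that were NA stay NA
props <- sweep(m, 2, col_totals, "/")
props
```

Here the second column no longer collapses to NA entirely: only the missing cell itself remains NA.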

Calculate proportions within groups in a data frame in R

We can group by 'treatment' and 'rep', then calculate the proportion ('prop') by dividing 'cells_alive' by the value of 'cells_alive' that corresponds to 'Time' equal to 0 within each group:

library(dplyr)
x1 <- x %>%
  group_by(treatment, rep) %>%
  mutate(prop = cells_alive/cells_alive[Time == 0])

Output:

x1
# A tibble: 16 x 5
# Groups:   treatment, rep [4]
#    treatment   rep  Time cells_alive prop
#        <dbl> <dbl> <dbl>       <dbl> <dbl>
#  1         1     1     0         500 1
#  2         1     1    30         470 0.94
#  3         1     1    60         100 0.2
#  4         1     1   180          20 0.04
#  5         1     2     0         476 1
#  6         1     2    30         310 0.651
#  7         1     2    60          99 0.208
#  8         1     2   180           2 0.00420
#  9         2     1     0         430 1
# 10         2     1    30         420 0.977
# 11         2     1    60         300 0.698
# 12         2     1   180         100 0.233
# 13         2     2     0         489 1
# 14         2     2    30         451 0.922
# 15         2     2    60         289 0.591
# 16         2     2   180           4 0.00818

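Or with match, which picks the first row in each group where 'Time' is 0:

x %>%
  group_by(treatment, rep) %>%
  mutate(prop = cells_alive/cells_alive[match(0, Time)])

Or, if 'Time' is already sorted within each group so that the first row is 'Time' 0:

x %>%
  group_by(treatment, rep) %>%
  mutate(prop = cells_alive/first(cells_alive))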
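
For readers who prefer not to load dplyr, a base R sketch of the same baseline division could use ave. This is an alternative technique, not the approach from the answer above; the data frame below reuses only the first eight rows of the output shown earlier, and the name baseline is just for illustration.

```r
# First eight rows of the example output (treatment 1, reps 1 and 2)
x <- data.frame(
  treatment   = c(1, 1, 1, 1, 1, 1, 1, 1),
  rep         = c(1, 1, 1, 1, 2, 2, 2, 2),
  Time        = c(0, 30, 60, 180, 0, 30, 60, 180),
  cells_alive = c(500, 470, 100, 20, 476, 310, 99, 2)
)

# Keep cells_alive only where Time == 0, then spread that value
# over every row of the same treatment/rep group
baseline <- ave(ifelse(x$Time == 0, x$cells_alive, NA),
                x$treatment, x$rep,
                FUN = function(v) v[!is.na(v)][1])

# Proportion of cells alive relative to the group's Time 0 value
x$prop <- x$cells_alive / baseline
x
```

Like the Time == 0 version, this does not depend on the row order within each group.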

How to calculate the proportion in R?

Since you want the ratio of the sum of the Immunised column to the sum of the Eligible column, you could do:

sum(df$Immunised)/sum(df$Eligible)
#[1] 0.770869
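
It may help to see why the ratio of sums is used here rather than the mean of the per-row ratios: the two generally differ when rows have different sizes. The sketch below uses made-up numbers (not the data from the question) and only borrows the column names.

```r
# Hypothetical data; only the column names follow the example above
df <- data.frame(Eligible  = c(100, 10),
                 Immunised = c(50, 10))

# Overall proportion: pools all rows, so larger rows carry more weight
overall <- sum(df$Immunised) / sum(df$Eligible)

# Mean of per-row proportions: every row carries the same weight
per_row_mean <- mean(df$Immunised / df$Eligible)

overall       # 60 / 110
per_row_mean  # (0.5 + 1) / 2
```

Which of the two is appropriate depends on whether each row or each eligible individual should count equally.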

Calculating the proportion of count variable per group in data.table in R

If you are looking for each row's share of its group total, expressed as a percentage, you can do:

library(data.table)
mydata[, prop := count/sum(count) * 100, by = .(startYear, groupSize)]

#       groupSize gender startYear count       prop
# 1: intermediate      F      2014  7546 55.9958445
# 2:        small      F      2014  3500 31.3395415
# 3: intermediate      M      2014  5930 44.0041555
# 4:        small      M      2014  7668 68.6604585
# 5:         huge      F      2014 18114 56.7125861
# 6:         huge      M      2014 13826 43.2874139
# 7:        large      F      2014 11943 54.2222828
# 8:        large      M      2014 10083 45.7777172
#....

R percentage of counts from a data.frame

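If we are using dplyr/tidyr, one way to get the expected output is shown below. In recent dplyr versions count() returns an ungrouped result, so we group by 'group' explicitly to get proportions within each group:

library(dplyr)
library(tidyr)
df %>%
  count(group, hight) %>%
  group_by(group) %>%
  mutate(percent = n/sum(n)) %>%
  ungroup() %>%
  select(-n) %>%
  spread(hight, percent)
#    group     short      tall
#   <fctr>     <dbl>     <dbl>
# 1      A 0.3333333 0.6666667
# 2      B 0.6666667 0.3333333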

Or, as @JoeRoe mentioned in the comments, we could use pivot_wider in newer versions of tidyr as a replacement for spread:

 ...
pivot_wider(names_from = hight, values_from = percent)

data

df <- data.frame(group, hight)
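
As a cross-check that needs no extra packages, the same within-group proportions can be sketched in base R with table and prop.table. The group and hight vectors below are hypothetical stand-ins, since the original vectors are not shown; they were chosen only so that the proportions match the output above.

```r
# Hypothetical sample data (the original vectors are not shown)
group <- c("A", "A", "A", "B", "B", "B")
hight <- c("short", "tall", "tall", "short", "short", "tall")
df <- data.frame(group, hight)

# Cross-tabulate the counts: rows are groups, columns are hight levels
counts <- table(df$group, df$hight)

# margin = 1 turns each row into proportions that sum to 1
props <- prop.table(counts, margin = 1)
props
```

Using margin = 2 instead would give proportions within each hight level rather than within each group.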

Tidy way to convert numeric columns from counts to proportions

Rephrase to the following:

df %>%
  mutate_if(is.numeric, ~ . / rowSums(select(df, where(is.numeric))))

Output:

  id         x         y
1  A 0.3333333 0.6666667
2  B 0.3333333 0.6666667
3  C 0.3333333 0.6666667
4  D 0.3333333 0.6666667

Edit: If you want an answer that doesn't use any additional packages besides dplyr and base, and that can be piped more easily, here's one other (hacky) solution:

df %>%
  group_by(id) %>%
  mutate(sum = as.character(rowSums(select(cur_data(), where(is.numeric))))) %>%
  summarise_if(is.numeric, ~ . / as.numeric(sum))

The usual dplyr ways of referring to the current data within a function (e.g. cur_data) don't seem to play nicely with rowSums in my original phrasing, so I took a slightly different approach here. There is likely a better way of doing this though, so I'm open to suggestions.
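
For comparison, a base R sketch of the same row-wise conversion is shown below. It is not the tidy approach asked about, and the data frame is a hypothetical stand-in chosen to reproduce the output above, since the original df is not shown.

```r
# Hypothetical data chosen to match the output shown above
df <- data.frame(id = c("A", "B", "C", "D"),
                 x  = c(1, 2, 3, 4),
                 y  = c(2, 4, 6, 8))

# Logical index of the numeric columns
num <- vapply(df, is.numeric, logical(1))

# Dividing a data frame by a vector of length nrow(df) works row by row,
# so each row is divided by its own total
df[num] <- df[num] / rowSums(df[num])
df
```

The id column is left untouched because it is excluded by the numeric index.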


