Calculate Proportions Within Subsets of a Data Frame

Calculate proportions within subsets of a data frame

You can use ddply() from the plyr package to calculate the proportion within each combination of category1 and category2, and add the result to the data frame as a new column.

library(plyr)
DF <- ddply(DF, .(category1, category2), transform, prop = number / sum(number))
DF
  category1 category2 animal number       prop
1         A         X    dog     17 0.44736842
2         A         X    cat      3 0.07894737
3         A         X  mouse     18 0.47368421
4         A         Y    dog      2 0.14285714

Finding proportions based on data.frame subsets

Try:

transform(df, prop=count/ave(count, type, group, FUN=sum))
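To see what the ave() call is doing, here is a minimal sketch on made-up data. The values below are hypothetical; only the column names count, type and group are taken from the call above.

```r
# Hypothetical data; column names follow the transform() call above
df <- data.frame(
  type  = c("a", "a", "a", "b", "b", "b"),
  group = c("g1", "g1", "g2", "g1", "g2", "g2"),
  count = c(1, 3, 5, 2, 6, 2)
)

# ave() returns a vector as long as count, holding for each row
# the sum of count within that row's (type, group) subset
res <- transform(df, prop = count / ave(count, type, group, FUN = sum))
res$prop
```

Within each (type, group) subset the prop values add up to 1, since each count is divided by its own subset total.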

Calculate proportion of values within subgroup

We can compute a grouped sum with summarise. By default, summarise drops the last grouping variable (here type), so the result is still grouped by cond. A following mutate can then divide each 'Sum' by the total of the 'Sum' column within its cond.

library(dplyr)
df1 %>%
  group_by(cond, type) %>%
  summarise(Sum = sum(value)) %>%
  mutate(proportion = Sum / sum(Sum))
# A tibble: 5 x 4
# Groups: cond [2]
# cond type Sum proportion
# <chr> <chr> <int> <dbl>
#1 x A 6 0.857
#2 x B 1 0.143
#3 y C 7 0.412
#4 y D 5 0.294
#5 y E 5 0.294

Or use prop.table from base R, which returns the proportions as a cond by type table:

prop.table(xtabs(value ~ cond + type, df1), 1)

data

df1 <- structure(list(cond = c("x", "x", "x", "y", "y", "y", "y"), type = c("A", 
"A", "B", "C", "D", "D", "E"), value = c(2L, 4L, 1L, 7L, 2L,
3L, 5L)), class = "data.frame", row.names = c(NA, -7L))
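As a quick check of the base R variant, the sketch below rebuilds df1 with data.frame() (same values as above) and confirms that margin 1 normalises within each cond. The names tab and props are only illustrative.

```r
df1 <- data.frame(
  cond  = c("x", "x", "x", "y", "y", "y", "y"),
  type  = c("A", "A", "B", "C", "D", "D", "E"),
  value = c(2L, 4L, 1L, 7L, 2L, 3L, 5L)
)

tab   <- xtabs(value ~ cond + type, df1)  # cond x type table of summed values
props <- prop.table(tab, 1)               # margin 1: divide each row by its row total

rowSums(props)    # every cond row sums to 1
props["x", "A"]   # 6/7, matching the dplyr result for cond x, type A
```

Unlike the dplyr output, the table has a cell for every cond and type pair, so combinations that never occur (for example cond x with type C) appear with a proportion of 0.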

Calculate percentage of a subset of data

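Use group_by twice: once to get the total per rep, then again to get each subgroup's share of that total.

library(dplyr)
df_sum <- df %>%
  group_by(rep) %>%                                # group by rep
  mutate(sum_rep = sum(num)) %>%                   # total of num within each rep
  group_by(rep, class, DB) %>%                     # regroup by rep, class and DB
  summarise(desired = sum(num) / unique(sum_rep))  # subgroup sum / rep total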

Output

      rep class DB     desired
1  early1    CL  0 0.002282627
2  early1    CL  2 0.928243905
3  early1    CL  4 0.069473468
4  early2    CL  0 0.001972057
5  early2    CL  2 0.919988412
6  early2    CL  4 0.078039532
7  early3    CL  0 0.002552173
8  early3    CL  2 0.917096873
9  early3    CL  4 0.080350953
10  late1    CL  0 0.002709255

Calculate proportions of categories within groups

Using dplyr you could do:

library(dplyr)

df %>%
  group_by(group) %>%
  count(fruit) %>%
  mutate(freq = n / sum(n) * 100) %>%
  select(-n)

Output
#> # A tibble: 6 x 3
#> # Groups: group [2]
#> group fruit freq
#> <dbl> <chr> <dbl>
#> 1 1 apples 34.3
#> 2 1 bananas 42.9
#> 3 1 oranges 22.9
#> 4 2 apples 27.7
#> 5 2 bananas 53.8
#> 6 2 oranges 18.5

Created on 2022-02-19 by the reprex package (v2.0.1)

Extract subsets of a data frame based on a proportion of the total number of rows

Use cut to create a grouping variable, grp, and then split df on that. This gives a list, obj, such that obj[[1]] is the first group, etc.

grp <- cut(1:nrow(df), 10, labels = FALSE)
obj <- split(df, grp)

I don't recommend creating 10 separate variables from the list, but if you want to do it anyway:

names(obj) <- paste0("obj", names(obj))
attach(obj)

This attaches the list to the search path, so obj1, obj2, and so on can be referenced by name. Alternatively, the following creates the variables directly in the workspace:

names(obj) <- paste0("obj", names(obj))
for(g in names(obj)) assign(g, obj[[g]])
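A small sketch of the cut and split step on a hypothetical 25-row data frame (the columns id and x are made up for illustration), which also shows that the pieces can be used straight from the list without creating separate variables:

```r
# Hypothetical data frame with 25 rows
df <- data.frame(id = 1:25, x = letters[1:25])

# Assign each row index to one of 10 consecutive bins, then split on the bin
grp <- cut(seq_len(nrow(df)), 10, labels = FALSE)
obj <- split(df, grp)

length(obj)        # number of pieces
sapply(obj, nrow)  # rows per piece; sizes differ slightly when nrow is not a multiple of 10
obj[[1]]           # first piece, accessed directly from the list
```

Keeping the pieces in obj means later steps can loop over them with lapply(obj, ...) instead of referring to ten separately named objects.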


Calculating the proportion per subgroup with data.table

Using data.table:

library(data.table)

df <- read.table(header = TRUE, text = "row  country year
1 NLD 2005
2 NLD 2005
3 BLG 2006
4 BLG 2005
5 GER 2005
6 NLD 2007
7 NLD 2005
8 NLD 2008")

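setDT(df)[, sum := .N, by = country                 # rows per country
        ][, prop := .N, by = c("country", "year")   # rows per country and year
        ][, prop := prop / sum                      # share within the country
        ][, sum := NULL]                            # drop the helper column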

   row country year prop
1:   1     NLD 2005  0.6
2:   2     NLD 2005  0.6
3:   3     BLG 2006  0.5
4:   4     BLG 2005  0.5
5:   5     GER 2005  1.0
6:   6     NLD 2007  0.2
7:   7     NLD 2005  0.6
8:   8     NLD 2008  0.2
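If data.table is not available, the same proportions can be cross-checked in base R. This is an alternative sketch using ave(), not the method from the answer above; the helper names ones, n_country_year and n_country are made up for illustration.

```r
df <- data.frame(
  country = c("NLD", "NLD", "BLG", "BLG", "GER", "NLD", "NLD", "NLD"),
  year    = c(2005, 2005, 2006, 2005, 2005, 2007, 2005, 2008)
)

ones <- rep(1, nrow(df))

# Summing a vector of ones within a group counts the rows in that group
n_country_year <- ave(ones, df$country, df$year, FUN = sum)
n_country      <- ave(ones, df$country, FUN = sum)

# Share of each country's rows that fall in the row's year
df$prop <- n_country_year / n_country
df
```

The prop column should agree with the data.table output above, for example 0.6 for NLD in 2005 (3 of 5 NLD rows).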

