Calculate Proportions Within Subsets of a Data Frame

Calculate proportions within subsets of a data frame

You can use ddply() from the plyr package to calculate the proportion within each combination of category1 and category2 and add it to the data frame as a new column.

library(plyr)
DF <- ddply(DF, .(category1, category2), transform, prop = number / sum(number))
DF
   category1 category2 animal number       prop
1          A         X    dog     17 0.44736842
2          A         X    cat      3 0.07894737
3          A         X  mouse     18 0.47368421
4          A         Y    dog      2 0.14285714

Finding proportions based on data.frame subsets

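Try transform() with ave(), which returns the group total on every row so that each count can be divided by it: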

transform(df, prop=count/ave(count, type, group, FUN=sum))
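To see what ave() contributes here, a minimal sketch on a made-up data frame may help. Only the column names count, type and group come from the call above; the values are invented for illustration. ave() returns a vector as long as its input in which every element is replaced by the sum of its group, so the division is element-wise.

```r
# Hypothetical data: the values are invented for illustration only
df <- data.frame(
  type  = c("a", "a", "a", "b", "b"),
  group = c("g1", "g1", "g2", "g1", "g1"),
  count = c(2, 6, 5, 1, 3)
)

# ave() repeats each (type, group) total on every row of that group
group_total <- ave(df$count, df$type, df$group, FUN = sum)

# Same call as in the answer: each count divided by its group total
res <- transform(df, prop = count / ave(count, type, group, FUN = sum))
res
```

Note that FUN has to be passed by name, because every unnamed argument after the first is treated as a grouping variable.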

Calculate proportion of values within subgroup

First compute a grouped sum with summarise(). By default, summarise() drops the last grouping level (here type), so the result is still grouped by cond. mutate() can then divide each 'Sum' by the total of the 'Sum' column within that cond.

library(dplyr)
df1 %>%
    group_by(cond, type) %>%
    summarise(Sum = sum(value)) %>%
    mutate(proportion = Sum/sum(Sum))
# A tibble: 5 x 4
# Groups:   cond [2]
#  cond  type    Sum proportion
#  <chr> <chr> <int>      <dbl>
#1 x     A         6      0.857
#2 x     B         1      0.143
#3 y     C         7      0.412
#4 y     D         5      0.294
#5 y     E         5      0.294

Or use prop.table() from base R, where the margin argument 1 requests proportions within each row (cond):

prop.table(xtabs(value ~ cond + type, df1), 1)

data

df1 <- structure(list(cond = c("x", "x", "x", "y", "y", "y", "y"), type = c("A", 
"A", "B", "C", "D", "D", "E"), value = c(2L, 4L, 1L, 7L, 2L, 
3L, 5L)), class = "data.frame", row.names = c(NA, -7L))
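As a rough sketch of what the prop.table() call does with this data, the steps can be separated as below. The intermediate names tab, p and long are only illustrative. One difference from the dplyr result is that xtabs() fills cond/type pairs that never occur with 0, so they show up as 0 proportions in the table.

```r
df1 <- data.frame(
  cond  = c("x", "x", "x", "y", "y", "y", "y"),
  type  = c("A", "A", "B", "C", "D", "D", "E"),
  value = c(2L, 4L, 1L, 7L, 2L, 3L, 5L)
)

# Sum of value for every cond/type pair; absent pairs are filled with 0
tab <- xtabs(value ~ cond + type, df1)

# margin = 1 divides each cell by its row (cond) total
p <- prop.table(tab, 1)
rowSums(p)   # the proportions within each cond add up to 1

# Long format, keeping only the pairs with a non-zero proportion
long <- subset(as.data.frame(p, responseName = "proportion"), proportion > 0)
long
```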

Calculate percentage of a subset of data

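Use group_by twice: first by rep to get the total for each rep, then by rep, class and DB to divide each group's sum by that total.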

library(dplyr)
df_sum <- df %>%
             group_by(rep) %>%                     # grouped by rep
             mutate(sum_rep=sum(num)) %>%          # sum of each rep
             group_by(rep,class,DB) %>%            # grouped by DB
             summarise(desired=sum(num)/unique(sum_rep))   # sum(DB)/sum(rep)

Output

      rep class    DB     desired
 1 early1    CL     0 0.002282627
 2 early1    CL     2 0.928243905
 3 early1    CL     4 0.069473468
 4 early2    CL     0 0.001972057
 5 early2    CL     2 0.919988412
 6 early2    CL     4 0.078039532
 7 early3    CL     0 0.002552173
 8 early3    CL     2 0.917096873
 9 early3    CL     4 0.080350953
10  late1    CL     0 0.002709255

Calculate proportions of categories within groups

Using dplyr you could do:

Reprex

Code

library(dplyr)

df %>% 
  group_by(group) %>% 
  count(fruit) %>% 
  mutate(freq = n / sum(n) * 100) %>% 
  select(-n)

Output

#> # A tibble: 6 x 3
#> # Groups:   group [2]
#>   group fruit    freq
#>   <dbl> <chr>   <dbl>
#> 1     1 apples   34.3
#> 2     1 bananas  42.9
#> 3     1 oranges  22.9
#> 4     2 apples   27.7
#> 5     2 bananas  53.8
#> 6     2 oranges  18.5

Created on 2022-02-19 by the reprex package (v2.0.1)

Extract subsets of a data frame based on a proportion of the total number of rows

Use cut to create a grouping variable, grp, and then split df on that. This gives a list, obj, such that obj[[1]] is the first group, etc.

grp <- cut(1:nrow(df), 10, labels = FALSE)
obj <- split(df, grp)

I don't recommend creating 10 separate variables from the list, but if you want to do that anyway, naming the list elements and attaching the list

names(obj) <- paste0("obj", names(obj))
attach(obj)

would attach an environment containing them to the search path. Alternatively, the following would create such variables directly in the workspace:

names(obj) <- paste0("obj", names(obj))
for(g in names(obj)) assign(g, obj[[g]])
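As a sketch of the alternative to separate variables, the list can be used directly with sapply() or indexed with [[ ]]. The data frame below is hypothetical (100 rows with made-up columns id and x); only the cut() and split() steps come from the answer.

```r
# Hypothetical data frame with 100 rows
df <- data.frame(id = 1:100, x = seq(0.5, 50, by = 0.5))

# Ten consecutive blocks of rows, as in the answer
grp <- cut(seq_len(nrow(df)), 10, labels = FALSE)
obj <- split(df, grp)

# Work on every subset through the list instead of ten variables
sizes <- sapply(obj, nrow)                    # number of rows in each subset
means <- sapply(obj, function(d) mean(d$x))   # mean of x in each subset

obj[[3]]                                      # the third subset
```

Keeping the subsets in one list means the same code works unchanged if the number of groups passed to cut() changes.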


Calculating the proportion per subgroup with data.table

Using data.table:

library(data.table)

df <- read.table(header = TRUE, text = "row country year
1 NLD 2005
2 NLD 2005
3 BLG 2006
4 BLG 2005
5 GER 2005
6 NLD 2007
7 NLD 2005
8 NLD 2008")

# Count rows per country, then per country/year, and divide the two counts
setDT(df)[, sum := .N, by = country][, prop := .N, by = c("country", "year")][, prop := prop/sum][, sum := NULL]

    row country year prop
1:   1     NLD 2005  0.6
2:   2     NLD 2005  0.6
3:   3     BLG 2006  0.5
4:   4     BLG 2005  0.5
5:   5     GER 2005  1.0
6:   6     NLD 2007  0.2
7:   7     NLD 2005  0.6
8:   8     NLD 2008  0.2
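If data.table is not available, the same proportions can be sketched in base R with ave(). This is a swapped-in alternative to the data.table chain, not part of it. The helper names n_pair and n_country are only illustrative.

```r
df <- read.table(header = TRUE, text = "row country year
1 NLD 2005
2 NLD 2005
3 BLG 2006
4 BLG 2005
5 GER 2005
6 NLD 2007
7 NLD 2005
8 NLD 2008")

# Rows per country/year pair and rows per country, repeated on every row
n_pair    <- ave(df$row, df$country, df$year, FUN = length)
n_country <- ave(df$row, df$country, FUN = length)

# Share of each country's rows that fall in the row's year
df$prop <- n_pair / n_country
df
```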
