Calculate proportions within subsets of a data frame
You can use the function ddply() from the plyr package to calculate the proportions within each combination of categories and add them to the data frame as a new column.
library(plyr)
DF<-ddply(DF,.(category1,category2),transform,prop=number/sum(number))
DF
category1 category2 animal number prop
1 A X dog 17 0.44736842
2 A X cat 3 0.07894737
3 A X mouse 18 0.47368421
4 A Y dog 2 0.14285714
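For readers who want to see what ddply() is doing, here is a minimal base R sketch of the same split, transform, and recombine steps. Only the three A/X rows are taken from the output above; the B/Y rows are invented for illustration, and the names pieces and result are hypothetical.

```r
# Hypothetical input: the A/X rows match the output above,
# the B/Y rows are made up for illustration.
DF <- data.frame(
  category1 = c("A", "A", "A", "B", "B"),
  category2 = c("X", "X", "X", "Y", "Y"),
  animal    = c("dog", "cat", "mouse", "dog", "cat"),
  number    = c(17, 3, 18, 6, 4)
)

# Split into one data frame per category1/category2 combination,
# dropping combinations that do not occur in the data.
pieces <- split(DF, list(DF$category1, DF$category2), drop = TRUE)

# Within each piece, divide every number by the piece total.
pieces <- lapply(pieces, function(d) transform(d, prop = number / sum(number)))

# Stack the pieces back into a single data frame.
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

Within each combination the prop values should sum to 1, which is a quick way to check the grouping columns were chosen correctly.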
Finding proportions based on data.frame subsets
Try:
transform(df, prop=count/ave(count, type, group, FUN=sum))
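As a rough illustration of why this works: ave() returns a vector as long as its input in which every element is replaced by the summary of its group, so dividing by it gives a per-row share. The data below are hypothetical; only the column names count, type, and group come from the snippet above.

```r
# Hypothetical data using the column names from the answer above
df <- data.frame(
  type  = c("a", "a", "b", "b", "b"),
  group = c("g1", "g1", "g1", "g2", "g2"),
  count = c(1, 3, 5, 2, 8)
)

# Group total repeated on every row of its type/group combination
totals <- ave(df$count, df$type, df$group, FUN = sum)
totals

# Each row's share of its group total
res <- transform(df, prop = count / totals)
res
```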
Calculate proportion of values within subgroup
We can compute a grouped sum in summarise. By default, summarise drops the last grouping variable, so the result is still grouped by 'cond'; use mutate to divide 'Sum' by the sum of the 'Sum' column within each 'cond'.
library(dplyr)
df1 %>%
group_by(cond, type) %>%
summarise(Sum = sum(value)) %>%
mutate(proportion = Sum/sum(Sum))
# A tibble: 5 x 4
# Groups: cond [2]
# cond type Sum proportion
# <chr> <chr> <int> <dbl>
#1 x A 6 0.857
#2 x B 1 0.143
#3 y C 7 0.412
#4 y D 5 0.294
#5 y E 5 0.294
Or using prop.table from base R:
prop.table(xtabs(value ~ cond + type, df1), 1)
data
df1 <- structure(list(cond = c("x", "x", "x", "y", "y", "y", "y"), type = c("A",
"A", "B", "C", "D", "D", "E"), value = c(2L, 4L, 1L, 7L, 2L,
3L, 5L)), class = "data.frame", row.names = c(NA, -7L))
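The prop.table call returns a wide cond-by-type table rather than the long layout shown in the dplyr output. A possible way to inspect it and reshape it is sketched below; the names tab, props, and long are hypothetical, and df1 is rebuilt with data.frame() using the same values as above.

```r
# Same values as the df1 defined above
df1 <- data.frame(
  cond  = c("x", "x", "x", "y", "y", "y", "y"),
  type  = c("A", "A", "B", "C", "D", "D", "E"),
  value = c(2L, 4L, 1L, 7L, 2L, 3L, 5L)
)

tab   <- xtabs(value ~ cond + type, df1)  # summed value per cond/type cell
props <- prop.table(tab, margin = 1)      # divide each row by its row total
rowSums(props)                            # every cond row sums to 1

# Long format; cells with proportion 0 are dropped. Note this would also
# drop a real combination whose values happen to sum to zero.
long <- as.data.frame(props, responseName = "proportion")
long <- long[long$proportion > 0, ]
long
```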
Calculate percentage of a subset of data
Use group_by twice: once to get the total per rep, then again to get each DB's share of that total.
library(dplyr)
df_sum <- df %>%
group_by(rep) %>% # grouped by rep
mutate(sum_rep=sum(num)) %>% # sum of each rep
group_by(rep,class,DB) %>% # grouped by DB
summarise(desired=sum(num)/unique(sum_rep)) # sum(DB)/sum(rep)
Output
rep class DB desired
1 early1 CL 0 0.002282627
2 early1 CL 2 0.928243905
3 early1 CL 4 0.069473468
4 early2 CL 0 0.001972057
5 early2 CL 2 0.919988412
6 early2 CL 4 0.078039532
7 early3 CL 0 0.002552173
8 early3 CL 2 0.917096873
9 early3 CL 4 0.080350953
10 late1 CL 0 0.002709255
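The same two-step idea (sum within the fine grouping, then divide by the coarse total) can be sketched in base R. The data here are hypothetical and much smaller than the original; only the column names rep, class, DB, and num are taken from the code above.

```r
# Hypothetical data with the column names used above
df <- data.frame(
  rep   = c("early1", "early1", "early1", "early2", "early2"),
  class = "CL",
  DB    = c(0, 2, 2, 0, 2),
  num   = c(1, 5, 4, 2, 8)
)

# Step 1: sum num within each rep/class/DB combination
by_db <- aggregate(num ~ rep + class + DB, data = df, FUN = sum)

# Step 2: divide each sum by the total for its rep
by_db$desired <- by_db$num / ave(by_db$num, by_db$rep, FUN = sum)

# Sort for readability
by_db <- by_db[order(by_db$rep, by_db$DB), ]
by_db
```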
Calculate proportions of categories within groups
Using dplyr, you could do:
Reprex
- Code
library(dplyr)
df %>%
group_by(group) %>%
count(fruit) %>%
mutate(freq = n / sum(n) * 100) %>%
select(-n)
- Output
#> # A tibble: 6 x 3
#> # Groups: group [2]
#> group fruit freq
#> <dbl> <chr> <dbl>
#> 1 1 apples 34.3
#> 2 1 bananas 42.9
#> 3 1 oranges 22.9
#> 4 2 apples 27.7
#> 5 2 bananas 53.8
#> 6 2 oranges 18.5
Created on 2022-02-19 by the reprex package (v2.0.1)
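If dplyr is not available, a base R sketch of the same count-then-percentage calculation could look like the following. The data are hypothetical (the original df is not shown), and the result is a wide group-by-fruit table rather than a long tibble.

```r
# Hypothetical data: one row per observed fruit
df <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  fruit = c("apples", "apples", "bananas", "oranges",
            "apples", "bananas", "bananas", "bananas", "oranges")
)

counts <- table(df$group, df$fruit)          # occurrences per group and fruit
freq   <- prop.table(counts, margin = 1) * 100  # percentage within each group
round(freq, 1)
```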
Extract subsets of a data frame based on a proportion of the total number of rows
Use cut to create a grouping variable, grp, and then split df on that. This gives a list, obj, such that obj[[1]] is the first group, etc.
grp <- cut(1:nrow(df), 10, labels = FALSE)
obj <- split(df, grp)
I don't recommend creating 10 separate variables from that list, but if you want to do it anyway:
names(obj) <- paste0("obj", names(obj))
attach(obj)
attaches the list to the search path, so its elements can be referenced by name. Alternatively, the following creates those variables directly in the workspace:
names(obj) <- paste0("obj", names(obj))
for(g in names(obj)) assign(g, obj[[g]])
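Since keeping the groups in the list is the recommended route, here is a small sketch of working with obj directly. It uses the built-in mtcars data set as a stand-in for df, which is an assumption; sizes and means are hypothetical names.

```r
df <- mtcars  # stand-in for df: built-in data set with 32 rows

# Ten roughly equal-sized consecutive blocks of rows
grp <- cut(seq_len(nrow(df)), 10, labels = FALSE)
obj <- split(df, grp)

# Work on all chunks at once instead of creating ten variables
sizes <- sapply(obj, nrow)                     # number of rows in each chunk
means <- sapply(obj, function(d) mean(d$mpg))  # one summary value per chunk
sizes
```

Individual chunks remain available as obj[[1]], obj[[2]], and so on.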
Calculating the proportion per subgroup with data.table
Using data.table:
df <- read.table(header = T, text = "row country year
1 NLD 2005
2 NLD 2005
3 BLG 2006
4 BLG 2005
5 GER 2005
6 NLD 2007
7 NLD 2005
8 NLD 2008")
library(data.table)
# Count rows per country, count rows per country/year, then take the ratio
setDT(df)[, sum := .N, by = country][, prop := .N, by = c("country", "year")][, prop := prop/sum][, sum := NULL]
df
row country year prop
1: 1 NLD 2005 0.6
2: 2 NLD 2005 0.6
3: 3 BLG 2006 0.5
4: 4 BLG 2005 0.5
5: 5 GER 2005 1.0
6: 6 NLD 2007 0.2
7: 7 NLD 2005 0.6
8: 8 NLD 2008 0.2
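As a cross-check that does not need data.table, the same proportions can be sketched in base R with ave(). The helper names ones, n_country_year, and n_country are hypothetical; the data are the same eight rows as above.

```r
df <- read.table(header = TRUE, text = "row country year
1 NLD 2005
2 NLD 2005
3 BLG 2006
4 BLG 2005
5 GER 2005
6 NLD 2007
7 NLD 2005
8 NLD 2008")

ones <- rep(1, nrow(df))

# Summing a vector of ones within a group counts the rows in that group
n_country_year <- ave(ones, df$country, df$year, FUN = sum)
n_country      <- ave(ones, df$country, FUN = sum)

df$prop <- n_country_year / n_country
df
```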