Summarizing by Subgroup Percentage in R

Finding percentage in a sub-group using group_by and summarise

Try

library(dplyr)
data %>%
group_by(month) %>%
mutate(countT= sum(count)) %>%
group_by(type, .add = TRUE) %>%  # .add replaces the deprecated add argument (dplyr >= 1.0.0)
mutate(per=paste0(round(100*count/countT,2),'%'))

Or make it simpler, without creating an additional column:

data %>%
group_by(month) %>%
mutate(per = 100 *count/sum(count)) %>%
ungroup

We could also use left_join after summarising sum(count) by 'month'.
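
A minimal sketch of that left_join approach, assuming a data frame with columns month, type and count as in the code above. The toy values are invented for illustration and do not come from the question.

```r
library(dplyr)

# Hypothetical toy data: column names follow the code above,
# the values are made up for illustration.
data <- data.frame(
  month = c("Jan", "Jan", "Feb", "Feb"),
  type  = c("a", "b", "a", "b"),
  count = c(10, 30, 5, 15)
)

# Step 1: one row per month holding that month's total
totals <- data %>%
  group_by(month) %>%
  summarise(countT = sum(count))

# Step 2: join the totals back and compute each row's share of its month
result <- data %>%
  left_join(totals, by = "month") %>%
  mutate(per = 100 * count / countT)
```

Compared with the mutate version, this keeps the monthly totals in a separate table, which can be handy if they are needed elsewhere.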

Or an option using data.table.

library(data.table)
setkey(setDT(data), month)[data[, list(count=sum(count)), month],
per:= paste0(round(100*count/i.count,2), '%')][]

Summarizing by subgroup percentage in R

Per your comment, if the subgroups are unique you can do

library(dplyr)
group_by(df, group) %>% mutate(percent = value/sum(value))
# group subgroup value percent
# 1 A a 1 0.1250000
# 2 A b 4 0.5000000
# 3 A c 2 0.2500000
# 4 A d 1 0.1250000
# 5 B a 1 0.1666667
# 6 B b 2 0.3333333
# 7 B c 3 0.5000000

Or to remove the value column and add the percent column at the same time, use transmute

group_by(df, group) %>% transmute(subgroup, percent = value/sum(value))
# group subgroup percent
# 1 A a 0.1250000
# 2 A b 0.5000000
# 3 A c 0.2500000
# 4 A d 0.1250000
# 5 B a 0.1666667
# 6 B b 0.3333333
# 7 B c 0.5000000

Calculate percentage within a subgroup in R

You first group by country to get the sum for each country. Then you group by country and motif and use the sum for each country to calculate your frequency.

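am2 %>%
group_by(country) %>%
mutate(sum_country = sum(number)) %>%
group_by(country, motif) %>%
# round() must wrap the whole product: %>% binds tighter than *,
# so `freq*100 %>% round(2)` would only round the constant 100
mutate(freq = number/sum_country,
freq_perc = round(freq*100, 2))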

ggplot2 example:

df <- am2 %>% 
group_by(country) %>%
mutate(sum_country = sum(number)) %>%
group_by(country, motif) %>%
mutate(freq = number/sum_country,
freq_perc = round(freq*100, 2))  # wrap the product; `freq*100 %>% round(2)` leaves it unrounded

library(ggplot2)
df %>%
ggplot( aes(x=country, y=freq_perc, fill=motif)) +
geom_bar(stat="identity", position="dodge")

Summarize Percentage by Group in R

There's an easy solution using the janitor package, which is meant for cross-tabulation:

library(janitor)

data %>%
adorn_totals(where = c("row","col")) %>%
adorn_percentages(denominator = "row") %>%
adorn_pct_formatting(digits = 0) %>%
adorn_ns(position = "front")

Industry Male Female Total
Art/Entertainment 100 (17%) 500 (83%) 600 (100%)
Banking 600 (86%) 100 (14%) 700 (100%)
Healthcare 53 (45%) 65 (55%) 118 (100%)
Education 20 (3%) 766 (97%) 786 (100%)
Military 47 (33%) 96 (67%) 143 (100%)
Medicine 500 (56%) 400 (44%) 900 (100%)
Law 500 (50%) 500 (50%) 1000 (100%)
Computer 200 (58%) 144 (42%) 344 (100%)
Sales 420 (86%) 69 (14%) 489 (100%)
Total 2440 (48%) 2640 (52%) 5080 (100%)

#OR

data %>%
adorn_percentages(denominator = "row") %>%
adorn_pct_formatting(digits = 2) %>%
adorn_ns(position = "front")

Industry Male Female
Art/Entertainment 100 (16.67%) 500 (83.33%)
Banking 600 (85.71%) 100 (14.29%)
Healthcare 53 (44.92%) 65 (55.08%)
Education 20 (2.54%) 766 (97.46%)
Military 47 (32.87%) 96 (67.13%)
Medicine 500 (55.56%) 400 (44.44%)
Law 500 (50.00%) 500 (50.00%)
Computer 200 (58.14%) 144 (41.86%)
Sales 420 (85.89%) 69 (14.11%)

Data used:

> data
Industry Male Female
1 Art/Entertainment 100 500
2 Banking 600 100
3 Healthcare 53 65
4 Education 20 766
5 Military 47 96
6 Medicine 500 400
7 Law 500 500
8 Computer 200 144
9 Sales 420 69

Summarizing a sub-group in R and calculating percentages

Does this work?

library(dplyr)
df %>% group_by(tag_id) %>%
mutate(pct_red = 100*number[color == 'red']/`total (red+blue)`)
# A tibble: 6 x 5
# Groups: tag_id [3]
tag_id color number `total (red+blue)` pct_red
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 red 2 5 40
2 1 blue 3 5 40
3 2 red 5 7 71.4
4 2 blue 2 7 71.4
5 3 red 8 15 53.3
6 3 blue 7 15 53.3

Better way to summarize percentages by subgroup using dplyr?

You can simplify it as follows by switching the order of group_by (i.e. group by manager first, then by manager + title, instead of the other way around):

mydata %>% 
group_by(manager) %>%
mutate(mgr_count = n()) %>%
group_by(title, mgr_count, .add = TRUE) %>%  # .add replaces the deprecated add argument (dplyr >= 1.0.0)
summarise(
title_count = n(),
title_pctg = round(title_count / first(mgr_count) * 100, 1)
)

# A tibble: 7 x 5
# Groups: manager, title [?]
# manager title mgr_count title_count title_pctg
# <fctr> <fctr> <int> <int> <dbl>
#1 Jack Junior 7 1 14.3
#2 Jack Senior 7 6 85.7
#3 Mike Junior 4 4 100.0
#4 Sue Junior 4 2 50.0
#5 Sue Mid 4 2 50.0
#6 Tom Entry 3 2 66.7
#7 Tom Mid 3 1 33.3

group by in dplyr and calculating percentages

Adding an answer based on all the comments above, provided by @nicola, @akrun and myself:

library(dplyr)

#nicola
df %>%
filter(!is.na(Container_Pick_Day)) %>%
group_by(Service,Container_Pick_Day) %>%
summarise(Percentage=n()) %>%
group_by(Service) %>%
mutate(Percentage=Percentage/sum(Percentage)*100)

#akrun
df %>%
filter(complete.cases(Container_Pick_Day)) %>%
count(Service, Container_Pick_Day) %>%
group_by(Service) %>%
transmute(Container_Pick_Day, Percentage=n/sum(n)*100)

#Sotos
df %>%
na.omit() %>%
group_by_all() %>%
summarise(ptg = n()) %>%
group_by(Service) %>%
mutate(ptg = prop.table(ptg)*100)

All of these give:

Service Container_Pick_Day Percentage
<fctr> <int> <dbl>
1 ABC 0 33.33333
2 ABC 1 50.00000
3 ABC 2 16.66667
4 DEF 0 16.66667
5 DEF 1 66.66667
6 DEF 2 16.66667

Relative frequencies / proportions with dplyr

Try this:

mtcars %>%
group_by(am, gear) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n))

# am gear n freq
# 1 0 3 15 0.7894737
# 2 0 4 4 0.2105263
# 3 1 4 8 0.6153846
# 4 1 5 5 0.3846154

From the dplyr vignette:

When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset.

Thus, after the summarise, the last grouping variable specified in group_by, 'gear', is peeled off. In the mutate step, the data is grouped by the remaining grouping variable(s), here 'am'. You may check grouping in each step with groups.

The outcome of the peeling is of course dependent on the order of the grouping variables in the group_by call. You may wish to do a subsequent group_by(am) to make your code more explicit.
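
The peeling described above can be checked directly. A minimal sketch, assuming dplyr 1.0.0 or later (the .groups argument of summarise does not exist in older versions):

```r
library(dplyr)

counts <- mtcars %>%
  group_by(am, gear) %>%
  # "drop_last" is what summarise does here by default; stating it
  # explicitly documents the intent and silences the grouping message
  summarise(n = n(), .groups = "drop_last")

# Inspect which grouping variables remain after the summarise
group_vars(counts)

# Explicit version: regroup by am before computing the proportions,
# then drop the grouping so later steps are not affected by it
freq <- counts %>%
  group_by(am) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()
```

With the explicit group_by(am), the result no longer depends on the order in which am and gear were listed in the first group_by call.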

For rounding and prettification, please refer to the nice answer by @Tyler Rinker.
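
As one possible way to round and format the proportions, here is a minimal sketch using base R's sprintf. This is an assumption about what "prettification" might look like, not necessarily the approach taken in the answer mentioned above; the column name pct is made up for illustration.

```r
library(dplyr)

out <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  # after summarise the data is still grouped by am, so sum(n) is per am
  mutate(freq = n / sum(n),
         # hypothetical column: proportion as a percent string, one decimal
         pct = sprintf("%.1f%%", 100 * freq)) %>%
  ungroup()
```

Note that pct is a character column, so keep the numeric freq column if further calculations are needed.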


