Finding Percentage in a Sub-Group Using Group_By and Summarise


Try

library(dplyr)
data %>%
  group_by(month) %>%
  mutate(countT = sum(count)) %>%
  # .add = TRUE replaces the deprecated add = TRUE (dplyr >= 1.0.0)
  group_by(type, .add = TRUE) %>%
  mutate(per = paste0(round(100 * count / countT, 2), '%'))

Or, more simply, without creating an extra column (per is then numeric rather than a '%' string):

data %>%
  group_by(month) %>%
  mutate(per = 100 * count / sum(count)) %>%
  ungroup()
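Here is the grouped mutate() at work. This is a minimal sketch on a small hypothetical data frame. The month, type and count column names come from the question, but the values are invented for illustration:

```r
library(dplyr)

# Hypothetical example data: one row per (month, type) with a count
data <- data.frame(
  month = c("Jan", "Jan", "Feb", "Feb", "Feb"),
  type  = c("a", "b", "a", "b", "c"),
  count = c(30, 10, 5, 10, 5)
)

result <- data %>%
  group_by(month) %>%                        # sum(count) is now the per-month total
  mutate(per = 100 * count / sum(count)) %>%
  ungroup()                                  # drop grouping so later steps are unaffected

print(result)
```

Each month's per values add up to 100. In this data, Jan gives 75 and 25, and Feb gives 25, 50 and 25.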

You could also summarise sum(count) by 'month' and left_join() those totals back onto data.
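Here is a sketch of that left_join() route. It assumes the same kind of hypothetical month/type/count data, and the name countT mirrors the first snippet. The .groups argument of summarise() needs dplyr >= 1.0.0:

```r
library(dplyr)

# Hypothetical example data
data <- data.frame(
  month = c("Jan", "Jan", "Feb", "Feb", "Feb"),
  type  = c("a", "b", "a", "b", "c"),
  count = c(30, 10, 5, 10, 5)
)

# Step 1: one row per month holding that month's total
totals <- data %>%
  group_by(month) %>%
  summarise(countT = sum(count), .groups = "drop")

# Step 2: attach each month's total to the original rows, then divide
result <- data %>%
  left_join(totals, by = "month") %>%
  mutate(per = 100 * count / countT)

print(result)
```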

Or, using data.table:

library(data.table)
# keyed join: i.count is the month total from the summary table, count is each row's own count
setkey(setDT(data), month)[data[, list(count = sum(count)), month],
  per := paste0(round(100 * count / i.count, 2), '%')][]
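A shorter data.table idiom computes the month total with by = instead of a keyed join, and may be easier to read. This is a sketch on hypothetical data, not the answer's original code:

```r
library(data.table)

# Hypothetical example data
data <- data.table(
  month = c("Jan", "Jan", "Feb", "Feb", "Feb"),
  type  = c("a", "b", "a", "b", "c"),
  count = c(30, 10, 5, 10, 5)
)

# by = month makes sum(count) the per-month total; := adds the column by reference
data[, per := paste0(round(100 * count / sum(count), 2), "%"), by = month]
data
```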

Summarizing by subgroup percentage in R

Per your comment, if the subgroups are unique within each group, you can do:

library(dplyr)
group_by(df, group) %>% mutate(percent = value/sum(value))
# group subgroup value percent
# 1 A a 1 0.1250000
# 2 A b 4 0.5000000
# 3 A c 2 0.2500000
# 4 A d 1 0.1250000
# 5 B a 1 0.1666667
# 6 B b 2 0.3333333
# 7 B c 3 0.5000000

Or, to drop the value column and add the percent column in one step, use transmute():

group_by(df, group) %>% transmute(subgroup, percent = value/sum(value))
# group subgroup percent
# 1 A a 0.1250000
# 2 A b 0.5000000
# 3 A c 0.2500000
# 4 A d 0.1250000
# 5 B a 0.1666667
# 6 B b 0.3333333
# 7 B c 0.5000000
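In dplyr >= 1.0.0, mutate() also has a .keep argument, so you can get the same result without transmute(). Here is a sketch, with the data frame rebuilt from the printed output above:

```r
library(dplyr)

# Data reconstructed from the printed output above
df <- data.frame(
  group    = c("A", "A", "A", "A", "B", "B", "B"),
  subgroup = c("a", "b", "c", "d", "a", "b", "c"),
  value    = c(1, 4, 2, 1, 1, 2, 3)
)

res <- df %>%
  group_by(group) %>%
  # .keep = "unused" drops 'value' (it was used to build 'percent');
  # grouping columns are always kept
  mutate(percent = value / sum(value), .keep = "unused")

names(res)
```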

Calculate percentage within a subgroup in R

First group by country to get the total for each country. Then group by country and motif, and divide each number by its country total to get the frequency.

am2 %>%
  group_by(country) %>%
  mutate(sum_country = sum(number)) %>%
  group_by(country, motif) %>%
  mutate(freq = number / sum_country,
         # not freq*100 %>% round(2): %>% binds tighter than *, so that only rounds 100
         freq_perc = round(freq * 100, 2))
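Because sum_country is already fixed per country, the second group_by(country, motif) does not change the result. The sketch below checks this on a hypothetical stand-in for am2, using the column names from the answer with invented values:

```r
library(dplyr)

# Hypothetical stand-in for am2
am2 <- data.frame(
  country = c("FR", "FR", "DE", "DE", "DE"),
  motif   = c("work", "leisure", "work", "leisure", "family"),
  number  = c(30, 70, 10, 20, 30)
)

# Two-step version from the answer (with the rounding fixed)
two_step <- am2 %>%
  group_by(country) %>%
  mutate(sum_country = sum(number)) %>%
  group_by(country, motif) %>%
  mutate(freq = number / sum_country,
         freq_perc = round(freq * 100, 2)) %>%
  ungroup()

# A single grouping by country gives the same frequencies
one_step <- am2 %>%
  group_by(country) %>%
  mutate(freq = number / sum(number),
         freq_perc = round(freq * 100, 2)) %>%
  ungroup()

all.equal(two_step$freq_perc, one_step$freq_perc)
```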

ggplot2 example:

df <- am2 %>%
  group_by(country) %>%
  mutate(sum_country = sum(number)) %>%
  group_by(country, motif) %>%
  mutate(freq = number / sum_country,
         freq_perc = round(freq * 100, 2))

library(ggplot2)
df %>%
  ggplot(aes(x = country, y = freq_perc, fill = motif)) +
  geom_bar(stat = "identity", position = "dodge")

Relative frequencies / proportions with dplyr

Try this:

mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

# am gear n freq
# 1 0 3 15 0.7894737
# 2 0 4 4 0.2105263
# 3 1 4 8 0.6153846
# 4 1 5 5 0.3846154

From the dplyr vignette:

When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset.

Thus, after summarise(), the last grouping variable specified in group_by(), 'gear', is peeled off. In the mutate() step, the data are grouped by the remaining grouping variable(s), here 'am'. You can check the grouping at each step with groups().

The outcome of the peeling of course depends on the order of the grouping variables in the group_by() call. You may wish to add an explicit group_by(am) afterwards to make your code clearer.
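Here is a small sketch of that explicit style on mtcars. The .groups argument of summarise() (dplyr >= 1.0.0) states the peeling outright, and group_vars() shows which grouping is left. The count() variant is one possible explicit rewrite, not the vignette's code:

```r
library(dplyr)

step <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(), .groups = "drop_last")   # explicitly peel off 'gear', keep 'am'

group_vars(step)   # the remaining grouping variable

# Fully explicit: count the pairs, then regroup by am before dividing
explicit <- mtcars %>%
  count(am, gear) %>%
  group_by(am) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

explicit
```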

For rounding and prettification, please refer to the nice answer by @Tyler Rinker.

Finding percentage using group_by and summarise in R through dplyr

We can try

demographics %>%
  group_by(Academic_Level) %>%
  summarise(Unique_Elements = n_distinct(userID)) %>%
  mutate(perc = 100 * Unique_Elements / sum(Unique_Elements))
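Here summarise() groups by a single variable, so its output is no longer grouped. As a result, sum(Unique_Elements) in the mutate() is the total over all levels. Here is a sketch on hypothetical demographics data in which some users appear in several rows:

```r
library(dplyr)

# Hypothetical data: a userID can appear in more than one row
demographics <- data.frame(
  userID         = c(1, 1, 2, 3, 4, 4, 5),
  Academic_Level = c("BSc", "BSc", "BSc", "MSc", "MSc", "MSc", "PhD")
)

res <- demographics %>%
  group_by(Academic_Level) %>%
  summarise(Unique_Elements = n_distinct(userID)) %>%   # distinct users per level
  mutate(perc = 100 * Unique_Elements / sum(Unique_Elements))

res
```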

group by in dplyr and calculating percentages

Adding an answer that collects the approaches from the comments above by @nicola, @akrun and myself:

library(dplyr)

#nicola
df %>%
  filter(!is.na(Container_Pick_Day)) %>%
  group_by(Service, Container_Pick_Day) %>%
  summarise(Percentage = n()) %>%
  group_by(Service) %>%
  mutate(Percentage = Percentage / sum(Percentage) * 100)

#akrun
df %>%
  filter(complete.cases(Container_Pick_Day)) %>%
  count(Service, Container_Pick_Day) %>%
  group_by(Service) %>%
  transmute(Container_Pick_Day, Percentage = n / sum(n) * 100)

#Sotos
df %>%
  na.omit() %>%
  group_by_all() %>%
  summarise(ptg = n()) %>%
  group_by(Service) %>%
  mutate(ptg = prop.table(ptg) * 100)

All three give the following result (the third names the column ptg instead of Percentage):

  Service Container_Pick_Day Percentage
   <fctr>              <int>      <dbl>
1     ABC                  0   33.33333
2     ABC                  1   50.00000
3     ABC                  2   16.66667
4     DEF                  0   16.66667
5     DEF                  1   66.66667
6     DEF                  2   16.66667
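You can also get the same row percentages in base R with table() and prop.table(). This swaps in a different technique from the dplyr answers. The df below is hypothetical data chosen to reproduce the table above. Note that table() drops NA values by default, much like the filter() step:

```r
# Hypothetical data consistent with the table above (6 rows per Service)
df <- data.frame(
  Service            = rep(c("ABC", "DEF"), each = 6),
  Container_Pick_Day = c(0, 0, 1, 1, 1, 2,
                         0, 1, 1, 1, 1, 2)
)

tab <- table(df$Service, df$Container_Pick_Day)   # counts: rows = Service, cols = day
pct <- prop.table(tab, margin = 1) * 100          # margin = 1: each row sums to 100
round(pct, 5)
```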

percentage count by group using dplyr

You can either pipe the counts into mutate(prop = count / sum(count)), or compute the proportion directly within summarise() using nrow(.), the row count of the whole data frame. Something like this:

df %>%
  group_by(colors) %>%
  summarise(prop = n() / nrow(.))

or

df %>%
  group_by(colors) %>%
  summarise(count = n()) %>%
  mutate(prop = count / sum(count))
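Here is a sketch comparing the two variants on hypothetical data. The second uses count() as a shorthand for group_by() plus summarise(n = n()). nrow(.) relies on the magrittr %>% placeholder and will not work with the native |> pipe:

```r
library(dplyr)

# Hypothetical data
df <- data.frame(colors = c("red", "blue", "red", "green", "red", "blue"))

# Variant 1: '.' is the whole data frame piped into summarise()
v1 <- df %>%
  group_by(colors) %>%
  summarise(prop = n() / nrow(.))

# Variant 2: count per color, then divide by the total of the counts
v2 <- df %>%
  count(colors) %>%
  mutate(prop = n / sum(n))

all.equal(v1$prop, v2$prop)
```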

Better way to summarize percentages by subgroup using dplyr?

You can simplify it as follows by switching the order of group_by() (i.e. group by manager first, then manager + title, instead of the other way round):

mydata %>%
  group_by(manager) %>%
  mutate(mgr_count = n()) %>%
  # .add = TRUE replaces the deprecated add = TRUE (dplyr >= 1.0.0)
  group_by(title, mgr_count, .add = TRUE) %>%
  summarise(
    title_count = n(),
    title_pctg = round(title_count / first(mgr_count) * 100, 1)
  )

# A tibble: 7 x 5
# Groups:   manager, title [?]
#   manager title  mgr_count title_count title_pctg
#   <fctr>  <fctr>     <int>       <int>      <dbl>
# 1 Jack    Junior         7           1       14.3
# 2 Jack    Senior         7           6       85.7
# 3 Mike    Junior         4           4      100.0
# 4 Sue     Junior         4           2       50.0
# 5 Sue     Mid            4           2       50.0
# 6 Tom     Entry          3           2       66.7
# 7 Tom     Mid            3           1       33.3
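Another way to build the same table is to count manager/title pairs first and derive the manager totals from those counts. This is a sketch. mydata below is rebuilt from the counts in the printed output, and count(name = ) is a standard dplyr argument:

```r
library(dplyr)

# Data rebuilt from the counts in the printed output above
mydata <- data.frame(
  manager = rep(c("Jack", "Jack", "Mike", "Sue", "Sue", "Tom", "Tom"),
                times = c(1, 6, 4, 2, 2, 2, 1)),
  title   = rep(c("Junior", "Senior", "Junior", "Junior", "Mid", "Entry", "Mid"),
                times = c(1, 6, 4, 2, 2, 2, 1))
)

res <- mydata %>%
  count(manager, title, name = "title_count") %>%   # one row per manager/title pair
  group_by(manager) %>%
  mutate(mgr_count  = sum(title_count),              # total rows per manager
         title_pctg = round(title_count / mgr_count * 100, 1)) %>%
  ungroup()

res
```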

Transform absolute values into yearly percentages in R

If Type always follows the sequence A, B within each Year:

Data:

df <- data.frame(data.table::fread("Year Type Return
1900 A 4
1900 B 7
1901 A 87
1901 B 3
1902 A 9.7
1902 B 2"))

Code:

library(tidyverse)
df %>%
  group_by(Year) %>%
  # dplyr >= 1.1.0 warns here (several rows per group); reframe() is the replacement
  summarise(Return = Return / sum(Return)) %>%
  ungroup() %>%
  mutate(Type = df$Type) %>%
  relocate(Year, Type, Return) %>%
  mutate(Return = round(Return, 2))

Output

   Year Type  Return
  <int> <chr>  <dbl>
1  1900 A       0.36
2  1900 B       0.64
3  1901 A       0.97
4  1901 B       0.03
5  1902 A       0.83
6  1902 B       0.17

A prettier output:

df %>%
  group_by(Year) %>%
  summarise(Return = Return / sum(Return)) %>%
  ungroup() %>%
  mutate(Type = df$Type) %>%
  relocate(Year, Type, Return) %>%
  mutate(Return = scales::percent(Return))

   Year Type  Return
  <int> <chr> <chr>
1  1900 A     36%
2  1900 B     64%
3  1901 A     97%
4  1901 B     3%
5  1902 A     83%
6  1902 B     17%
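If Type is not guaranteed to follow the A, B order, you can use mutate() instead of summarise(). This keeps each Type on its own row, so there is no need to reattach df$Type. A sketch on the same data:

```r
library(dplyr)

# Same data as above, typed out directly
df <- data.frame(
  Year   = c(1900L, 1900L, 1901L, 1901L, 1902L, 1902L),
  Type   = c("A", "B", "A", "B", "A", "B"),
  Return = c(4, 7, 87, 3, 9.7, 2)
)

res <- df %>%
  group_by(Year) %>%
  mutate(Return = round(Return / sum(Return), 2)) %>%   # Type stays on its own row
  ungroup()

res
```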

