Finding percentage in a sub-group using group_by and summarise
Try
library(dplyr)
data %>%
group_by(month) %>%
mutate(countT= sum(count)) %>%
group_by(type, .add=TRUE) %>%  # .add replaces the deprecated add argument (dplyr >= 1.0.0)
mutate(per=paste0(round(100*count/countT,2),'%'))
Or, more simply, without creating an additional column:
data %>%
group_by(month) %>%
mutate(per = 100 *count/sum(count)) %>%
ungroup()
We could also use left_join after summarising the sum(count) by 'month'.
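The left_join route is not spelled out in the answer, so here is a minimal sketch of how it might look. The sample data frame and the names month_totals, countT and result are hypothetical; only the columns month, type and count come from the snippets above.

```r
library(dplyr)

# Hypothetical sample data; column names follow the snippets above.
data <- data.frame(
  month = c("Jan", "Jan", "Feb", "Feb"),
  type  = c("a", "b", "a", "b"),
  count = c(10, 30, 5, 15)
)

# One row per month holding the summed count.
month_totals <- data %>%
  group_by(month) %>%
  summarise(countT = sum(count))

# Attach each month's total to its rows, then compute the percentage.
result <- data %>%
  left_join(month_totals, by = "month") %>%
  mutate(per = 100 * count / countT)
```

Compared with the grouped mutate, this keeps the monthly totals as a separate table, which can be handy if they are needed elsewhere.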
Or an option using data.table:
library(data.table)
setkey(setDT(data), month)[data[, list(count=sum(count)), month],
per:= paste0(round(100*count/i.count,2), '%')][]
Summarizing by subgroup percentage in R
Per your comment, if the subgroups are unique you can do
library(dplyr)
group_by(df, group) %>% mutate(percent = value/sum(value))
# group subgroup value percent
# 1 A a 1 0.1250000
# 2 A b 4 0.5000000
# 3 A c 2 0.2500000
# 4 A d 1 0.1250000
# 5 B a 1 0.1666667
# 6 B b 2 0.3333333
# 7 B c 3 0.5000000
Or, to remove the value column and add the percent column at the same time, use transmute:
group_by(df, group) %>% transmute(subgroup, percent = value/sum(value))
# group subgroup percent
# 1 A a 0.1250000
# 2 A b 0.5000000
# 3 A c 0.2500000
# 4 A d 0.1250000
# 5 B a 0.1666667
# 6 B b 0.3333333
# 7 B c 0.5000000
Calculate percentage within a subgroup in R
You first group by country to get the sum for each country. Then you group by country and motif and use the sum for each country to calculate your frequency.
am2 %>%
group_by(country) %>%
mutate(sum_country = sum(number)) %>%
group_by(country, motif) %>%
mutate(freq = number/sum_country,
freq_perc = round(freq*100, 2))  # round() wraps the whole product
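A note on operator precedence, as a small sketch: in R, %>% binds more tightly than *, so an expression such as freq*100 %>% round(2) pipes only the 100 into round() and leaves the product unrounded. The value of freq below is an arbitrary illustration, not taken from am2.

```r
library(magrittr)

freq <- 1 / 3  # arbitrary illustrative value

# Parsed as freq * (100 %>% round(2)): only the constant 100 is rounded.
unrounded <- freq * 100 %>% round(2)

# Wrapping the whole product rounds the percentage itself.
rounded <- round(freq * 100, 2)
```

Calling round() directly, or parenthesising the product before piping, avoids the surprise.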
ggplot2 example:
df <- am2 %>%
group_by(country) %>%
mutate(sum_country = sum(number)) %>%
group_by(country, motif) %>%
mutate(freq = number/sum_country,
freq_perc = round(freq*100, 2))  # round() wraps the whole product
library(ggplot2)
df %>%
ggplot( aes(x=country, y=freq_perc, fill=motif)) +
geom_bar(stat="identity", position="dodge")
Summarize Percentage by Group in R
There's an easy solution using the janitor package, which is meant for cross-tabulation:
library(janitor)
data %>%
adorn_totals(where = c("row","col")) %>%
adorn_percentages(denominator = "row") %>%
adorn_pct_formatting(digits = 0) %>%
adorn_ns(position = "front")
Industry Male Female Total
Art/Entertainment 100 (17%) 500 (83%) 600 (100%)
Banking 600 (86%) 100 (14%) 700 (100%)
Healthcare 53 (45%) 65 (55%) 118 (100%)
Education 20 (3%) 766 (97%) 786 (100%)
Military 47 (33%) 96 (67%) 143 (100%)
Medicine 500 (56%) 400 (44%) 900 (100%)
Law 500 (50%) 500 (50%) 1000 (100%)
Computer 200 (58%) 144 (42%) 344 (100%)
Sales 420 (86%) 69 (14%) 489 (100%)
Total 2440 (48%) 2640 (52%) 5080 (100%)
#OR
data %>%
adorn_percentages(denominator = "row") %>%
adorn_pct_formatting(digits = 2) %>%
adorn_ns(position = "front")
Industry Male Female
Art/Entertainment 100 (16.67%) 500 (83.33%)
Banking 600 (85.71%) 100 (14.29%)
Healthcare 53 (44.92%) 65 (55.08%)
Education 20 (2.54%) 766 (97.46%)
Military 47 (32.87%) 96 (67.13%)
Medicine 500 (55.56%) 400 (44.44%)
Law 500 (50.00%) 500 (50.00%)
Computer 200 (58.14%) 144 (41.86%)
Sales 420 (85.89%) 69 (14.11%)
data used
> data
Industry Male Female
1 Art/Entertainment 100 500
2 Banking 600 100
3 Healthcare 53 65
4 Education 20 766
5 Military 47 96
6 Medicine 500 400
7 Law 500 500
8 Computer 200 144
9 Sales 420 69
Summarizing a sub-group in R and calculating percentages
Does this work:
library(dplyr)
df %>% group_by(tag_id) %>%
mutate(pct_red = 100*number[color == 'red']/`total (red+blue)`)
# A tibble: 6 x 5
# Groups: tag_id [3]
tag_id color number `total (red+blue)` pct_red
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 red 2 5 40
2 1 blue 3 5 40
3 2 red 5 7 71.4
4 2 blue 2 7 71.4
5 3 red 8 15 53.3
6 3 blue 7 15 53.3
Better way to summarize percentages by subgroup using dplyr?
You can simplify it as follows by switching the order of group_by (i.e. group by manager first, then manager + title, instead of the other way around):
mydata %>%
group_by(manager) %>%
mutate(mgr_count = n()) %>%
group_by(title, mgr_count, .add=TRUE) %>%  # .add replaces the deprecated add argument (dplyr >= 1.0.0)
summarise(
title_count = n(),
title_pctg = round(title_count / first(mgr_count) * 100, 1)
)
# A tibble: 7 x 5
# Groups: manager, title [?]
# manager title mgr_count title_count title_pctg
# <fctr> <fctr> <int> <int> <dbl>
#1 Jack Junior 7 1 14.3
#2 Jack Senior 7 6 85.7
#3 Mike Junior 4 4 100.0
#4 Sue Junior 4 2 50.0
#5 Sue Mid 4 2 50.0
#6 Tom Entry 3 2 66.7
#7 Tom Mid 3 1 33.3
group by in dplyr and calculating percentages
Adding an answer based on all the comments above, provided by @nicola, @akrun and myself:
library(dplyr)
#nicola
df %>%
filter(!is.na(Container_Pick_Day)) %>%
group_by(Service,Container_Pick_Day) %>%
summarise(Percentage=n()) %>%
group_by(Service) %>%
mutate(Percentage=Percentage/sum(Percentage)*100)
#akrun
df %>%
filter(complete.cases(Container_Pick_Day)) %>%
count(Service, Container_Pick_Day) %>%
group_by(Service) %>%
transmute(Container_Pick_Day, Percentage=n/sum(n)*100)
#Sotos
df %>%
na.omit() %>%
group_by_all() %>%
summarise(ptg = n()) %>%
group_by(Service) %>%
mutate(ptg = prop.table(ptg)*100)
All resulting in:
Service Container_Pick_Day Percentage
<fctr> <int> <dbl>
1 ABC 0 33.33333
2 ABC 1 50.00000
3 ABC 2 16.66667
4 DEF 0 16.66667
5 DEF 1 66.66667
6 DEF 2 16.66667
Relative frequencies / proportions with dplyr
Try this:
mtcars %>%
group_by(am, gear) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n))
# am gear n freq
# 1 0 3 15 0.7894737
# 2 0 4 4 0.2105263
# 3 1 4 8 0.6153846
# 4 1 5 5 0.3846154
From the dplyr vignette:
When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset.
Thus, after the summarise, the last grouping variable specified in group_by, 'gear', is peeled off. In the mutate step, the data is grouped by the remaining grouping variable(s), here 'am'. You may check grouping in each step with groups.
The outcome of the peeling is of course dependent on the order of the grouping variables in the group_by call. You may wish to do a subsequent group_by(am) to make your code more explicit.
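To make the peeling visible, here is a minimal sketch that inspects the grouping after summarise and then regroups explicitly. It uses group_vars from dplyr for the check; the names by_am_gear and freq_tbl are only illustrative.

```r
library(dplyr)

by_am_gear <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n())

# Inspect which grouping variables are left after summarise().
remaining <- group_vars(by_am_gear)

# Regrouping explicitly states the denominator, whatever summarise() left behind.
freq_tbl <- by_am_gear %>%
  group_by(am) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()
```

With the explicit group_by(am), the frequencies sum to 1 within each value of am regardless of the order of variables in the first group_by call.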
For rounding and prettification, please refer to the nice answer by @Tyler Rinker.