Finding percentage in a sub-group using group_by and summarise
Try:
library(dplyr)
data %>%
group_by(month) %>%
mutate(countT= sum(count)) %>%
# .add replaces the add argument, which is deprecated since dplyr 1.0.0
group_by(type, .add=TRUE) %>%
mutate(per=paste0(round(100*count/countT,2),'%'))
Or, more simply, without creating additional columns:
data %>%
group_by(month) %>%
mutate(per = 100 *count/sum(count)) %>%
ungroup()
We could also use left_join after summarising sum(count) by 'month'.
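The left_join route mentioned above is not spelled out in the answer, so here is a minimal sketch of how it might look. The sample data frame and the name totals are assumptions made up for illustration; only the month, type and count column names come from the question.

```r
library(dplyr)

# Hypothetical sample data (not from the original question)
data <- data.frame(
  month = c("Jan", "Jan", "Feb", "Feb"),
  type  = c("a", "b", "a", "b"),
  count = c(10, 30, 5, 15)
)

# Step 1: one row per month holding the monthly total
totals <- data %>%
  group_by(month) %>%
  summarise(countT = sum(count), .groups = "drop")

# Step 2: attach the total to every original row, then compute the share
result <- data %>%
  left_join(totals, by = "month") %>%
  mutate(per = 100 * count / countT)

result
```

Compared with the grouped mutate, this keeps the totals as a separate table, which can be handy if they are reused elsewhere.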
Or an option using data.table:
library(data.table)
setkey(setDT(data), month)[data[, list(count=sum(count)), month],
per:= paste0(round(100*count/i.count,2), '%')][]
Summarizing by subgroup percentage in R
Per your comment, if the subgroups are unique you can do:
library(dplyr)
group_by(df, group) %>% mutate(percent = value/sum(value))
# group subgroup value percent
# 1 A a 1 0.1250000
# 2 A b 4 0.5000000
# 3 A c 2 0.2500000
# 4 A d 1 0.1250000
# 5 B a 1 0.1666667
# 6 B b 2 0.3333333
# 7 B c 3 0.5000000
Or, to remove the value column and add the percent column at the same time, use transmute:
group_by(df, group) %>% transmute(subgroup, percent = value/sum(value))
# group subgroup percent
# 1 A a 0.1250000
# 2 A b 0.5000000
# 3 A c 0.2500000
# 4 A d 0.1250000
# 5 B a 0.1666667
# 6 B b 0.3333333
# 7 B c 0.5000000
Calculate percentage within a subgroup in R
You first group by country to get the sum for each country. Then you group by country and motif and use the per-country sum to calculate the frequency.
am2 %>%
group_by(country) %>%
mutate(sum_country = sum(number)) %>%
group_by(country, motif) %>%
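mutate(freq = number/sum_country,
# round() must wrap the whole product: %>% binds tighter than *,
# so freq*100 %>% round(2) would round only the constant 100
freq_perc = round(freq*100, 2))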
ggplot2 example:
df <- am2 %>%
group_by(country) %>%
mutate(sum_country = sum(number)) %>%
group_by(country, motif) %>%
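mutate(freq = number/sum_country,
# round() must wrap the whole product: %>% binds tighter than *,
# so freq*100 %>% round(2) would round only the constant 100
freq_perc = round(freq*100, 2))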
library(ggplot2)
df %>%
ggplot( aes(x=country, y=freq_perc, fill=motif)) +
geom_bar(stat="identity", position="dodge")
Relative frequencies / proportions with dplyr
Try this:
mtcars %>%
group_by(am, gear) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n))
# am gear n freq
# 1 0 3 15 0.7894737
# 2 0 4 4 0.2105263
# 3 1 4 8 0.6153846
# 4 1 5 5 0.3846154
From the dplyr vignette:
When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset.
Thus, after the summarise, the last grouping variable specified in group_by, 'gear', is peeled off. In the mutate step, the data is grouped by the remaining grouping variable(s), here 'am'. You may check the grouping at each step with groups.
The outcome of the peeling of course depends on the order of the grouping variables in the group_by call. You may wish to add a subsequent group_by(am) to make your code more explicit.
For rounding and prettification, please refer to the nice answer by @Tyler Rinker.
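To make the peeling visible, the following sketch inspects the grouping after summarise and then regroups explicitly. It uses group_vars() (a dplyr helper that returns the grouping column names as a character vector) and the .groups argument of summarise; the object names counts and explicit are made up for illustration.

```r
library(dplyr)

# Count rows per (am, gear); "drop_last" states the default peeling explicitly
counts <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(), .groups = "drop_last")

# Only 'am' is left as a grouping variable after the summarise
group_vars(counts)

# Regrouping by 'am' makes the intent explicit and independent of peeling
explicit <- counts %>%
  group_by(am) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

explicit
```

Because the frequencies are computed within each 'am' group, they sum to 1 per value of 'am' rather than over the whole table.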
Finding percentage using group_by and summarise in R through dplyr
We can try
demographics %>%
group_by(Academic_Level) %>%
summarise(Unique_Elements = n_distinct(userID)) %>%
mutate(perc = 100 * Unique_Elements/sum(Unique_Elements))
group by in dplyr and calculating percentages
Adding an answer based on all the comments above from @nicola, @akrun and myself:
library(dplyr)
#nicola
df %>%
filter(!is.na(Container_Pick_Day)) %>%
group_by(Service,Container_Pick_Day) %>%
summarise(Percentage=n()) %>%
group_by(Service) %>%
mutate(Percentage=Percentage/sum(Percentage)*100)
#akrun
df %>%
filter(complete.cases(Container_Pick_Day)) %>%
count(Service, Container_Pick_Day) %>%
group_by(Service) %>%
transmute(Container_Pick_Day, Percentage=n/sum(n)*100)
#Sotos
df %>%
na.omit() %>%
group_by_all() %>%
summarise(ptg = n()) %>%
group_by(Service) %>%
mutate(ptg = prop.table(ptg)*100)
All of which result in:
Service Container_Pick_Day Percentage
<fctr> <int> <dbl>
1 ABC 0 33.33333
2 ABC 1 50.00000
3 ABC 2 16.66667
4 DEF 0 16.66667
5 DEF 1 66.66667
6 DEF 2 16.66667
percentage count by group using dplyr
You can either pipe this to a mutate( prop = count / sum(count) ) or compute it directly within summarise with nrow(.). Something like this:
df %>%
group_by(colors) %>%
summarise(count = n() / nrow(.) )
or
df %>%
group_by(colors) %>%
summarise(count = n() ) %>%
mutate( prop = count / sum(count) )
Better way to summarize percentages by subgroup using dplyr?
You can simplify it as follows by switching the order of group_by (i.e. group by manager first, then by manager + title, instead of the other way round):
mydata %>%
group_by(manager) %>%
mutate(mgr_count = n()) %>%
# .add replaces the add argument, which is deprecated since dplyr 1.0.0
group_by(title, mgr_count, .add=TRUE) %>%
summarise(
title_count = n(),
title_pctg = round(title_count / first(mgr_count) * 100, 1)
)
# A tibble: 7 x 5
# Groups: manager, title [?]
# manager title mgr_count title_count title_pctg
# <fctr> <fctr> <int> <int> <dbl>
#1 Jack Junior 7 1 14.3
#2 Jack Senior 7 6 85.7
#3 Mike Junior 4 4 100.0
#4 Sue Junior 4 2 50.0
#5 Sue Mid 4 2 50.0
#6 Tom Entry 3 2 66.7
#7 Tom Mid 3 1 33.3
Transform absolute values into yearly percentages in R
If the Type is always the sequence A, B:
Data:
df <- data.frame(data.table::fread("Year Type Return
1900 A 4
1900 B 7
1901 A 87
1901 B 3
1902 A 9.7
1902 B 2"))
Code:
library(tidyverse)
df %>%
group_by(Year) %>%
summarise(Return = Return / sum(Return)) %>%
ungroup() %>%
mutate(Type = df$Type) %>%
relocate(Year, Type, Return) %>%
mutate(Return = round(Return, 2))
Output:
Year Type Return
<int> <chr> <dbl>
1 1900 A 0.36
2 1900 B 0.64
3 1901 A 0.97
4 1901 B 0.03
5 1902 A 0.83
6 1902 B 0.17
A prettier output:
df %>%
group_by(Year) %>%
summarise(Return = Return / sum(Return)) %>%
ungroup() %>%
mutate(Type = df$Type) %>%
relocate(Year, Type, Return) %>%
mutate(Return = scales::percent(Return))
Year Type Return
<int> <chr> <chr>
1 1900 A 36%
2 1900 B 64%
3 1901 A 97%
4 1901 B 3%
5 1902 A 83%
6 1902 B 17%
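If the Type column cannot be assumed to follow a fixed A, B sequence, a grouped mutate might be a safer variant, since it keeps every original column and row in place. This is only a sketch of an alternative, not the answer's method; the data frame is rebuilt by hand from the values shown above, and the name shares is made up. As far as I know, dplyr 1.1.0 and later also warn when summarise returns more than one row per group, which this form avoids.

```r
library(dplyr)

# Same values as in the example above, typed in directly
df <- data.frame(
  Year   = c(1900L, 1900L, 1901L, 1901L, 1902L, 1902L),
  Type   = c("A", "B", "A", "B", "A", "B"),
  Return = c(4, 7, 87, 3, 9.7, 2)
)

# mutate() keeps Type attached to its own row, so no re-adding is needed
shares <- df %>%
  group_by(Year) %>%
  mutate(Return = round(Return / sum(Return), 2)) %>%
  ungroup()

shares
```

For the percent-formatted version, the rounding step could be swapped for scales::percent(Return / sum(Return)), as in the answer.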