
# Finding Percentage in a Sub-Group Using Group_By and Summarise

## Finding percentage in a sub-group using group_by and summarise

Try

```r
library(dplyr)

data %>%
  group_by(month) %>%
  mutate(countT = sum(count)) %>%
  group_by(type, .add = TRUE) %>%  # use `add = TRUE` on dplyr < 1.0.0
  mutate(per = paste0(round(100 * count / countT, 2), '%'))
```

Or make it simpler, without creating an additional column:

```r
data %>%
  group_by(month) %>%
  mutate(per = 100 * count / sum(count)) %>%
  ungroup()
```

We could also use `left_join` after summarising `sum(count)` by 'month'.
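The `left_join` variant is not spelled out in the answer. A minimal sketch, assuming a toy `data` frame with `month`, `type` and `count` columns (the values and the `totals` name are made up for illustration), could look like this:

```r
library(dplyr)

# Hypothetical example data; only the column names come from the answer above
data <- data.frame(
  month = c("Jan", "Jan", "Feb", "Feb"),
  type  = c("a", "b", "a", "b"),
  count = c(10, 30, 5, 15)
)

# Monthly totals, one row per month
totals <- data %>%
  group_by(month) %>%
  summarise(countT = sum(count))

# Join the totals back onto the original rows and compute each row's share
result <- data %>%
  left_join(totals, by = "month") %>%
  mutate(per = 100 * count / countT)
```

Compared with the grouped `mutate`, this keeps the totals as a separate table, which can be handy if they are needed elsewhere.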

Or an option using `data.table`.

```r
library(data.table)

setkey(setDT(data), month)[
  data[, list(count = sum(count)), month],
  per := paste0(round(100 * count / i.count, 2), '%')
][]
```

## Summarizing by subgroup percentage in R

Per your comment, if the subgroups are unique you can do

```r
library(dplyr)

group_by(df, group) %>%
  mutate(percent = value / sum(value))
#   group subgroup value   percent
# 1     A        a     1 0.1250000
# 2     A        b     4 0.5000000
# 3     A        c     2 0.2500000
# 4     A        d     1 0.1250000
# 5     B        a     1 0.1666667
# 6     B        b     2 0.3333333
# 7     B        c     3 0.5000000
```

Or to remove the `value` column and add the `percent` column at the same time, use `transmute`

```r
group_by(df, group) %>%
  transmute(subgroup, percent = value / sum(value))
#   group subgroup   percent
# 1     A        a 0.1250000
# 2     A        b 0.5000000
# 3     A        c 0.2500000
# 4     A        d 0.1250000
# 5     B        a 0.1666667
# 6     B        b 0.3333333
# 7     B        c 0.5000000
```

## Calculate percentage within a subgroup in R

First group by country to get the sum for each country. Then group by country and motif, and use the per-country sum to calculate the frequency.

```r
library(dplyr)

am2 %>%
  group_by(country) %>%
  mutate(sum_country = sum(number)) %>%
  group_by(country, motif) %>%
  mutate(freq = number / sum_country,
         # `freq*100 %>% round(2)` would round only the 100, since %>% binds tighter than *
         freq_perc = round(freq * 100, 2))
```

`ggplot2` example:

```r
library(dplyr)
library(ggplot2)

df <- am2 %>%
  group_by(country) %>%
  mutate(sum_country = sum(number)) %>%
  group_by(country, motif) %>%
  mutate(freq = number / sum_country,
         freq_perc = round(freq * 100, 2))  # round the percentage, not the constant 100

df %>%
  ggplot(aes(x = country, y = freq_perc, fill = motif)) +
  geom_bar(stat = "identity", position = "dodge")
```

## Relative frequencies / proportions with dplyr

Try this:

```r
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))
#   am gear  n      freq
# 1  0    3 15 0.7894737
# 2  0    4  4 0.2105263
# 3  1    4  8 0.6153846
# 4  1    5  5 0.3846154
```

From the dplyr vignette:

When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset.

Thus, after the `summarise`, the last grouping variable specified in `group_by`, 'gear', is peeled off. In the `mutate` step, the data is grouped by the remaining grouping variable(s), here 'am'. You may check grouping in each step with `groups`.

The outcome of the peeling of course depends on the order of the grouping variables in the `group_by` call. You may wish to add a subsequent `group_by(am)` to make your code more explicit.
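As a sketch of how to check this, `group_vars()` reports the grouping that is left after each step. The `.groups` argument of `summarise()` is used here to make the peeling explicit; it assumes dplyr 1.0.0 or later, and the `counts` and `res` names are only illustrative:

```r
library(dplyr)

counts <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(), .groups = "drop_last")  # drop only the last level, 'gear'

group_vars(counts)  # the grouping that remains after summarise
# "am"

# sum(n) is therefore computed within each level of 'am'
res <- counts %>%
  mutate(freq = n / sum(n))
```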

For rounding and prettification, please refer to the nice answer by @Tyler Rinker.

## Finding percentage using group_by and summarise in R through dplyr

We can try

```r
demographics %>%
  group_by(Academic_Level) %>%
  summarise(Unique_Elements = n_distinct(userID)) %>%
  mutate(perc = 100 * Unique_Elements / sum(Unique_Elements))
```

## group by in dplyr and calculating percentages

```r
library(dplyr)

# nicola
df %>%
  filter(!is.na(Container_Pick_Day)) %>%
  group_by(Service, Container_Pick_Day) %>%
  summarise(Percentage = n()) %>%
  group_by(Service) %>%
  mutate(Percentage = Percentage / sum(Percentage) * 100)

# akrun
df %>%
  filter(complete.cases(Container_Pick_Day)) %>%
  count(Service, Container_Pick_Day) %>%
  group_by(Service) %>%
  transmute(Container_Pick_Day, Percentage = n / sum(n) * 100)

# Sotos
df %>%
  na.omit() %>%
  group_by_all() %>%
  summarise(ptg = n()) %>%
  group_by(Service) %>%
  mutate(ptg = prop.table(ptg) * 100)
```

All of which result in:

```
  Service Container_Pick_Day Percentage
   <fctr>              <int>      <dbl>
1     ABC                  0   33.33333
2     ABC                  1   50.00000
3     ABC                  2   16.66667
4     DEF                  0   16.66667
5     DEF                  1   66.66667
6     DEF                  2   16.66667
```

## percentage count by group using dplyr

You can either pipe the counts into `mutate(prop = count / sum(count))` or compute the proportion directly within `summarise` by dividing by `nrow(.)`. Something like this:

```r
df %>%
  group_by(colors) %>%
  summarise(prop = n() / nrow(.))  # `.` is the whole piped data frame
```

or

```r
df %>%
  group_by(colors) %>%
  summarise(count = n()) %>%
  mutate(prop = count / sum(count))
```
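To see that both variants give the same proportions, here is a small check on made-up data. The `colors` values and the `p1`/`p2` names are hypothetical, and the `nrow(.)` form relies on the magrittr `%>%` pipe binding `.` to the piped data frame:

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(colors = c("red", "red", "blue", "green"))

# Variant 1: divide each group size by the total number of rows
p1 <- df %>%
  group_by(colors) %>%
  summarise(prop = n() / nrow(.))

# Variant 2: count first, then normalise by the sum of the counts
p2 <- df %>%
  group_by(colors) %>%
  summarise(count = n()) %>%
  mutate(prop = count / sum(count))

all.equal(p1$prop, p2$prop)
```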

## Better way to summarize percentages by subgroup using dplyr?

You can simplify it as follows by switching the order of `group_by` (i.e. group by `manager` first, then by `manager + title`, instead of the other way round):

```r
mydata %>%
  group_by(manager) %>%
  mutate(mgr_count = n()) %>%
  group_by(title, mgr_count, .add = TRUE) %>%  # use `add = TRUE` on dplyr < 1.0.0
  summarise(
    title_count = n(),
    title_pctg = round(title_count / first(mgr_count) * 100, 1)
  )
# A tibble: 7 x 5
# Groups:   manager, title [?]
#   manager  title mgr_count title_count title_pctg
#    <fctr> <fctr>     <int>       <int>      <dbl>
# 1    Jack Junior         7           1       14.3
# 2    Jack Senior         7           6       85.7
# 3    Mike Junior         4           4      100.0
# 4     Sue Junior         4           2       50.0
# 5     Sue    Mid         4           2       50.0
# 6     Tom  Entry         3           2       66.7
# 7     Tom    Mid         3           1       33.3
```

## Transform absolute values into yearly percentages in R

If `Type` always follows the sequence `A, B` within each year:

#### Data:

```r
df <- data.frame(data.table::fread("Year Type Return
1900 A 4
1900 B 7
1901 A 87
1901 B 3
1902 A 9.7
1902 B 2"))
```

#### Code:

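```r
library(tidyverse)

df %>%
  group_by(Year) %>%
  # dplyr >= 1.1.0 warns about multi-row summarise(); reframe() is the replacement there
  summarise(Return = Return / sum(Return)) %>%
  ungroup() %>%
  mutate(Type = df$Type) %>%  # relies on the row order being unchanged
  relocate(Year, Type, Return) %>%
  mutate(Return = round(Return, 2))
```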

#### Output

```
   Year Type  Return
  <int> <chr>  <dbl>
1  1900 A       0.36
2  1900 B       0.64
3  1901 A       0.97
4  1901 B       0.03
5  1902 A       0.83
6  1902 B       0.17
```

#### A prettier output:

```r
df %>%
  group_by(Year) %>%
  summarise(Return = Return / sum(Return)) %>%
  ungroup() %>%
  mutate(Type = df$Type) %>%
  relocate(Year, Type, Return) %>%
  mutate(Return = scales::percent(Return))
#    Year Type  Return
#   <int> <chr> <chr>
# 1  1900 A     36%
# 2  1900 B     64%
# 3  1901 A     97%
# 4  1901 B     3%
# 5  1902 A     83%
# 6  1902 B     17%
```
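If `Type` does not follow a fixed order, a grouped `mutate()` keeps every original column, so `Type` never has to be re-attached by position. This is a minimal sketch in the style of the earlier sections, not part of the original answer; the data is rebuilt with `data.frame()` so the block is self-contained, and `res` is just an illustrative name:

```r
library(dplyr)

# Same values as in the Data section above
df <- data.frame(
  Year   = c(1900, 1900, 1901, 1901, 1902, 1902),
  Type   = c("A", "B", "A", "B", "A", "B"),
  Return = c(4, 7, 87, 3, 9.7, 2)
)

# Share of each row within its year; Year and Type are carried along untouched
res <- df %>%
  group_by(Year) %>%
  mutate(Return = round(Return / sum(Return), 2)) %>%
  ungroup()
```

Swapping `round(...)` for `scales::percent(Return / sum(Return))` would give the percent-formatted strings instead.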