Proper Idiom for Adding Zero Count Rows in Tidyr/Dplyr

Since dplyr 0.8 you can do it by setting the parameter .drop = FALSE in group_by:

X.tidy <- X.raw %>% group_by(x, y, .drop = FALSE) %>% summarise(count=sum(z))
X.tidy
# # A tibble: 4 x 3
# # Groups: x [2]
# x y count
# <fct> <fct> <int>
# 1 A i 1
# 2 A ii 5
# 3 B i 15
# 4 B ii 0

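This keeps a group for every combination of levels of the factor grouping columns, including combinations that never occur in the data. It only applies to factors, so if you have character columns you might want to convert them to factors first (thanks to Pate for the note).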
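Here is a minimal sketch of that conversion. The X.raw below is a hypothetical stand-in I made up to match the output above (the original data isn't shown), with x and y stored as character:

```r
library(dplyr)

# Hypothetical stand-in for X.raw; the grouping columns are character, not factor
X.raw <- data.frame(
  x = c("A", "A", "A", "B", "B"),
  y = c("i", "ii", "ii", "i", "i"),
  z = c(1L, 2L, 3L, 7L, 8L),
  stringsAsFactors = FALSE
)

X.tidy <- X.raw %>%
  # Convert the grouping columns so that .drop = FALSE has levels to work with
  mutate(x = factor(x), y = factor(y)) %>%
  group_by(x, y, .drop = FALSE) %>%
  summarise(count = sum(z)) %>%
  ungroup()

X.tidy
```

The (B, ii) combination never occurs in the data, but it should still come back as a row with a count of 0 because both columns are now factors.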

Filling in non-existing rows in R + dplyr

Up front: missing data to me is very different from 0. I'm assuming that you "know" with certainty that missing data should bring all other values down.

The name FiscalWeek suggests integer-like data, but your use of factor suggests ordinal or categorical. Because of that, you need to define authoritatively what the complete set of factor levels can be. And because your current factor does not contain all possible levels, I'll infer them (you need to adjust your all_groups_weeks accordingly):

all_groups_weeks <- tidyr::expand_grid(FiscalWeek = as.factor(45:48), Group = c("A", "B", "C"))
all_groups_weeks
# # A tibble: 12 x 2
# FiscalWeek Group
# <fct> <chr>
# 1 45 A
# 2 45 B
# 3 45 C
# 4 46 A
# 5 46 B
# 6 46 C
# 7 47 A
# 8 47 B
# 9 47 C
# 10 48 A
# 11 48 B
# 12 48 C

From here, join in the full data in order to "complete" it. Using tidyr::complete won't work because you don't have all possible values in the data (47 missing).

full_join(df, all_groups_weeks, by = c("FiscalWeek", "Group")) %>%
  mutate(Amount = coalesce(Amount, 0))
# # A tibble: 12 x 3
# FiscalWeek Group Amount
# <fct> <chr> <dbl>
# 1 45 A 1
# 2 46 A 1
# 3 48 A 1
# 4 48 B 5
# 5 48 C 6
# 6 45 B 0
# 7 45 C 0
# 8 46 B 0
# 9 46 C 0
# 10 47 A 0
# 11 47 B 0
# 12 47 C 0

full_join(df, all_groups_weeks, by = c("FiscalWeek", "Group")) %>%
  mutate(Amount = coalesce(Amount, 0)) %>%
  group_by(Group) %>%
  summarize(Avgs = mean(Amount, na.rm = TRUE))
# # A tibble: 3 x 2
# Group Avgs
# <chr> <dbl>
# 1 A 0.75
# 2 B 1.25
# 3 C 1.5
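For comparison, here is a sketch of what happens with tidyr::complete. This is an alternative to the join above, not part of it. The df is a hypothetical reconstruction from the output shown earlier, and I store FiscalWeek as an integer (an assumption) to keep the types simple. Given only column names, complete expands the values it finds in the data, so week 47 stays absent; if you pass the full set of weeks yourself, it can fill them in:

```r
library(tidyr)

# Hypothetical reconstruction of df; FiscalWeek kept as integer for this sketch
df <- data.frame(
  FiscalWeek = c(45L, 46L, 48L, 48L, 48L),
  Group = c("A", "A", "A", "B", "C"),
  Amount = c(1, 1, 1, 5, 6),
  stringsAsFactors = FALSE
)

# Only the weeks present in the data are expanded: 3 weeks x 3 groups
only_present <- complete(df, FiscalWeek, Group)

# Supplying the full range of weeks explicitly: 4 weeks x 3 groups
with_all <- complete(df, FiscalWeek = 45:48, Group, fill = list(Amount = 0))

nrow(only_present)
nrow(with_all)
```

Either way the point is the same: you have to state the complete set of weeks somewhere, because the data alone doesn't contain it.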

dplyr summarise: Equivalent of .drop=FALSE to keep groups with zero length in output

Since dplyr 0.8 group_by gained the .drop argument that does just what you asked for:

df = data.frame(a=rep(1:3,4), b=rep(1:2,6))
df$b = factor(df$b, levels=1:3)

df %>%
  group_by(b, .drop=FALSE) %>%
  summarise(count_a=length(a))

#> # A tibble: 3 x 2
#> b count_a
#> <fct> <int>
#> 1 1 6
#> 2 2 6
#> 3 3 0

One additional note to go with @Moody_Mudskipper's answer: Using .drop=FALSE can give potentially unexpected results when one or more grouping variables are not coded as factors. See examples below:

library(dplyr)
data(iris)

# Add an additional level to Species
iris$Species = factor(iris$Species, levels=c(levels(iris$Species), "empty_level"))

# Species is a factor and empty groups are included in the output
iris %>% group_by(Species, .drop=FALSE) %>% tally

#> Species n
#> 1 setosa 50
#> 2 versicolor 50
#> 3 virginica 50
#> 4 empty_level 0

# Add character column
iris$group2 = c(rep(c("A","B"), 50), rep(c("B","C"), each=25))

# Empty groups involving combinations of Species and group2 are not included in output
iris %>% group_by(Species, group2, .drop=FALSE) %>% tally

#> Species group2 n
#> 1 setosa A 25
#> 2 setosa B 25
#> 3 versicolor A 25
#> 4 versicolor B 25
#> 5 virginica B 25
#> 6 virginica C 25
#> 7 empty_level <NA> 0

# Turn group2 into a factor
iris$group2 = factor(iris$group2)

# Now all possible combinations of Species and group2 are included in the output,
# whether present in the data or not
iris %>% group_by(Species, group2, .drop=FALSE) %>% tally

#> Species group2 n
#> 1 setosa A 25
#> 2 setosa B 25
#> 3 setosa C 0
#> 4 versicolor A 25
#> 5 versicolor B 25
#> 6 versicolor C 0
#> 7 virginica A 0
#> 8 virginica B 25
#> 9 virginica C 25
#> 10 empty_level A 0
#> 11 empty_level B 0
#> 12 empty_level C 0

Created on 2019-03-13 by the reprex package (v0.2.1)

Counting grouped values: Include 0 values when using summarise(n())

Since this is tagged with dplyr you could modify your code to be:

out <- df %>%
  mutate(L = factor(case_when(o == 1 & e == 1 ~ 'a',
                              o == 0 & e == 1 ~ 'b',
                              o == 1 & e == 0 ~ 'c',
                              o == 0 & e == 0 ~ 'd'),
                    levels = c('a', 'b', 'c', 'd'))) %>%
  select(L) %>% table(L = .) %>% data.frame

As others pointed out, the key is to convert L to a factor and specify all the necessary levels, including the ones that don't occur in the data.

#out
# L Freq
#1 a 3
#2 b 1
#3 c 0
#4 d 0
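If you'd rather stay in dplyr for the counting step, a sketch along these lines should also work. The df here is hypothetical (made up to give the same counts as above), and the count column comes out named n rather than Freq:

```r
library(dplyr)

# Hypothetical data: three rows with o == 1 & e == 1, one with o == 0 & e == 1
df <- data.frame(o = c(1, 1, 1, 0), e = c(1, 1, 1, 1))

out <- df %>%
  mutate(L = factor(case_when(o == 1 & e == 1 ~ "a",
                              o == 0 & e == 1 ~ "b",
                              o == 1 & e == 0 ~ "c",
                              o == 0 & e == 0 ~ "d"),
                    levels = c("a", "b", "c", "d"))) %>%
  # .drop = FALSE keeps the unused levels "c" and "d" as zero-count rows
  count(L, .drop = FALSE)

out
```

This relies on the same idea: the zero rows only appear because L is a factor that knows about all four levels.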

Showing cells with zero instances of a factor in a summary table instead of omitting them

We can use complete together with ungroup (without ungroup, complete is applied within each remaining group and we would get too many combinations):

df2 %>% group_by(var1, var2) %>% summarise(count = n()) %>% ungroup() %>%
  complete(var1, var2, fill = list(count = 0))
# A tibble: 4 x 3
# var1 var2 count
# <fct> <fct> <dbl>
# 1 A C 3
# 2 A D 0
# 3 B C 7
# 4 B D 0

or complete and distinct:

df2 %>% group_by(var1, var2) %>% summarise(count = n()) %>%
  complete(var1, var2, fill = list(count = 0)) %>% distinct()
# A tibble: 4 x 3
# var1 var2 count
# <fct> <fct> <dbl>
# 1 A C 3
# 2 A D 0
# 3 B C 7
# 4 B D 0
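A shorter variant, as a sketch: count() returns an ungrouped result, so there is nothing to ungroup before complete. The df2 below is hypothetical, built so that level D of var2 never occurs:

```r
library(dplyr)
library(tidyr)

# Hypothetical df2: var2 has a level "D" that never appears in the data
df2 <- data.frame(
  var1 = factor(c(rep("A", 3), rep("B", 7))),
  var2 = factor(rep("C", 10), levels = c("C", "D"))
)

out <- df2 %>%
  count(var1, var2, name = "count") %>%   # ungrouped result, observed pairs only
  complete(var1, var2, fill = list(count = 0))  # expands to all factor levels

out
```

complete uses all levels of a factor, not just the observed values, which is why the D rows appear here.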

How to add 0 counts on x-axis using geom_col?

Use count(nyWHO, best.resp, .drop = FALSE)

d <- pp %>%
  as_tibble() %>%
  mutate(nyWHO = as.factor(WHO),
         best.resp = as.factor(case_when(best_rad == "CR" ~ 4,
                                         best_rad == "PR" ~ 3,
                                         best_rad == "SD" ~ 2,
                                         best_rad == "PD" ~ 1))) %>%
  count(nyWHO, best.resp, .drop = FALSE)
d
# # A tibble: 12 x 3
# nyWHO best.resp n
# <fct> <fct> <int>
# 1 1 1 11
# 2 1 2 41
# 3 1 3 3
# 4 1 4 0
# 5 2 1 22
# 6 2 2 13
# 7 2 3 5
# 8 2 4 1
# 9 3 1 23
# 10 3 2 9
# 11 3 3 1
# 12 3 4 4

ggplot(...)

Add in empty rows when joining tables

Starting from the result you posted (the one that isn't what you want):

library(tidyverse)
x <- c("Worker Week dpd fuse ",
"person1 1 10 5 ",
"person1 2 0 5 ",
"person1 3 10 ",
"person1 4 10 5 ",
"person1 6 10 5 ",
"person2 1 10 5 ",
"person2 2 50 5 ",
"person2 3 10 ",
"person2 4 10 5 ",
"person2 5 10 5 ",
"person2 6 10 5 ") %>%
read_table()


x %>% complete(Worker, Week)

Should give:

# # A tibble: 12 x 4
# Worker Week dpd fuse
# <chr> <dbl> <dbl> <dbl>
# 1 person1 1 10 5
# 2 person1 2 0 5
# 3 person1 3 10 NA
# 4 person1 4 10 5
# 5 person1 5 NA NA
# 6 person1 6 10 5
# 7 person2 1 10 5
# 8 person2 2 50 5
# 9 person2 3 10 NA
# 10 person2 4 10 5
# 11 person2 5 10 5
# 12 person2 6 10 5

complete() has options for filling in missing data; see the reference linked above by @aosmith. Filling NA with 0 shouldn't be a problem.
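As a sketch of the fill option, assuming the same columns as above (I build x directly as a data frame here instead of parsing text, which is my own shortcut):

```r
library(tidyr)

# Hypothetical rebuild of x: person1 has no row for week 5,
# and fuse is missing in week 3 for both workers
x <- data.frame(
  Worker = c(rep("person1", 5), rep("person2", 6)),
  Week = c(1, 2, 3, 4, 6, 1:6),
  dpd = c(10, 0, 10, 10, 10, 10, 50, 10, 10, 10, 10),
  fuse = c(5, 5, NA, 5, 5, 5, 5, NA, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Add the missing Worker/Week rows and put 0 in place of NA
filled <- complete(x, Worker, Week, fill = list(dpd = 0, fuse = 0))

filled
```

Be aware that by default fill also replaces the NA values that were already in the data (the week 3 fuse values here), not just the ones in the newly added rows. If those should stay NA, leave that column out of the fill list.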

dplyr - check if month is there, if not, add it in with an NA

This looks like a job for tidyr::complete. As you are missing both id values and months in your original dataset, you'll need to pass complete the values you want filled in. Use fill to define what the missing values should be entered as (although your Not found value will change your column from one that was potentially a column of numbers to a column of characters).

suppressPackageStartupMessages( library(dplyr) )
library(tidyr)

df %>%
  complete(id = c("a","b", "c", "d"),
           `billing months` = required_months$`required months`,
           fill = list(value = "Not found") )

#> Warning: Column `id` joining character vector and factor, coercing into
#> character vector

#> # A tibble: 12 x 3
#> id `billing months` value
#> <chr> <date> <chr>
#> 1 a 2016-07-01 1
#> 2 a 2016-08-01 Not found
#> 3 a 2016-09-01 2
#> 4 b 2016-07-01 3
#> 5 b 2016-08-01 4
#> 6 b 2016-09-01 5
#> 7 c 2016-07-01 Not found
#> 8 c 2016-08-01 6
#> 9 c 2016-09-01 7
#> 10 d 2016-07-01 Not found
#> 11 d 2016-08-01 Not found
#> 12 d 2016-09-01 Not found

Created on 2018-03-29 by the reprex package (v0.2.0).
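If you want the value column to stay numeric, one option is to leave out fill so the added rows simply get NA. A minimal sketch with made-up data (the column name month and the values are hypothetical, not your dataset):

```r
library(tidyr)

# Hypothetical data: id "c" and several months are absent
df <- data.frame(
  id = c("a", "a", "b"),
  month = as.Date(c("2016-07-01", "2016-09-01", "2016-08-01")),
  value = c(1, 2, 4),
  stringsAsFactors = FALSE
)

# The months every id is required to have
required <- seq(as.Date("2016-07-01"), as.Date("2016-09-01"), by = "month")

# No fill argument: rows that are added get NA, and value stays numeric
out <- complete(df, id = c("a", "b", "c"), month = required)

out
```

You can always turn the NA values into a label later, at the point where you print or export the table.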

Adding NULL when no variable data

Actually, this has already been solved here with a dplyr/tidyr approach: use the complete function (from tidyr) after the count function in your code. The fill = list(n = 0) option fills those missing rows with the value you need, and it could be any other value.

Note, you have to ungroup first or you will be doing this operation once per group, thus duplicating your rows.

This is pretty straightforward now and better suited to the way you are expressing your needs:

df1 %>%
  group_by(Course,Gender) %>%
  count %>%
  ungroup() %>%
  complete(Course,Gender,fill=list(n=0))



# # A tibble: 9 x 3
# Course Gender n
# <fct> <fct> <dbl>
# 1 English1 Female 1
# 2 English1 Male 3
# 3 English1 Unknown 0
# 4 English2 Female 2
# 5 English2 Male 1
# 6 English2 Unknown 1
# 7 English3 Female 3
# 8 English3 Male 0
# 9 English3 Unknown 1

Particular ratio using dplyr and tidyr

complete.cases <- c("Class_0_1","Class_1_3","Class_3_9", "Class_9_25","Class_25_50")
my.ds %>%
  group_by(ClassType = factor(ClassType, levels = complete.cases),
           grp = lag(match(ClassType, unique(ClassType)), default = 1)) %>%
  slice_tail(n = 1) %>%
  ungroup %>%
  summarise(ClassType, velocity = c(NA, diff(AT))/c(NA, diff(day))) %>%
  complete(ClassType) %>%
  fill(velocity, .direction = "updown")
# ClassType velocity
# <fct> <dbl>
# 1 Class_0_1 0.224
# 2 Class_1_3 0.224
# 3 Class_3_9 0.224
# 4 Class_9_25 0.224
# 5 Class_9_25 0.0215
# 6 Class_25_50 0.306

