How to Use Tidyr to Fill in Completed Rows Within Each Value of a Grouping Variable

How do I use tidyr to fill in completed rows within each value of a grouping variable?

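You can nest columns inside complete() so that they are treated as a unit. Only the combinations of those columns that already occur in the data are used, and each of them is crossed with the other variables. Current tidyr spells this with nesting(); early versions of tidyr used c() for the same purpose.

library(tidyr)
dat %>% complete(nesting(id, c, d), choice)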
     id     c          d choice
  (int) (dbl)     (time)  (int)
1     1     9       <NA>      5
2     1     9       <NA>      6
3     1     9       <NA>      7
4     2    NA 2015-09-30      5
5     2    NA 2015-09-30      6
6     2    NA 2015-09-30      7
7     3    11 2015-09-29      5
8     3    11 2015-09-29      6
9     3    11 2015-09-29      7
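Here is a self-contained sketch of the difference that nesting() makes. The `dat` below is hypothetical. It is built to mirror the printed output, with a `Date` column standing in for the original time column `d`.

```r
library(tidyr)

# Hypothetical data: each id has one observed choice; c and d contain NAs
dat <- data.frame(
  id     = 1:3,
  c      = c(9, NA, 11),
  d      = as.Date(c(NA, "2015-09-30", "2015-09-29")),
  choice = 5:7
)

# nesting(): only the id/c/d combinations present in dat are used,
# each crossed with every observed choice (3 x 3 = 9 rows)
nested <- complete(dat, nesting(id, c, d), choice)

# No nesting(): every unique value of each column is crossed,
# creating combinations such as id 1 with c = 11 that never occurred
crossed <- complete(dat, id, c, d, choice)

nrow(nested)   # 9
nrow(crossed)
```

With nesting(), each id keeps its own c and d values. Without it, complete() also crosses every id with every observed c and d, which adds combinations that never occurred.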

Expand `NA`s to multiple rows filling with a grouping variable

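Here is a base R alternative, which keeps duplicated SerialNo values:

# rows with a missing PartsUsed
x <- subset(begin_ex_df, is.na(PartsUsed))
# expand each of those SerialNos to one row per part, with Values set to 0
x <- expand.grid(SerialNo = x$SerialNo, PartsUsed = c("A", "B"),
                 stringsAsFactors = FALSE)
x$Values <- 0
# rows that already have a part are kept as they are
y <- subset(begin_ex_df, !is.na(PartsUsed))

new_df <- rbind(x, y)

new_df[order(new_df$SerialNo), ]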

#  SerialNo PartsUsed Values
#1      1234         A      0
#2      1234         B      0
#3      1357         A      0
#4      1357         B      0
#5      1357         A     11
#6      1357         B     14
#7      2468         A      0
#8      2468         B      0
#9      5678         A      0
#10     5678         B      0
#11     5678         A     10
#12     5678         B     15
#13     8080         A      0
#14     8080         B      0
#15     9012         A      0
#16     9012         B      0
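The question's `begin_ex_df` is not shown, so here is a runnable sketch with a hypothetical version reconstructed from the printed result. Every SerialNo has a row with PartsUsed missing, and 1357 and 5678 also have A/B rows. Resetting the row names at the end is an extra step so the printout is numbered 1 to 16.

```r
# Hypothetical input reconstructed from the printed output
begin_ex_df <- data.frame(
  SerialNo  = c(1234, 1357, 1357, 1357, 2468, 5678, 5678, 5678, 8080, 9012),
  PartsUsed = c(NA, NA, "A", "B", NA, NA, "A", "B", NA, NA),
  Values    = c(NA, NA, 11, 14, NA, NA, 10, 15, NA, NA),
  stringsAsFactors = FALSE
)

# Rows with a missing part become one zero-valued row per part
x <- subset(begin_ex_df, is.na(PartsUsed))
x <- expand.grid(SerialNo = x$SerialNo, PartsUsed = c("A", "B"),
                 stringsAsFactors = FALSE)
x$Values <- 0

# Rows that already name a part are kept unchanged
y <- subset(begin_ex_df, !is.na(PartsUsed))

# order() leaves ties in their original order, so zero rows come first
new_df <- rbind(x, y)
new_df <- new_df[order(new_df$SerialNo), ]
rownames(new_df) <- NULL
new_df
```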

Edit

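If you want to use tidyr's complete() instead, group by an indicator that marks whether PartsUsed is missing, complete within each group, and then drop the leftover NA rows at the end (you can also remove the Ind column). SerialNo is left out of group_by() because a grouped complete() cannot expand a grouping column. Crossing the SerialNo values found within each indicator group gives the same result.

library(dplyr)
library(tidyr)

begin_ex_df %>% 
   group_by(Ind = if_else(is.na(PartsUsed), "no", "yes")) %>% 
   complete(SerialNo, PartsUsed = c("A", "B"), fill = list(Values = 0)) %>% 
   ungroup() %>% 
   filter(!is.na(PartsUsed)) %>%   # drop the original NA rows
   arrange(SerialNo, Ind)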

#  Ind   SerialNo PartsUsed Values
#   <chr>    <dbl> <chr>      <dbl>
# 1 no        1234 A              0
# 2 no        1234 B              0
# 3 no        1357 A              0
# 4 no        1357 B              0
# 5 yes       1357 A             11
# 6 yes       1357 B             14
# 7 no        2468 A              0
# 8 no        2468 B              0
# 9 no        5678 A              0
#10 no        5678 B              0
#11 yes       5678 A             10
#12 yes       5678 B             15
#13 no        8080 A              0
#14 no        8080 B              0
#15 no        9012 A              0
#16 no        9012 B              0
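An ungrouped variant can be sketched on the same hypothetical `begin_ex_df`, reconstructed from the printed output. Instead of grouping, it nests SerialNo with the indicator, so each observed SerialNo/Ind pair is crossed with both parts. This reuses the nesting() idea from the first answer rather than the grouped approach above.

```r
library(dplyr)
library(tidyr)

# Hypothetical input reconstructed from the printed output
begin_ex_df <- data.frame(
  SerialNo  = c(1234, 1357, 1357, 1357, 2468, 5678, 5678, 5678, 8080, 9012),
  PartsUsed = c(NA, NA, "A", "B", NA, NA, "A", "B", NA, NA),
  Values    = c(NA, NA, 11, 14, NA, NA, 10, 15, NA, NA),
  stringsAsFactors = FALSE
)

res <- begin_ex_df %>%
  # flag whether the row had a part recorded
  mutate(Ind = if_else(is.na(PartsUsed), "no", "yes")) %>%
  # only SerialNo/Ind pairs that occur are used, each crossed with both parts
  complete(nesting(SerialNo, Ind), PartsUsed = c("A", "B"),
           fill = list(Values = 0)) %>%
  # the original NA rows survive the join, so drop them here
  filter(!is.na(PartsUsed))

res
```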

Completing a sequence of integers by group with tidyverse in R

You could do something like:

library(dplyr)

df %>% 
  group_by(Group) %>%
  mutate(newseq  = seq_along(Group) + (first(na.omit(Seq)) - sum(cumall(is.na(Seq)))) - 1) %>% 
  ungroup()

df %>% 
  group_by(Group) %>%
  mutate(newseq  = seq(first(na.omit(Seq)) - sum(cumall(is.na(Seq))), length.out = n())) %>%
  ungroup()

df %>% 
  group_by(Group) %>%
  mutate(newseq  = 0:(n() - 1) + (first(na.omit(Seq)) - sum(cumall(is.na(Seq))))) %>%
  ungroup()

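All three compute the same thing. Within each Group, the sequence starts at the first non-NA value of Seq minus the number of NAs that precede it, and then increases by one per row. In group 1 below, the first non-NA value is 6 and three NAs precede it, so newseq starts at 3.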

Output

   Group   Seq newseq
   <int> <int>  <dbl>
 1     1    NA      3
 2     1    NA      4
 3     1    NA      5
 4     1     6      6
 5     1     7      7
 6     1     8      8
 7     1    NA      9
 8     1    10     10
 9     1    11     11
10     1    NA     12
# ... with 35 more rows
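A small runnable check of the first variant uses a hypothetical `df`. Its first group reproduces the ten rows printed above, and a second short group is added for contrast.

```r
library(dplyr)

# Hypothetical data: group 1 matches the ten rows printed above
df <- data.frame(
  Group = c(rep(1L, 10), rep(2L, 4)),
  Seq   = c(NA, NA, NA, 6L, 7L, 8L, NA, 10L, 11L, NA,
            NA, 2L, 3L, NA)
)

out <- df %>%
  group_by(Group) %>%
  # first(na.omit(Seq)): first observed value in the group
  # sum(cumall(is.na(Seq))): number of NAs before that value
  mutate(newseq = seq_along(Group) + (first(na.omit(Seq)) - sum(cumall(is.na(Seq)))) - 1) %>%
  ungroup()

out$newseq   # 3 to 12 for group 1, then 1 to 4 for group 2
```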

Complete column with group_by and complete

Using complete() from the tidyr package should work; see the tidyr reference page for complete() for details.

What probably happened is that you did not remove the grouping. summarise() drops only the last grouping level, so the summarised data is still grouped by YEAR, and complete() then works within each YEAR group. It therefore cannot add a Region that is missing from a given year. Remove the grouping first with ungroup() and then call complete().

library(dplyr)
library(tidyr)

datasetALL %>% 
    group_by(YEAR, Region) %>% 
    summarise(count_number = n()) %>%
    ungroup() %>%
    complete(YEAR, Region, fill = list(count_number = 1))
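Here is a minimal sketch with a hypothetical `datasetALL` in which 2020 has no South rows, showing the grouped and ungrouped calls side by side. It fills absent combinations with 0 rather than 1, on the assumption that a missing combination means no observations. Use whichever value your analysis needs.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: no rows for Region "South" in 2020
datasetALL <- data.frame(
  YEAR   = c(2019, 2019, 2020),
  Region = c("North", "South", "North"),
  stringsAsFactors = FALSE
)

counts <- datasetALL %>%
  group_by(YEAR, Region) %>%
  # drop_last keeps the grouping by YEAR, as summarise() does by default here
  summarise(count_number = n(), .groups = "drop_last")

# Still grouped: Region is completed only within each YEAR, so 2020/South never appears
still_grouped <- counts %>% complete(Region, fill = list(count_number = 0))

# Ungrouped: every YEAR/Region combination appears
all_combos <- counts %>%
  ungroup() %>%
  complete(YEAR, Region, fill = list(count_number = 0))
```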

Spreading data within dplyr pipe; dealing with multiple entries by group

Another option is to create a sequence within each group (with data.table's rowid()) before pivot_wider(). Duplicated Hour/dt entries then get distinct row keys instead of being collapsed into list-columns.

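library(data.table)
library(tidyr)
library(dplyr)
df %>%
  # sort as the grouped summarise() did, so the date columns appear in the same order
  arrange(Hour, dt) %>%
  select(Hour, dt, Programmes = Programme_watched) %>%
  # 1, 2, ... within each Hour/dt pair, so every cell has a unique key
  mutate(rn = rowid(Hour, dt)) %>% 
  pivot_wider(names_from = dt, values_from = Programmes) %>%
  select(-rn)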

Output

# A tibble: 6 × 12
   Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021` `09/06/2021` `11/06/2021`   `13/06/2021` `06/06/2021`
  <int> <chr>        <chr>        <chr>        <chr>        <chr>        <chr>        <chr>        <chr>        <chr>          <chr>        <chr>       
1     0 Wrestling    Drama        Drama        Football     Rugby        Movie        Music        <NA>          <NA>          <NA>         <NA>        
2     0 <NA>         <NA>         <NA>         <NA>         Movie        News         <NA>         <NA>          <NA>          <NA>         <NA>        
3     1 <NA>         <NA>         <NA>         Racing       <NA>         Movie        <NA>         Tennis       "Documentary " Movie        <NA>        
4     1 <NA>         <NA>         <NA>         Racing       <NA>         <NA>         <NA>         <NA>          <NA>          <NA>         <NA>        
5     2 Athletics    Children     Music        Football     Drama        <NA>         <NA>         Football     "Rugby"        <NA>         News        
6     2 <NA>         <NA>         Movie        <NA>         <NA>         <NA>         <NA>         News          <NA>          <NA>         <NA>
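Here is a small runnable sketch of the same idea on a hypothetical `df`. It swaps data.table::rowid() for dplyr's row_number() within groups, a plain dplyr substitute, so only dplyr and tidyr are needed.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: Hour 0 has two programmes on 05/06/2021
df <- data.frame(
  Hour = c(0, 0, 0, 1),
  dt = c("05/06/2021", "05/06/2021", "07/06/2021", "05/06/2021"),
  Programme_watched = c("Wrestling", "News", "Drama", "Racing"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(Hour, dt) %>%
  # number repeated entries within each Hour/dt pair
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(Hour, rn), names_from = dt,
              values_from = Programme_watched) %>%
  select(-rn)

wide
```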

R Add rows to each group so each group has same number, and specify other variable

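complete() crosses every week with every session, so each week ends up with all four sessions. The added rows get NA in the other variable, work, unless you supply a value with the fill argument.

tidyr::complete(df, week, session)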

# A tibble: 16 x 3
    week session work 
   <dbl>   <dbl> <chr>
 1     1       1 done 
 2     1       2 done 
 3     1       3 NA   
 4     1       4 NA   
 5     2       1 done 
 6     2       2 done 
 7     2       3 NA   
 8     2       4 NA   
 9     3       1 done 
10     3       2 done 
11     3       3 done 
12     3       4 NA   
13     4       1 done 
14     4       2 done 
15     4       3 done 
16     4       4 done
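To specify the value of the other variable in the added rows, pass fill. This sketch uses a hypothetical `df` reconstructed from the output above, and the label "not done" is an arbitrary placeholder.

```r
library(tidyr)

# Hypothetical data: 11 completed sessions across four weeks
df <- data.frame(
  week    = c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
  session = c(1, 2, 1, 2, 1, 2, 3, 1, 2, 3, 4),
  work    = "done",
  stringsAsFactors = FALSE
)

# fill sets the value of work in the rows that complete() adds
out <- complete(df, week, session, fill = list(work = "not done"))
out
```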

Complete a data.frame with new values by group

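You can complete the missing observations within each id. Because the data are grouped, min(year):max(year) is evaluated per id, so each id gets its own range of years: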

library(dplyr)

df %>% group_by(id) %>% tidyr::complete(year = min(year):max(year), semester)

#      id  year semester
#   <dbl> <dbl>    <dbl>
# 1     1  2000        1
# 2     1  2000        2
# 3     1  2001        1
# 4     1  2001        2
# 5     2  1999        1
# 6     2  1999        2
# 7     2  2000        1
# 8     2  2000        2
# 9     2  2001        1
#10     2  2001        2
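Here is a runnable sketch with a hypothetical `df`, chosen so that the result matches the printed output. id 1 has 2000/1 and 2001/2, and id 2 has 1999/1 and 2001/2.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with gaps in year and semester for each id
df <- data.frame(
  id       = c(1, 1, 2, 2),
  year     = c(2000, 2001, 1999, 2001),
  semester = c(1, 2, 1, 2)
)

out <- df %>%
  group_by(id) %>%
  # year spans each id's own range; semester uses the values seen within that id
  complete(year = min(year):max(year), semester) %>%
  ungroup()

out
```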

Fill missing values in data.frame using dplyr complete within groups

We can use expand() within each group to build the full run of weeks and then left_join() the original data back on:

library(dplyr)
library(tidyr)
df %>%
   group_by(coursecode, year, region) %>%
   expand(week = full_seq(week, 1)) %>% 
   left_join(df, by = c("coursecode", "year", "region", "week"))
#   coursecode  year region  week values othervalues
#       <fctr> <dbl> <fctr> <dbl>  <int>       <int>
#1           A  2000    Big     1      1          12
#2           A  2000    Big     2     NA          NA
#3           A  2000    Big     3      2          13
#4           A  2000    Big     4      3          14
#5           A  2001    Big     1      4          15
#6           A  2001    Big     2      5          16
#7           A  2001    Big     3      6          17
#8           B  2000    Big     2      7          18
#9           B  2000    Big     3      8          19
#10          B  2000    Big     4     NA          NA
#11          B  2000    Big     5      9          20
#12          B  2001    Big     3     10          21
#13          B  2001    Big     4     11          22
#14          B  2001    Big     5     12          23
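The tidyr documentation describes complete() as a wrapper around expand(), dplyr::full_join() and replace_na(), so the same gaps can be filled in one step. Here is a sketch on a hypothetical `df` reconstructed from the printed result, with character columns instead of the original factors.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the printed result minus the two missing weeks
df <- data.frame(
  coursecode  = rep(c("A", "B"), each = 6),
  year        = rep(c(2000, 2001, 2000, 2001), each = 3),
  region      = "Big",
  week        = c(1, 3, 4, 1, 2, 3, 2, 3, 5, 3, 4, 5),
  values      = 1:12,
  othervalues = 12:23,
  stringsAsFactors = FALSE
)

# Within each coursecode/year/region, add every week between the first and last
out <- df %>%
  group_by(coursecode, year, region) %>%
  complete(week = full_seq(week, 1)) %>%
  ungroup()

out
```

Because every observed week lies inside its group's full_seq() range, the full join inside complete() gives the same rows as expand() followed by left_join().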

group_by() into fill() not working as expected

This was fixed in the development version of tidyr (0.3.1.9000), and later releases behave the same way: fill() on grouped data fills values within each id.

df %>% group_by(id) %>% fill(email)

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 bob@email.com
2     1 bob@email.com
3     2 joe@email.com
4     2 joe@email.com
5     3            NA
6     3            NA
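Here is a sketch contrasting grouped and ungrouped fill() on a hypothetical `df` that matches the example above. It assumes a current tidyr release.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: each id has at most one known email
df <- data.frame(
  id    = c(1, 1, 2, 2, 3, 3),
  email = c("bob@email.com", NA, "joe@email.com", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Grouped: values are carried down only within each id, so id 3 stays NA
grouped <- df %>% group_by(id) %>% fill(email) %>% ungroup()

# Ungrouped: joe's address is carried down into id 3
ungrouped <- df %>% fill(email)
```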