How to Use Tidyr to Fill in Completed Rows Within Each Value of a Grouping Variable

How do I use tidyr to fill in completed rows within each value of a grouping variable?

You can nest the grouping variables inside complete() so that only combinations of them that already exist in the data are used. Early versions of tidyr did this by wrapping the variables in c(); current versions provide nesting() for the same purpose.

library(tidyr)
dat %>% complete(nesting(id, c, d), choice)
     id     c          d choice
  (int) (dbl)     (time)  (int)
1     1     9       <NA>      5
2     1     9       <NA>      6
3     1     9       <NA>      7
4     2    NA 2015-09-30      5
5     2    NA 2015-09-30      6
6     2    NA 2015-09-30      7
7     3    11 2015-09-29      5
8     3    11 2015-09-29      6
9     3    11 2015-09-29      7
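
The original dat is not shown, so here is a minimal sketch of the same idea on made-up data. The data frame, its values, and the value column are assumptions for illustration only; nesting() and complete() are the tidyr functions discussed above.

```r
library(tidyr)

# Hypothetical toy data: each id carries one fixed (c, d) pair,
# but not every id has a row for every choice.
dat <- data.frame(
  id     = c(1, 1, 2, 3),
  c      = c(9, 9, NA, 11),
  d      = c("x", "x", "y", "z"),
  choice = c(5, 6, 5, 7),
  value  = c(10, 20, 30, 40)
)

# nesting() keeps only the (id, c, d) combinations found in the data,
# then complete() crosses them with every observed value of choice.
completed <- complete(dat, nesting(id, c, d), choice)

nrow(completed)  # 3 existing (id, c, d) combinations x 3 choices
```

Rows that did not exist before get NA in value; a fill argument such as fill = list(value = 0) could replace those if a default makes sense for your data.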

Expand `NA`s to multiple rows filling with a grouping variable

Here is an alternative that keeps duplicated SerialNo values:

x <- subset(begin_ex_df, is.na(begin_ex_df$PartsUsed)) 
x <- expand.grid(SerialNo = x$SerialNo, PartsUsed = c("A", "B"))
x$Values <- 0
y <- subset(begin_ex_df, !is.na(begin_ex_df$PartsUsed))

new_df <- rbind(x, y)

new_df[order(new_df$SerialNo),]

# SerialNo PartsUsed Values
#1 1234 A 0
#2 1234 B 0
#3 1357 A 0
#4 1357 B 0
#5 1357 A 11
#6 1357 B 14
#7 2468 A 0
#8 2468 B 0
#9 5678 A 0
#10 5678 B 0
#11 5678 A 10
#12 5678 B 15
#13 8080 A 0
#14 8080 B 0
#15 9012 A 0
#16 9012 B 0

Edit

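If you want to use tidyr's complete, group by SerialNo together with an indicator (Ind) that separates the rows where PartsUsed is NA from the rows where it is not, complete PartsUsed within each group, and drop the leftover NA rows at the end (you can also remove the Ind column).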

library(dplyr)
library(tidyr)

begin_ex_df %>%
  group_by(SerialNo, Ind = if_else(is.na(PartsUsed), "no", "yes")) %>%
  # SerialNo is already a grouping column, so it is not passed to complete() again
  complete(PartsUsed = c("A", "B"), fill = list(Values = 0)) %>%
  na.omit()

# Ind SerialNo PartsUsed Values
# <chr> <dbl> <chr> <dbl>
# 1 no 1234 A 0
# 2 no 1234 B 0
# 3 no 1357 A 0
# 4 no 1357 B 0
# 5 yes 1357 A 11
# 6 yes 1357 B 14
# 7 no 2468 A 0
# 8 no 2468 B 0
# 9 no 5678 A 0
#10 no 5678 B 0
#11 yes 5678 A 10
#12 yes 5678 B 15
#13 no 8080 A 0
#14 no 8080 B 0
#15 no 9012 A 0
#16 no 9012 B 0

Completing a sequence of integers by group with tidyverse in R

You could do something like:

df %>%
  group_by(Group) %>%
  mutate(newseq = seq_along(Group) + (first(na.omit(Seq)) - sum(cumall(is.na(Seq)))) - 1) %>%
  ungroup()

Or

df %>%
  group_by(Group) %>%
  mutate(newseq = seq(first(na.omit(Seq)) - sum(cumall(is.na(Seq))), length.out = n())) %>%
  ungroup()

Or

df %>%
  group_by(Group) %>%
  mutate(newseq = 0:(n() - 1) + (first(na.omit(Seq)) - sum(cumall(is.na(Seq))))) %>%
  ungroup()

All three do the same thing: within each group, the sequence starts at the first non-NA value minus the number of NAs that come before it, and then counts up by one per row.

Output

   Group   Seq newseq
   <int> <int>  <dbl>
 1     1    NA      3
 2     1    NA      4
 3     1    NA      5
 4     1     6      6
 5     1     7      7
 6     1     8      8
 7     1    NA      9
 8     1    10     10
 9     1    11     11
10     1    NA     12
# ... with 35 more rows
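
To make the arithmetic easier to follow, here is a small sketch that splits the calculation into named steps. The data frame and the helper columns leading_na and start are made up for illustration; the logic is the same shift described above.

```r
library(dplyr)

# Hypothetical input: two groups with gaps in Seq
df <- data.frame(
  Group = c(1, 1, 1, 1, 1, 2, 2, 2),
  Seq   = c(NA, NA, 6, NA, 8, NA, 3, NA)
)

result <- df %>%
  group_by(Group) %>%
  mutate(
    # number of NAs before the first observed value
    leading_na = sum(cumall(is.na(Seq))),
    # value the sequence must start at so that it lines up with Seq
    start = first(na.omit(Seq)) - leading_na,
    # count up by one per row from that start
    newseq = start + row_number() - 1
  ) %>%
  ungroup()

result$newseq
```

In group 1 there are two leading NAs and the first observed value is 6, so the sequence starts at 4. In group 2 there is one leading NA before the value 3, so it starts at 2.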

Complete column with group_by and complete

Using complete from the tidyr package should work; see ?tidyr::complete for its documentation.

What probably happened is that you did not remove the grouping. summarise drops only the last grouping variable, so the result is still grouped by YEAR, and complete then works within each group, where every combination it could add already exists. Remove the grouping first, then call complete.

datasetALL %>%
  group_by(YEAR, Region) %>%
  summarise(count_number = n()) %>%
  ungroup() %>%
  complete(YEAR, Region, fill = list(count_number = 1))
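
The effect of leftover grouping can be seen on a small made-up table. This is only a sketch: the counts data frame and the fill value of 0 are assumptions, and the grouped case is simulated with an explicit group_by rather than with the original datasetALL.

```r
library(dplyr)
library(tidyr)

# Hypothetical summary table: the (2001, "South") combination is missing
counts <- data.frame(
  YEAR = c(2000, 2000, 2001),
  Region = c("North", "South", "North"),
  count_number = c(3, 5, 2)
)

# While still grouped by YEAR, complete() only sees the regions inside each year,
# so nothing is added for 2001.
still_grouped <- counts %>%
  group_by(YEAR) %>%
  complete(Region, fill = list(count_number = 0)) %>%
  ungroup()

# Without grouping, every YEAR x Region combination is generated.
ungrouped <- counts %>%
  complete(YEAR, Region, fill = list(count_number = 0))

nrow(still_grouped)  # same number of rows as the input
nrow(ungrouped)      # one extra row for the missing combination
```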

Spreading data within dplyr pipe; dealing with multiple entries by group

Another option is to create a sequence by group (rowid) before the pivot_wider, so that duplicate entries end up in separate rows:

library(data.table)  # for rowid()
library(tidyr)
library(dplyr)

df %>%
  group_by(Hour, dt) %>%
  summarise(Programmes = Programme_watched, .groups = 'drop') %>%
  mutate(rn = rowid(Hour, dt)) %>%
  pivot_wider(names_from = dt, values_from = Programmes) %>%
  select(-rn)

Output

# A tibble: 6 × 12
Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021` `09/06/2021` `11/06/2021` `13/06/2021` `06/06/2021`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Wrestling Drama Drama Football Rugby Movie Music <NA> <NA> <NA> <NA>
2 0 <NA> <NA> <NA> <NA> Movie News <NA> <NA> <NA> <NA> <NA>
3 1 <NA> <NA> <NA> Racing <NA> Movie <NA> Tennis "Documentary " Movie <NA>
4 1 <NA> <NA> <NA> Racing <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 2 Athletics Children Music Football Drama <NA> <NA> Football "Rugby" <NA> News
6 2 <NA> <NA> Movie <NA> <NA> <NA> <NA> News <NA> <NA> <NA>
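
The same row-index idea can be sketched without data.table by using dplyr's row_number() inside each group. This is a swapped-in variant, not the code from the answer, and the three-row data frame is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: Hour 0 has two programmes on the same date
df <- data.frame(
  Hour = c(0, 0, 1),
  dt = c("05/06/2021", "05/06/2021", "05/06/2021"),
  Programme_watched = c("News", "Movie", "Drama")
)

wide <- df %>%
  group_by(Hour, dt) %>%
  # index duplicates within each Hour/date pair: 1, 2, ...
  mutate(rn = row_number()) %>%
  ungroup() %>%
  # rn becomes part of the row identity, so duplicates get their own rows
  pivot_wider(names_from = dt, values_from = Programme_watched) %>%
  select(-rn)

wide
```

Because rn is part of the row identity during the pivot, the two Hour 0 entries land in two rows instead of being packed into a list column.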

R Add rows to each group so each group has same number, and specify other variable

tidyr::complete(df, week, session)

# A tibble: 16 x 3
    week session work
   <dbl>   <dbl> <chr>
 1     1       1 done
 2     1       2 done
 3     1       3 NA
 4     1       4 NA
 5     2       1 done
 6     2       2 done
 7     2       3 NA
 8     2       4 NA
 9     3       1 done
10     3       2 done
11     3       3 done
12     3       4 NA
13     4       1 done
14     4       2 done
15     4       3 done
16     4       4 done

Complete a data.frame with new values by group

You can complete the missing observations per id:

library(dplyr)

df %>% group_by(id) %>% tidyr::complete(year = min(year):max(year), semester)

# id year semester
# <dbl> <dbl> <dbl>
# 1 1 2000 1
# 2 1 2000 2
# 3 1 2001 1
# 4 1 2001 2
# 5 2 1999 1
# 6 2 1999 2
# 7 2 2000 1
# 8 2 2000 2
# 9 2 2001 1
#10 2 2001 2
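
Since the input is not shown, here is a self-contained sketch with a made-up df that yields the same shape of result. The values are assumptions; the point is that min(year):max(year) is evaluated separately for each id.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: id 2 has no row for the year 2000
df <- data.frame(
  id = c(1, 1, 2, 2),
  year = c(2000, 2001, 1999, 2001),
  semester = c(1, 2, 1, 2)
)

completed <- df %>%
  group_by(id) %>%
  # the year range is computed per id; semester uses the values seen within each id
  complete(year = min(year):max(year), semester) %>%
  ungroup()

nrow(completed)  # 2 years x 2 semesters for id 1, 3 years x 2 semesters for id 2
```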

Fill missing values in data.frame using dplyr complete within groups

We can try with expand and left_join:

library(dplyr)
library(tidyr)
df %>%
  group_by(coursecode, year, region) %>%
  expand(week = full_seq(week, 1)) %>%
  left_join(., df)
# coursecode year region week values othervalues
# <fctr> <dbl> <fctr> <dbl> <int> <int>
#1 A 2000 Big 1 1 12
#2 A 2000 Big 2 NA NA
#3 A 2000 Big 3 2 13
#4 A 2000 Big 4 3 14
#5 A 2001 Big 1 4 15
#6 A 2001 Big 2 5 16
#7 A 2001 Big 3 6 17
#8 B 2000 Big 2 7 18
#9 B 2000 Big 3 8 19
#10 B 2000 Big 4 NA NA
#11 B 2000 Big 5 9 20
#12 B 2001 Big 3 10 21
#13 B 2001 Big 4 11 22
#14 B 2001 Big 5 12 23
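
A reduced sketch of the same expand and left_join pattern, using a made-up df with a single grouping column. The explicit by argument is an addition here to keep the join keys visible.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: course A skips week 2, course B skips weeks 3 and 4
df <- data.frame(
  coursecode = c("A", "A", "B", "B"),
  week = c(1, 3, 2, 5),
  values = c(10, 20, 30, 40)
)

filled <- df %>%
  group_by(coursecode) %>%
  # full_seq() builds the unbroken week range within each course
  expand(week = full_seq(week, 1)) %>%
  # bring the observed values back; weeks that were missing get NA
  left_join(df, by = c("coursecode", "week")) %>%
  ungroup()

filled
```

Each course is padded only between its own first and last week, which is the difference from completing over the global range of week.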

group_by() into fill() not working as expected

This looks like it has been fixed in the development version of tidyr. With fill from tidyr_0.3.1.9000 you get the expected result per id:

df %>% group_by(id) %>% fill(email)

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 bob@email.com
2     1 bob@email.com
3     2 joe@email.com
4     2 joe@email.com
5     3            NA
6     3            NA
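
For comparison, here is a small sketch showing grouped and ungrouped fill side by side. The df below is reconstructed from the output above and is an assumption about the original input.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: id 3 has no email at all
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  email = c("bob@email.com", NA, "joe@email.com", NA, NA, NA)
)

# Grouped: values are only carried down within the same id
filled <- df %>%
  group_by(id) %>%
  fill(email) %>%
  ungroup()

# Ungrouped: the last seen value leaks into the next id
leaked <- fill(df, email)

filled$email
leaked$email
```

With grouping, id 3 stays NA because there is nothing to carry forward inside that group; without it, id 3 inherits the email from id 2.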

