How do I use tidyr to fill in completed rows within each value of a grouping variable?
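You can "group" variables inside complete() so that those variables are only completed using combinations that already exist in the data. Early tidyr versions did this with c(); current versions use nesting() instead.
library(tidyr)
# nesting() keeps only the existing id/c/d combinations and crosses them with choice
dat %>% complete(nesting(id, c, d), choice)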
id c d choice
(int) (dbl) (time) (int)
1 1 9 <NA> 5
2 1 9 <NA> 6
3 1 9 <NA> 7
4 2 NA 2015-09-30 5
5 2 NA 2015-09-30 6
6 2 NA 2015-09-30 7
7 3 11 2015-09-29 5
8 3 11 2015-09-29 6
9 3 11 2015-09-29 7
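As a rough illustration of the difference, here is a minimal sketch on made-up data (the data frame `toy` and its values are hypothetical, not the question's `dat`). It assumes a tidyr version that provides `nesting()`:

```r
library(tidyr)

# Hypothetical data: each id has exactly one value of c, but not every choice.
toy <- data.frame(
  id     = c(1, 1, 2),
  c      = c(9, 9, 11),
  choice = c(5, 6, 7)
)

# nesting(id, c) keeps only the (id, c) pairs present in the data;
# crossing them with choice gives 2 pairs x 3 choices = 6 rows.
nested <- complete(toy, nesting(id, c), choice)

# Without nesting(), every id is crossed with every c and every choice:
# 2 ids x 2 values of c x 3 choices = 12 rows.
crossed <- complete(toy, id, c, choice)
```

Comparing `nrow(nested)` with `nrow(crossed)` shows how much smaller the nested expansion is.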
Expand `NA`s to multiple rows filling with a grouping variable
Here is an alternative that keeps duplicated SerialNo values:
x <- subset(begin_ex_df, is.na(begin_ex_df$PartsUsed))
x <- expand.grid(SerialNo = x$SerialNo, PartsUsed = c("A", "B"))
x$Values <- 0
y <- subset(begin_ex_df, !is.na(begin_ex_df$PartsUsed))
new_df <- rbind(x, y)
new_df[order(new_df$SerialNo),]
# SerialNo PartsUsed Values
#1 1234 A 0
#2 1234 B 0
#3 1357 A 0
#4 1357 B 0
#5 1357 A 11
#6 1357 B 14
#7 2468 A 0
#8 2468 B 0
#9 5678 A 0
#10 5678 B 0
#11 5678 A 10
#12 5678 B 15
#13 8080 A 0
#14 8080 B 0
#15 9012 A 0
#16 9012 B 0
Edit: If you want to use tidyr's complete, you can group_by an index that separates the groups with NA from those without, and then drop the NA rows at the end (you can also remove the Ind column).
library(dplyr)
library(tidyr)
begin_ex_df %>%
group_by(SerialNo, Ind = if_else(is.na(PartsUsed), "no", "yes")) %>%
complete(., SerialNo, PartsUsed = c("A", "B"), fill = list(Values = 0)) %>%
na.omit()
# Ind SerialNo PartsUsed Values
# <chr> <dbl> <chr> <dbl>
# 1 no 1234 A 0
# 2 no 1234 B 0
# 3 no 1357 A 0
# 4 no 1357 B 0
# 5 yes 1357 A 11
# 6 yes 1357 B 14
# 7 no 2468 A 0
# 8 no 2468 B 0
# 9 no 5678 A 0
#10 no 5678 B 0
#11 yes 5678 A 10
#12 yes 5678 B 15
#13 no 8080 A 0
#14 no 8080 B 0
#15 no 9012 A 0
#16 no 9012 B 0
Completing a sequence of integers by group with tidyverse in R
You could do something like:
df %>%
group_by(Group) %>%
mutate(newseq = seq_along(Group) + (first(na.omit(Seq)) - sum(cumall(is.na(Seq)))) - 1) %>%
ungroup()
Or
df %>%
group_by(Group) %>%
mutate(newseq = seq(first(na.omit(Seq)) - sum(cumall(is.na(Seq))), length.out = n())) %>%
ungroup()
Or
df %>%
group_by(Group) %>%
mutate(newseq = 0:(n() - 1) + (first(na.omit(Seq)) - sum(cumall(is.na(Seq))))) %>%
ungroup()
All these do the same thing: shift the start of the sequence by the difference of the first non-NA value and the number of NAs before it.
Output
Group Seq newseq
<int> <int> <dbl>
1 1 NA 3
2 1 NA 4
3 1 NA 5
4 1 6 6
5 1 7 7
6 1 8 8
7 1 NA 9
8 1 10 10
9 1 11 11
10 1 NA 12
# ... with 35 more rows
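To make the shift explicit, the same arithmetic can be sketched on a single made-up vector (the values in `Seq` below are hypothetical, not the question's data). This assumes dplyr 1.0 or later, where `cumall()` is available:

```r
library(dplyr)

# Hypothetical sequence with two leading NAs and one NA in the middle.
Seq <- c(NA, NA, 6, 7, NA, 9)

# cumall() stays TRUE only while every value so far is NA,
# so its sum counts the leading NAs.
leading_na <- sum(cumall(is.na(Seq)))

# Back up from the first observed value by the number of leading NAs.
start <- first(na.omit(Seq)) - leading_na

# Rebuild the full sequence from that starting point.
newseq <- seq(start, length.out = length(Seq))
```

Here `leading_na` is 2 and `start` is 4, so `newseq` runs from 4 to 9 and agrees with every non-NA value in `Seq`.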
Complete column with group_by and complete
Using complete from the tidyr package should work. See its documentation for details.
What probably happened is that you did not remove the grouping. complete then tries to add the combinations of YEAR and Region within each group, but within a group all of those combinations are already present, so nothing is added. Remove the grouping first and then call complete.
datasetALL %>%
group_by(YEAR,Region) %>%
summarise(count_number = n()) %>%
ungroup() %>%
complete(YEAR, Region, fill = list(count_number = 1))
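The effect of the leftover grouping can be seen in a small sketch on made-up counts (the data frame `counts` is hypothetical and stands in for the summarised `datasetALL`; the fill value 0 is chosen only for illustration):

```r
library(dplyr)
library(tidyr)

# Hypothetical counts: the combination (2021, "South") is missing.
counts <- data.frame(
  YEAR         = c(2020, 2020, 2021),
  Region       = c("North", "South", "North"),
  count_number = c(3, 2, 4)
)

# Still grouped by YEAR: complete() only sees the Regions that occur
# within each year, so no row is added.
grouped <- counts %>%
  group_by(YEAR) %>%
  complete(Region, fill = list(count_number = 0))

# Ungrouped: every YEAR x Region combination is generated.
ungrouped <- counts %>%
  complete(YEAR, Region, fill = list(count_number = 0))
```

`grouped` still has three rows, while `ungrouped` has four, with the missing (2021, "South") row filled with 0.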
Spreading data within dplyr pipe; dealing with multiple entries by group
Another option is to create a sequence by group (rowid) before the pivot_wider to take care of the duplicate elements:
library(data.table)
library(tidyr)
library(dplyr)
df%>%
group_by(Hour,dt)%>%
summarise(Programmes=Programme_watched, .groups = 'drop') %>%
mutate(rn = rowid(Hour, dt)) %>%
pivot_wider(names_from = dt, values_from = Programmes) %>%
select(-rn)
Output
# A tibble: 6 × 12
Hour `05/06/2021` `07/06/2021` `08/06/2021` `10/06/2021` `12/06/2021` `14/06/2021` `15/06/2021` `09/06/2021` `11/06/2021` `13/06/2021` `06/06/2021`
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 0 Wrestling Drama Drama Football Rugby Movie Music <NA> <NA> <NA> <NA>
2 0 <NA> <NA> <NA> <NA> Movie News <NA> <NA> <NA> <NA> <NA>
3 1 <NA> <NA> <NA> Racing <NA> Movie <NA> Tennis "Documentary " Movie <NA>
4 1 <NA> <NA> <NA> Racing <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 2 Athletics Children Music Football Drama <NA> <NA> Football "Rugby" <NA> News
6 2 <NA> <NA> Movie <NA> <NA> <NA> <NA> News <NA> <NA> <NA>
R Add rows to each group so each group has same number, and specify other variable
tidyr::complete(df, week, session)
# A tibble: 16 x 3
week session work
<dbl> <dbl> <chr>
1 1 1 done
2 1 2 done
3 1 3 NA
4 1 4 NA
5 2 1 done
6 2 2 done
7 2 3 NA
8 2 4 NA
9 3 1 done
10 3 2 done
11 3 3 done
12 3 4 NA
13 4 1 done
14 4 2 done
15 4 3 done
16 4 4 done
Complete a data.frame with new values by group
You can complete the missing observations per id:
library(dplyr)
df %>% group_by(id) %>% tidyr::complete(year = min(year):max(year), semester)
# id year semester
# <dbl> <dbl> <dbl>
# 1 1 2000 1
# 2 1 2000 2
# 3 1 2001 1
# 4 1 2001 2
# 5 2 1999 1
# 6 2 1999 2
# 7 2 2000 1
# 8 2 2000 2
# 9 2 2001 1
#10 2 2001 2
Fill missing values in data.frame using dplyr complete within groups
We can try with expand and left_join:
library(dplyr)
library(tidyr)
df %>%
group_by(coursecode, year, region) %>%
expand(week = full_seq(week, 1)) %>%
left_join(., df)
# coursecode year region week values othervalues
# <fctr> <dbl> <fctr> <dbl> <int> <int>
#1 A 2000 Big 1 1 12
#2 A 2000 Big 2 NA NA
#3 A 2000 Big 3 2 13
#4 A 2000 Big 4 3 14
#5 A 2001 Big 1 4 15
#6 A 2001 Big 2 5 16
#7 A 2001 Big 3 6 17
#8 B 2000 Big 2 7 18
#9 B 2000 Big 3 8 19
#10 B 2000 Big 4 NA NA
#11 B 2000 Big 5 9 20
#12 B 2001 Big 3 10 21
#13 B 2001 Big 4 11 22
#14 B 2001 Big 5 12 23
group_by() into fill() not working as expected
Looks like this has been fixed in the development version of tidyr. You now get the expected result per id using fill from tidyr_0.3.1.9000.
df %>% group_by(id) %>% fill(email)
Source: local data frame [6 x 2]
Groups: id [3]
id email
(dbl) (fctr)
1 1 bob@email.com
2 1 bob@email.com
3 2 joe@email.com
4 2 joe@email.com
5 3 NA
6 3 NA
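For a version that can be run on its own, here is a minimal sketch with made-up input (the data frame `people` is hypothetical, since the question's original `df` is not shown). It assumes a tidyr release that includes the fix mentioned above:

```r
library(dplyr)
library(tidyr)

# Hypothetical input: the email is given once per id, except for id 3.
people <- data.frame(
  id    = c(1, 1, 2, 2, 3, 3),
  email = c("bob@email.com", NA, "joe@email.com", NA, NA, NA)
)

# fill() respects the grouping, so values are carried down within each id
# and never leak from one id into the next.
filled <- people %>%
  group_by(id) %>%
  fill(email) %>%   # default direction is "down"
  ungroup()
```

The rows for id 3 stay NA because there is no value within that group to carry forward.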