R: Filling Missing Dates in a Time Series

Insert rows for missing dates/times

I think the easiest approach is to parse the timestamp first, as already described, convert the data to a zoo object, and then merge it with an empty zoo series on a regular time grid:

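library(zoo)

# parse the timestamp column
df$timestamp <- as.POSIXct(df$timestamp, format = "%m/%d/%y %H:%M")

# the first column becomes the index, the remaining columns the data
df1.zoo <- zoo(df[, -1], df[, 1])

# merge with an empty zoo series on a one-minute grid
df2 <- merge(df1.zoo, zoo(, seq(start(df1.zoo), end(df1.zoo), by = "min")), all = TRUE)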

start() and end() are taken from df1.zoo (your original data), and by sets the step of the grid, here "min" as your example requires. With all=TRUE every timestamp of the grid is kept, and the values at timestamps missing from the original data become NA.
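Before merging, it can help to check which timestamps the regular grid would add. The following is a minimal base R sketch on made-up timestamps; the vector `ts` and its values are assumptions for illustration, not the asker's data.

```r
# Hypothetical timestamps with two missing minutes (00:02 and 00:03)
ts <- as.POSIXct(c("2020-01-01 00:00", "2020-01-01 00:01", "2020-01-01 00:04"),
                 format = "%Y-%m-%d %H:%M", tz = "UTC")

# Regular one-minute grid from the first to the last observation
grid <- seq(min(ts), max(ts), by = "min")

# Grid points without an observation; these are the rows the merge fills with NA
missing_ts <- grid[!(grid %in% ts)]

print(format(missing_ts, "%H:%M"))
```

If `missing_ts` is empty, the series is already regular at that step and the merge adds nothing.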

Adding missing dates in time series data

Use tidyr::complete():

library(dplyr)

df %>%
  mutate(Date = as.Date(Date, "%B %d, %Y")) %>%
  tidyr::complete(Date = seq(as.Date('2008-01-01'), as.Date('2020-03-31'), 
                           by = 'day'), fill = list(Val = 1)) %>%
  mutate(Date = format(Date, "%B %d, %Y"))

# A tibble: 4,475 x 2
#   Date               Val
#   <chr>            <dbl>
# 1 January 01, 2008     1
# 2 January 02, 2008     1
# 3 January 03, 2008     1
# 4 January 04, 2008     1
# 5 January 05, 2008    26
# 6 January 06, 2008     1
# 7 January 07, 2008     1
# 8 January 08, 2008     1
# 9 January 09, 2008     1
#10 January 10, 2008     1
# … with 4,465 more rows

data

df <- structure(list(Date = c("September 16, 2012", "September 19, 2014", 
"January 05, 2008", "June 07, 2017", "December 15, 2019", "May 28, 2020"
), Val = c(32L, 33L, 26L, 2L, 3L, 18L)), class = "data.frame", 
row.names = c(NA, -6L))

How to add only missing Dates in Dataframe

Here's a correction of your approach, in base R.

Replace max(t1$Date) with Sys.Date() in your real application:

t2<-merge(data.frame(Date= as.Date(min(t1$Date):max(t1$Date),"1970-1-1")),
          t1, by = "Date", all = TRUE)
t2[is.na(t2)] <- 0

#         Date Val1 Val2
# 1 2018-04-01  125 0.05
# 2 2018-04-02    0 0.00
# 3 2018-04-03  458 2.99
# 4 2018-04-04    0 0.00
# 5 2018-04-05  354 1.25

data

t1 <- read.table(text="Date        Val1     Val2
'2018-04-01'  125 0.05
'2018-04-03'  458 2.99
'2018-04-05'  354 1.25", header = TRUE, stringsAsFactors = FALSE)
t1$Date <- as.Date(t1$Date)
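An alternative to building the range with min(t1$Date):max(t1$Date) is seq() with by = "day", which keeps the Date class and needs no origin. The sketch below rebuilds t1 inline so that it runs on its own, and it zero-fills only the value columns rather than the whole data frame; the names `all_days` and `value_cols` are assumptions for illustration.

```r
# Same example data as above, built directly as a data frame
t1 <- data.frame(
  Date = as.Date(c("2018-04-01", "2018-04-03", "2018-04-05")),
  Val1 = c(125, 458, 354),
  Val2 = c(0.05, 2.99, 1.25)
)

# One row per calendar day between the first and last observed date
all_days <- data.frame(Date = seq(min(t1$Date), max(t1$Date), by = "day"))

# Left join: keep every day, pull in values where they exist
t2 <- merge(all_days, t1, by = "Date", all.x = TRUE)

# Replace NA with 0 in the value columns only, leaving Date untouched
value_cols <- setdiff(names(t2), "Date")
t2[value_cols] <- lapply(t2[value_cols], function(x) replace(x, is.na(x), 0))

print(t2)
```

Restricting the replacement to `value_cols` avoids touching the Date column, which matters once the end of the range is something like Sys.Date().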

Filling missing date months in a time series by Item

You can create a sequence of yearmon objects for each ITEM and use it in complete.

library(dplyr)
library(zoo)
library(tidyr)

df1 %>%
  mutate(Date = as.yearmon(Date, '%b-%Y')) %>%
  group_by(ITEM) %>%
  complete(Date = seq(min(Date), max(Date), 1/12)) %>%
  ungroup()

#   ITEM  Date     
#   <chr> <yearmon>
# 1 A     Jan 2020 
# 2 A     Feb 2020 
# 3 A     Mar 2020 
# 4 A     Apr 2020 
# 5 A     May 2020 
# 6 A     Jun 2020 
# 7 A     Jul 2020 
# 8 B     Jan 2020 
# 9 B     Feb 2020 
#10 B     Mar 2020 
#11 B     Apr 2020 
#12 B     May 2020 
#13 B     Jun 2020 
#14 B     Jul 2020 
#15 B     Aug 2020

If you want a sequence of date objects you can use :

df1 %>%
  mutate(Date = as.Date(as.yearmon(Date, '%b-%Y'))) %>%
  group_by(ITEM) %>%
  complete(Date = seq(min(Date), max(Date), 'month')) %>%
  ungroup()

Filling missing dates in a grouped time series - a tidyverse-way?

tidyr has some great tools for these sorts of problems. Take a look at complete.

library(dplyr)
library(tidyr)
library(lubridate)

want <- df.missing %>% 
  ungroup() %>%
  complete(nesting(d1, d2), date = seq(min(date), max(date), by = "day"))

want %>% filter(d1 == "A" & d2 == 5) 

#> # A tibble: 10 x 5
#>        d1    d2       date         v1        v2
#>    <fctr> <dbl>     <date>      <dbl>     <dbl>
#>  1      A     5 2017-01-01         NA        NA
#>  2      A     5 2017-01-02 0.21879954 0.1335497
#>  3      A     5 2017-01-03 0.32977018 0.9802127
#>  4      A     5 2017-01-04 0.23902573 0.1206089
#>  5      A     5 2017-01-05 0.19617465 0.7378315
#>  6      A     5 2017-01-06 0.13373890 0.9493668
#>  7      A     5 2017-01-07 0.48613541 0.3392834
#>  8      A     5 2017-01-08 0.35698708 0.3696965
#>  9      A     5 2017-01-09 0.08498474 0.8354756
#> 10      A     5 2017-01-10         NA        NA

Fill missing values in time series using previous day data - R

Edit

Thanks to @G. Grothendieck for pointing out that na.locf0 has a maxgap argument, which handles the 5-day condition directly.

data[-1] <- lapply(data[-1], zoo::na.locf0, maxgap = 5)
data

Earlier Answer

You can write a function with rle and zoo::na.locf0 that replaces NA only if the run of consecutive NAs is at most 5 long. Apply this function to multiple columns with lapply.

conditionally_replace_na <- function(x) {
  ifelse(with(rle(is.na(x)), rep(lengths, lengths)) <= 5 & is.na(x), 
               zoo::na.locf0(x), x)  
}

data[-1] <- lapply(data[-1], conditionally_replace_na)
data

#         Date time_series_1 time_series_2 time_series_3
#1  01-01-2019            NA            10             8
#2  02-01-2019             5            10            10
#3  03-01-2019            10            10            20
#4  04-01-2019            20             6            40
#5  05-01-2019            30             6            40
#6  06-01-2019            30             8            40
#7  07-01-2019             7            NA            40
#8  08-01-2019             5            NA            40
#9  09-01-2019            NA            NA             5
#10 10-01-2019            NA            NA             5
#11 11-01-2019            NA            NA             7
#12 12-01-2019            NA            NA            10
#13 13-01-2019            NA            NA            11
#14 14-01-2019            NA            NA            12
#15 15-01-2019            NA            NA            12
#16 16-01-2019            NA            NA             9
#17 17-01-2019            NA            NA            10
#18 18-01-2019            NA            NA            10
#19 19-01-2019             5            NA            11
#20 20-01-2019             5            NA            11
#21 21-01-2019             5            NA            11
#22 22-01-2019             6            NA            11

The function can also be applied with dplyr::across:

library(dplyr)
data %>% mutate(across(starts_with('time_series'), conditionally_replace_na))
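To see what the rle step inside conditionally_replace_na computes, here is a small base R sketch on a made-up vector. The carry-forward line is a plain base R stand-in for zoo::na.locf0 and assumes the first value is not NA; the names `run_len`, `short_gap` and `filled` are illustrative only.

```r
# Hypothetical series: a gap of 2 NAs and a gap of 6 NAs
x <- c(1, NA, NA, 4, NA, NA, NA, NA, NA, NA, 7)
is_na <- is.na(x)

# rle() gives the length of each run; rep() spreads that length over the run,
# so every element knows how long the run it belongs to is
runs <- rle(is_na)
run_len <- rep(runs$lengths, runs$lengths)

# TRUE only for NAs that sit in a run of at most 5
short_gap <- is_na & run_len <= 5

# Base R last-observation-carried-forward (stand-in for zoo::na.locf0)
idx <- cummax(ifelse(is_na, 0L, seq_along(x)))
filled <- x[idx]

# Fill the short gap, keep the long gap as NA
result <- ifelse(short_gap, filled, x)
print(result)
```

The 2-long gap is filled with the preceding value, while the 6-long gap stays NA because its run length exceeds 5.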

Fill missing dates by group

`tidyr::complete()` fills missing values

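Pass id and date as the columns (the ... argument) to expand over. complete() then adds a row, with NA in value, for every combination of id and date that is absent from dat. Only dates that already occur somewhere in dat are used.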

library(tidyverse)

complete(dat, id, date)

# A tibble: 16 x 3
#       id date       value
#    <dbl> <date>     <dbl>
#  1  1.00 2017-01-01  30.0
#  2  1.00 2017-02-01  30.0
#  3  1.00 2017-03-01  NA  
#  4  1.00 2017-04-01  25.0
#  5  2.00 2017-01-01  NA  
#  6  2.00 2017-02-01  25.0
#  7  2.00 2017-03-01  NA  
#  8  2.00 2017-04-01  NA  
#  9  3.00 2017-01-01  25.0
# 10  3.00 2017-02-01  25.0
# 11  3.00 2017-03-01  25.0
# 12  3.00 2017-04-01  NA  
# 13  4.00 2017-01-01  20.0
# 14  4.00 2017-02-01  20.0
# 15  4.00 2017-03-01  NA  
# 16  4.00 2017-04-01  20.0
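Note that complete(dat, id, date) expands only over dates that appear somewhere in the data. If a date is missing for every id, an explicit sequence is needed. The following is a minimal sketch, assuming monthly data; the small `dat` below is made up for illustration and is not the data from the question.

```r
library(tidyr)

# Hypothetical monthly data; 2017-03-01 occurs for no id at all
dat <- data.frame(
  id    = c(1, 1, 2),
  date  = as.Date(c("2017-01-01", "2017-04-01", "2017-02-01")),
  value = c(30, 25, 25)
)

# Expanding over observed dates only: March is still absent
observed_only <- complete(dat, id, date)

# Expanding over an explicit monthly sequence: every month appears for every id
full_range <- complete(dat, id, date = seq(min(date), max(date), by = "month"))

print(full_range)
```

Here `observed_only` has one row per id for each of the three observed dates, while `full_range` also contains 2017-03-01 for both ids.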
