Insert Rows For Missing Dates/Times

Insert rows for missing dates/times

I think the easiest thing is to set the date first as already described, convert to zoo, and then just do a merge:

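library(zoo)

# Parse the timestamp column into POSIXct
df$timestamp <- as.POSIXct(df$timestamp, format = "%m/%d/%y %H:%M")

# Build a zoo series: all other columns as data, the timestamp as the index
df1.zoo <- zoo(df[, -1], df[, 1])

# Merge with an empty zoo series that covers every minute from start to end
df2 <- merge(df1.zoo, zoo(, seq(start(df1.zoo), end(df1.zoo), by = "min")), all = TRUE)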

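start() and end() take the first and last timestamps from df1.zoo (your original data), and by sets the step of the sequence ("min" here, as needed for your example). With all=TRUE the merge keeps every timestamp from both series, so the values at the missing timestamps become NA.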
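For comparison, here is a rough pandas sketch of the same merge idea: build every minute between the first and last timestamp, then left-merge the original rows onto that grid. The sample data and the column name value are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with a gap between 10:01 and 10:04
df = pd.DataFrame({
    "timestamp": ["01/02/20 10:00", "01/02/20 10:01", "01/02/20 10:04"],
    "value": [1.0, 2.0, 3.0],
})
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%m/%d/%y %H:%M")

# Every minute from the first to the last observed timestamp
full = pd.DataFrame({
    "timestamp": pd.date_range(df["timestamp"].min(), df["timestamp"].max(), freq="min")
})

# Left merge: timestamps missing from df get NaN in the value column
df2 = full.merge(df, on="timestamp", how="left")
print(df2)
```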

Python pandas: insert rows for missing dates, time series in groupby dataframe

Use a custom function with DataFrame.asfreq in GroupBy.apply, and then rebuild the Index column with GroupBy.cumcount:

df['date'] = pd.to_datetime(df['date'])

df = (df.set_index('date')
        .groupby('Serial_no')
        .apply(lambda x: x.asfreq('MS'))
        .drop('Serial_no', axis=1))
df = df.reset_index()
df["Index"] = df.groupby("Serial_no").cumcount() + 1
print(df)
    Serial_no       date  Index     x    y
0           1 2014-01-01      1   2.0  3.0
1           1 2014-02-01      2   NaN  NaN
2           1 2014-03-01      3   3.0  3.0
3           1 2014-04-01      4   6.0  2.0
4           2 2011-03-01      1   5.1  1.3
5           2 2011-04-01      2   5.8  0.6
6           2 2011-05-01      3   6.5 -0.1
7           2 2011-06-01      4   NaN  NaN
8           2 2011-07-01      5   3.0  5.0
9           3 2019-10-01      1   7.9 -1.5
10          3 2019-11-01      2   8.6 -2.2
11          3 2019-12-01      3   NaN  NaN
12          3 2020-01-01      4  10.0 -3.6
13          3 2020-02-01      5  10.7 -4.3
14          3 2020-03-01      6   4.0  3.0

Alternative solution with DataFrame.reindex:

df['date'] = pd.to_datetime(df['date'])

f = lambda x: x.reindex(pd.date_range(x.index.min(), x.index.max(), freq='MS', name='date'))
df = df.set_index('date').groupby('Serial_no').apply(f).drop('Serial_no', axis=1)
df = df.reset_index()
df["Index"] = df.groupby("Serial_no").cumcount() + 1
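If you want to try the reindex idea on its own, here is a self-contained sketch written as an explicit loop over the groups instead of GroupBy.apply, which avoids depending on how apply treats the grouping column in your pandas version. The sample rows are invented for illustration and are not the data from the question.

```python
import pandas as pd

# Hypothetical input: monthly rows per serial number, with some months missing
df = pd.DataFrame({
    "Serial_no": [1, 1, 1, 2, 2],
    "date": ["2014-01-01", "2014-03-01", "2014-04-01", "2011-03-01", "2011-05-01"],
    "x": [2.0, 3.0, 6.0, 5.1, 6.5],
})
df["date"] = pd.to_datetime(df["date"])

parts = []
for serial, grp in df.groupby("Serial_no"):
    # All month starts between this group's first and last date
    full = pd.date_range(grp["date"].min(), grp["date"].max(), freq="MS", name="date")
    filled = grp.set_index("date").reindex(full)
    filled["Serial_no"] = serial  # restore the key on the newly inserted rows
    parts.append(filled.reset_index())

out = pd.concat(parts, ignore_index=True)
out["Index"] = out.groupby("Serial_no").cumcount() + 1
print(out)
```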

Insert row to fill in missing date in R

First convert date to an actual Date object (date1) by pasting "-01" onto the end. complete() then creates a monthly sequence of dates for each Person, and fill() carries the previous Amount forward into the newly added rows. Finally, strip the "-01" from date1 again to get the data back in its original form.

library(dplyr)
library(tidyr)

df %>%
  mutate(date1 = as.Date(paste0(date, "-01"))) %>%
  group_by(Person) %>%
  complete(date1 = seq(min(date1), max(date1), by = "1 month")) %>%
  fill(Amount) %>%
  mutate(date = sub("-01$", "", date1)) %>%
  select(-date1)

# Person date Amount
# <fct> <chr> <int>
# 1 A 2019-01 900
# 2 A 2019-02 900
# 3 A 2019-03 600
# 4 A 2019-04 300
# 5 A 2019-05 0
# 6 B 2019-04 1200
# 7 B 2019-05 1200
# 8 B 2019-06 1200
# 9 B 2019-07 800
#10 B 2019-08 400
#11 B 2019-09 0

data

df <- structure(list(Person = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L), .Label = c("A", "B"), class = "factor"), date = structure(c(1L,
2L, 3L, 4L, 3L, 5L, 6L, 7L), .Label = c("2019-01", "2019-03",
"2019-04", "2019-05", "2019-07", "2019-08", "2019-09"), class = "factor"),
Amount = c(900L, 600L, 300L, 0L, 1200L, 800L, 400L, 0L)),
class = "data.frame", row.names = c(NA, -8L))
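For readers working in pandas, a rough equivalent of the complete/fill steps might look like the sketch below. It uses monthly periods instead of pasting "-01" onto the strings; that is a substitution on my part, not what the R code does. The data is the same as in the structure() call above.

```python
import pandas as pd

df = pd.DataFrame({
    "Person": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": ["2019-01", "2019-03", "2019-04", "2019-05",
             "2019-04", "2019-07", "2019-08", "2019-09"],
    "Amount": [900, 600, 300, 0, 1200, 800, 400, 0],
})

parts = []
for person, grp in df.groupby("Person"):
    months = pd.PeriodIndex(grp["date"].tolist(), freq="M")
    # Every month between this person's first and last month
    full = pd.period_range(months.min(), months.max(), freq="M")
    # Reindex onto the full range, then carry the previous Amount forward
    amounts = pd.Series(grp["Amount"].to_numpy(), index=months).reindex(full).ffill()
    parts.append(pd.DataFrame({
        "Person": person,
        "date": full.strftime("%Y-%m"),
        "Amount": amounts.astype(int).to_numpy(),
    }))

result = pd.concat(parts, ignore_index=True)
print(result)
```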

How to add only missing Dates in Dataframe

Here's a correction of your approach, in base R.

Replace max(t1$Date) with Sys.Date() in your real application:

t2<-merge(data.frame(Date= as.Date(min(t1$Date):max(t1$Date),"1970-1-1")),
t1, by = "Date", all = TRUE)
t2[is.na(t2)] <- 0

# Date Val1 Val2
# 1 2018-04-01 125 0.05
# 2 2018-04-02 0 0.00
# 3 2018-04-03 458 2.99
# 4 2018-04-04 0 0.00
# 5 2018-04-05 354 1.25

data

t1 <- read.table(text="Date        Val1     Val2
'2018-04-01' 125 0.05
'2018-04-03' 458 2.99
'2018-04-05' 354 1.25", header = TRUE, stringsAsFactors = FALSE)
t1$Date <- as.Date(t1$Date)
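The same "fill the gaps with 0" result can be sketched in pandas by reindexing onto a daily range with fill_value=0. This is only an illustration using the three rows from the data above, not a translation of the merge call.

```python
import pandas as pd

t1 = pd.DataFrame({
    "Date": pd.to_datetime(["2018-04-01", "2018-04-03", "2018-04-05"]),
    "Val1": [125, 458, 354],
    "Val2": [0.05, 2.99, 1.25],
})

# Every calendar day from the first to the last date
all_days = pd.date_range(t1["Date"].min(), t1["Date"].max(), freq="D", name="Date")

# Rows for missing days are created with 0 in every value column
t2 = t1.set_index("Date").reindex(all_days, fill_value=0).reset_index()
print(t2)
```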

Adding missing dates in time series data

Use tidyr::complete:

library(dplyr)

df %>%
  mutate(Date = as.Date(Date, "%B %d, %Y")) %>%
  tidyr::complete(Date = seq(as.Date('2008-01-01'), as.Date('2020-03-31'),
                             by = 'day'), fill = list(Val = 1)) %>%
  mutate(Date = format(Date, "%B %d, %Y"))


# A tibble: 4,475 x 2
# Date Val
# <chr> <dbl>
# 1 January 01, 2008 1
# 2 January 02, 2008 1
# 3 January 03, 2008 1
# 4 January 04, 2008 1
# 5 January 05, 2008 26
# 6 January 06, 2008 1
# 7 January 07, 2008 1
# 8 January 08, 2008 1
# 9 January 09, 2008 1
#10 January 10, 2008 1
# … with 4,465 more rows

data

df <- structure(list(Date = c("September 16, 2012", "September 19, 2014", 
"January 05, 2008", "June 07, 2017", "December 15, 2019", "May 28, 2020"
), Val = c(32L, 33L, 26L, 2L, 3L, 18L)), class = "data.frame",
row.names = c(NA, -6L))

Adding missing dates to dataframe

Some test data (I am using Date, yours seems to be a different type, but this does not affect the algorithm):

data = data.frame(dates = as.Date(c("2011-12-15", "2011-12-17", "2011-12-19")), 
values = as.double(1:3))

# Generate **all** timestamps at which you want to have your result.
# I use `seq`, but you may use any other method of generating those timestamps.

alldates = seq(min(data$dates), max(data$dates), 1)

# Filter out timestamps that are already present in your `data.frame`:
# Construct a `data.frame` to append with missing values:
dates0 = alldates[!(alldates %in% data$dates)]
data0 = data.frame(dates = dates0, values = NA_real_)

# Append this `data.frame` and resort in time:
data = rbind(data, data0)
data = data[order(data$dates),]

# Forward fill the values
# (I would recommend moving this code into a separate `ffill` function;
# it has proved very useful in general):
current = NA_real_
data$values = sapply(data$values, function(x) {
  current <<- ifelse(is.na(x), current, x); current })
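The algorithm itself is language independent. As an illustration, here is a plain Python sketch of the same steps (generate all dates, add the missing ones, forward fill), using only the standard library. The ffill helper is my own name for the separate function suggested in the comment above.

```python
from datetime import date, timedelta

def ffill(values):
    """Replace each None with the most recent non-None value seen so far."""
    filled, current = [], None
    for v in values:
        if v is not None:
            current = v
        filled.append(current)
    return filled

# Same test data as above: date -> value
data = {date(2011, 12, 15): 1.0, date(2011, 12, 17): 2.0, date(2011, 12, 19): 3.0}

# Generate all dates between the first and the last observation
start, end = min(data), max(data)
all_dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Dates missing from data get None, which is then forward filled
rows = [(d, data.get(d)) for d in all_dates]
filled = list(zip(all_dates, ffill([v for _, v in rows])))
for d, v in filled:
    print(d, v)
```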

Fill in missing dates for a pandas dataframe with multiple series

Group by Item and Category, then generate a time series from the min to the max date:

result = (
    df.groupby(["Item", "Category"])["Date"]
    .apply(lambda s: pd.date_range(s.min(), s.max()))
    .explode()
    .reset_index()
)
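The snippet above only produces the full date grid per group. One way to turn such a grid into filled-in rows is a left merge back onto the original frame, sketched below. The grid is built here with a plain loop rather than apply/explode so the Date column stays a datetime dtype; the sample data and the Value column are hypothetical.

```python
import pandas as pd

# Hypothetical input with gaps inside each (Item, Category) series
df = pd.DataFrame({
    "Item": ["apple", "apple", "pen", "pen"],
    "Category": ["fruit", "fruit", "office", "office"],
    "Date": pd.to_datetime(["2021-01-01", "2021-01-04", "2021-01-02", "2021-01-03"]),
    "Value": [1, 4, 2, 3],
})

# One block of daily dates per group, from that group's min to max date
grid = pd.concat(
    [
        pd.DataFrame({
            "Item": item,
            "Category": category,
            "Date": pd.date_range(grp["Date"].min(), grp["Date"].max()),
        })
        for (item, category), grp in df.groupby(["Item", "Category"])
    ],
    ignore_index=True,
)

# Left merge: dates that were missing in df get NaN in Value
filled = grid.merge(df, on=["Item", "Category", "Date"], how="left")
print(filled)
```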

Insert and fill rows of missing dates with NA in List R

As vitor pointed out above, you can only join two data.frames, not a data.frame and a vector. dplyr also plays nice with POSIXct, but not POSIXlt (Hadley has a preference), so if you store your data as actual time, it will be easier to join usefully.

Further, within lapply, you need to use the variable of the function you create (x here), or you'll just be repeating the same thing. Don't subset the data.frames, either, if you want to join them; you need a column in each with the same name and data type.

All together, you need something like:

library(dplyr)

df$datetime <- as.POSIXct(df$datetime, tz = "GMT")
df <- as_tibble(df) # not necessary, but prints nicely (tbl_df() is deprecated)

list_df <- list(df, df) # fake list of data.frames
# make a data.frame of sequence to join on
seq_df <- tibble(datetime = seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = 'GMT'),
                                       as.POSIXct("2015-11-30 23:59:59", tz = 'GMT'),
                                       by="hour",tz="GMT")) # tibble() replaces the deprecated data_frame()

lapply(list_df, function(x){full_join(x, seq_df)})
# Joining by: "datetime"
# Joining by: "datetime"
# [[1]]
# Source: local data frame [5,857 x 3]
#
# datetime precip code
# (POSI) (lgl) (fctr)
# 1 2015-04-15 00:00:00 NA M
# 2 2015-04-15 01:00:00 NA M
# 3 2015-04-15 02:00:00 NA M
# 4 2015-04-15 03:00:00 NA M
# 5 2015-04-15 04:00:00 NA M
# 6 2015-04-15 05:00:00 NA M
# 7 2015-04-01 04:00:00 NA NA
# 8 2015-04-01 05:00:00 NA NA
# 9 2015-04-01 06:00:00 NA NA
# 10 2015-04-01 07:00:00 NA NA
# .. ... ... ...
#
# [[2]]
# Source: local data frame [5,857 x 3]
#
# datetime precip code
# (POSI) (lgl) (fctr)
# 1 2015-04-15 00:00:00 NA M
# 2 2015-04-15 01:00:00 NA M
# 3 2015-04-15 02:00:00 NA M
# 4 2015-04-15 03:00:00 NA M
# 5 2015-04-15 04:00:00 NA M
# 6 2015-04-15 05:00:00 NA M
# 7 2015-04-01 04:00:00 NA NA
# 8 2015-04-01 05:00:00 NA NA
# 9 2015-04-01 06:00:00 NA NA
# 10 2015-04-01 07:00:00 NA NA
# .. ... ... ...
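The two points above (join two tables on a column with the same name and type, and use the loop variable inside the mapping function) carry over to pandas as well. A small sketch under made-up data, with a shorter hourly range than in the R example:

```python
import pandas as pd

# Hypothetical frame with six hourly readings, timezone-aware like the POSIXct column
df = pd.DataFrame({
    "datetime": pd.date_range("2015-04-15 00:00", periods=6, freq="h", tz="UTC"),
    "code": ["M"] * 6,
})
list_df = [df, df.copy()]  # fake list of data frames

# The sequence to join on is itself a frame with a same-named, same-typed column
seq_df = pd.DataFrame({
    "datetime": pd.date_range("2015-04-15 00:00", "2015-04-15 11:00", freq="h", tz="UTC")
})

# Use the loop variable x for each frame, not the outer df
joined = [seq_df.merge(x, on="datetime", how="outer") for x in list_df]
print(joined[0])
```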

Data:

df <- structure(list(datetime = structure(c(1429056000, 1429059600, 1429063200, 1429066800, 
1429070400, 1429074000), class = c("POSIXct", "POSIXt"), tzone = "GMT"), precip = c(NA,
NA, NA, NA, NA, NA), code = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "M",
class = "factor")), .Names = c("datetime", "precip", "code"), row.names = c("1",
"2", "3", "4", "5", "6"), class = c("tbl_df", "tbl", "data.frame"))

