Insert rows for missing dates/times
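I think the easiest approach is to convert the timestamp column to a date-time first, as already described, convert the data to a zoo object, and then merge it with an empty zoo series that spans the full time range: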
library(zoo)
df$timestamp<-as.POSIXct(df$timestamp,format="%m/%d/%y %H:%M")
df1.zoo<-zoo(df[,-1],df[,1]) # use the timestamp column as the index
df2 <- merge(df1.zoo,zoo(,seq(start(df1.zoo),end(df1.zoo),by="min")), all=TRUE)
start() and end() are taken from df1.zoo (your original data), and by sets the step, e.g. "min" for one-minute intervals as in your example. all=TRUE keeps every timestamp from both series, so the rows inserted for the missing times contain NA.
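For comparison, the same idea can be sketched in pandas: build the full minute grid between the first and last observation and reindex onto it. The sample data and the column names timestamp and value are assumptions made for this illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical sample data: minute-level readings with a gap at 00:02 and 00:03.
df = pd.DataFrame(
    {
        "timestamp": ["01/01/15 00:00", "01/01/15 00:01", "01/01/15 00:04"],
        "value": [1.0, 2.0, 5.0],
    }
)
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%m/%d/%y %H:%M")

# Full minute grid from the first to the last observation.
full_index = pd.date_range(df["timestamp"].min(), df["timestamp"].max(), freq="min")

# Reindexing inserts a row (filled with NaN) for every missing minute.
filled = df.set_index("timestamp").reindex(full_index).rename_axis("timestamp")
print(filled)
```

As with all=TRUE in the zoo merge, the inserted rows carry missing values rather than guessed ones.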
Python pandas: insert rows for missing dates, time series in groupby dataframe
Use a custom function with DataFrame.asfreq in GroupBy.apply, and then rebuild the Index column with GroupBy.cumcount:
df['date'] = pd.to_datetime(df['date'])
df = (df.set_index('date')
        .groupby('Serial_no')
        .apply(lambda x: x.asfreq('MS'))
        .drop('Serial_no', axis=1))
df = df.reset_index()
df["Index"] = df.groupby("Serial_no").cumcount() + 1
print (df)
Serial_no date Index x y
0 1 2014-01-01 1 2.0 3.0
1 1 2014-02-01 2 NaN NaN
2 1 2014-03-01 3 3.0 3.0
3 1 2014-04-01 4 6.0 2.0
4 2 2011-03-01 1 5.1 1.3
5 2 2011-04-01 2 5.8 0.6
6 2 2011-05-01 3 6.5 -0.1
7 2 2011-06-01 4 NaN NaN
8 2 2011-07-01 5 3.0 5.0
9 3 2019-10-01 1 7.9 -1.5
10 3 2019-11-01 2 8.6 -2.2
11 3 2019-12-01 3 NaN NaN
12 3 2020-01-01 4 10.0 -3.6
13 3 2020-02-01 5 10.7 -4.3
14 3 2020-03-01 6 4.0 3.0
Alternative solution with DataFrame.reindex:
df['date'] = pd.to_datetime(df['date'])
f = lambda x: x.reindex(pd.date_range(x.index.min(), x.index.max(), freq='MS', name='date'))
df = df.set_index('date').groupby('Serial_no').apply(f).drop('Serial_no', axis=1)
df = df.reset_index()
df["Index"] = df.groupby("Serial_no").cumcount() + 1
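If you want to try the reindex idea on a small frame, here is a self-contained sketch that loops over the groups explicitly instead of using GroupBy.apply. The sample values are made up for illustration and only the x column is kept; the loop is a swapped-in variant of the technique, not the code from the answer above.

```python
import pandas as pd

# Hypothetical sample data with one missing month per serial number.
df = pd.DataFrame(
    {
        "Serial_no": [1, 1, 1, 2, 2],
        "date": ["2014-01-01", "2014-03-01", "2014-04-01", "2011-03-01", "2011-05-01"],
        "x": [2.0, 3.0, 6.0, 5.1, 6.5],
    }
)
df["date"] = pd.to_datetime(df["date"])

pieces = []
for serial, group in df.groupby("Serial_no"):
    # Every month start between this group's first and last date.
    months = pd.date_range(group["date"].min(), group["date"].max(), freq="MS")
    piece = (
        group.set_index("date")[["x"]]
        .reindex(months)          # missing months become NaN rows
        .rename_axis("date")
        .reset_index()
    )
    piece.insert(0, "Serial_no", serial)
    pieces.append(piece)

result = pd.concat(pieces, ignore_index=True)
# Running counter within each serial number, starting at 1.
result["Index"] = result.groupby("Serial_no").cumcount() + 1
print(result)
```

The explicit loop is slower than a vectorised groupby on large data, but it makes each step easy to inspect.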
Insert row to fill in missing date in R
We can first convert date to an actual Date object (date1) by pasting "-01" onto the end. Then, using complete, we create a sequence of monthly dates for each Person. We then use fill to carry the previous Amount forward into the new rows, and to return the data to its original form we strip the "-01" from date1 again.
library(dplyr)
library(tidyr)
df %>%
  mutate(date1 = as.Date(paste0(date, "-01"))) %>%
  group_by(Person) %>%
  complete(date1 = seq(min(date1), max(date1), by = "1 month")) %>%
  fill(Amount) %>%
  mutate(date = sub("-01$", "", date1)) %>%
  select(-date1)
# Person date Amount
# <fct> <chr> <int>
# 1 A 2019-01 900
# 2 A 2019-02 900
# 3 A 2019-03 600
# 4 A 2019-04 300
# 5 A 2019-05 0
# 6 B 2019-04 1200
# 7 B 2019-05 1200
# 8 B 2019-06 1200
# 9 B 2019-07 800
#10 B 2019-08 400
#11 B 2019-09 0
data
df <- structure(list(Person = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("A", "B"), class = "factor"), date = structure(c(1L,
2L, 3L, 4L, 3L, 5L, 6L, 7L), .Label = c("2019-01", "2019-03",
"2019-04", "2019-05", "2019-07", "2019-08", "2019-09"), class = "factor"),
Amount = c(900L, 600L, 300L, 0L, 1200L, 800L, 400L, 0L)),
class = "data.frame", row.names = c(NA, -8L))
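A rough pandas counterpart of the complete-then-fill pipeline, for readers who work in Python. This is a sketch that assumes the same Person, date and Amount columns as the data above; it resamples each person's series to month starts and forward-fills the gaps.

```python
import pandas as pd

# Same values as the R data frame above.
df = pd.DataFrame(
    {
        "Person": ["A", "A", "A", "A", "B", "B", "B", "B"],
        "date": ["2019-01", "2019-03", "2019-04", "2019-05",
                 "2019-04", "2019-07", "2019-08", "2019-09"],
        "Amount": [900, 600, 300, 0, 1200, 800, 400, 0],
    }
)
# Turn "YYYY-MM" into a real date by appending the first day of the month.
df["date1"] = pd.to_datetime(df["date"] + "-01")

filled = (
    df.set_index("date1")
    .groupby("Person")["Amount"]
    .resample("MS")   # one row per month start within each person's range
    .ffill()          # carry the previous Amount into the inserted months
    .reset_index()
)
# Back to the original "YYYY-MM" text form.
filled["date"] = filled["date1"].dt.strftime("%Y-%m")
result = filled[["Person", "date", "Amount"]]
print(result)
```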
How to add only missing Dates in Dataframe
Here's a correction of your approach, in base R.
Replace max(t1$Date) by Sys.Date() in your real application:
t2<-merge(data.frame(Date= as.Date(min(t1$Date):max(t1$Date),"1970-1-1")),
t1, by = "Date", all = TRUE)
t2[is.na(t2)] <- 0
# Date Val1 Val2
# 1 2018-04-01 125 0.05
# 2 2018-04-02 0 0.00
# 3 2018-04-03 458 2.99
# 4 2018-04-04 0 0.00
# 5 2018-04-05 354 1.25
data
t1 <- read.table(text="Date Val1 Val2
'2018-04-01' 125 0.05
'2018-04-03' 458 2.99
'2018-04-05' 354 1.25",h=T,strin=F)
t1$Date <- as.Date(t1$Date)
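The merge-with-a-full-date-column approach translates directly to pandas. The sketch below reuses the t1 values from this answer; the names all_days and t2 are just illustrative.

```python
import pandas as pd

t1 = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2018-04-01", "2018-04-03", "2018-04-05"]),
        "Val1": [125, 458, 354],
        "Val2": [0.05, 2.99, 1.25],
    }
)

# One row per calendar day between the first and last date in t1.
all_days = pd.DataFrame(
    {"Date": pd.date_range(t1["Date"].min(), t1["Date"].max(), freq="D")}
)

# A left join keeps every day; days absent from t1 get NaN, then 0.
t2 = all_days.merge(t1, on="Date", how="left").fillna(0)
print(t2)
```

Note that Val1 comes back as a float column, because the join introduces NaN before the zeros are filled in.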
Adding missing dates in time series data
Use tidyr::complete:
library(dplyr)
df %>%
mutate(Date = as.Date(Date, "%B %d, %Y")) %>%
tidyr::complete(Date = seq(as.Date('2008-01-01'), as.Date('2020-03-31'),
by = 'day'), fill = list(Val = 1)) %>%
mutate(Date = format(Date, "%B %d, %Y"))
# A tibble: 4,475 x 2
# Date Val
# <chr> <dbl>
# 1 January 01, 2008 1
# 2 January 02, 2008 1
# 3 January 03, 2008 1
# 4 January 04, 2008 1
# 5 January 05, 2008 26
# 6 January 06, 2008 1
# 7 January 07, 2008 1
# 8 January 08, 2008 1
# 9 January 09, 2008 1
#10 January 10, 2008 1
# … with 4,465 more rows
data
df <- structure(list(Date = c("September 16, 2012", "September 19, 2014",
"January 05, 2008", "June 07, 2017", "December 15, 2019", "May 28, 2020"
), Val = c(32L, 33L, 26L, 2L, 3L, 18L)), class = "data.frame",
row.names = c(NA, -6L))
Adding missing dates to dataframe
Some test data (I am using Date; yours seems to be a different type, but this does not affect the algorithm):
data = data.frame(dates = as.Date(c("2011-12-15", "2011-12-17", "2011-12-19")),
values = as.double(1:3))
# Generate **all** timestamps at which you want to have your result.
# I use `seq`, but you may use any other method of generating those timestamps.
alldates = seq(min(data$dates), max(data$dates), 1)
# Keep only the timestamps that are not already present in your `data.frame`:
dates0 = alldates[!(alldates %in% data$dates)]
# Construct a `data.frame` with missing values to append:
data0 = data.frame(dates = dates0, values = NA_real_)
# Append this `data.frame` and resort in time:
data = rbind(data, data0)
data = data[order(data$dates),]
# Forward fill the values.
# I would recommend moving this code into a separate `ffill` function
# (it has proved to be very useful in general):
current = NA_real_
data$values = sapply(data$values, function(x) {
current <<- ifelse(is.na(x), current, x); current })
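The forward-fill step is language-independent. As a minimal sketch of the separate ffill helper recommended above, here it is in plain Python, using None as the missing-value marker (an assumption of this example; the R code uses NA).

```python
from typing import List, Optional, Sequence


def ffill(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the most recent non-None value before it."""
    filled: List[Optional[float]] = []
    current: Optional[float] = None  # nothing seen yet, so leading gaps stay None
    for value in values:
        if value is not None:
            current = value
        filled.append(current)
    return filled


print(ffill([1.0, None, 2.0, None, None, 3.0]))
```

Leading gaps stay missing because there is no earlier value to carry forward, which matches the behaviour of the R snippet, where current starts as NA.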
Fill in missing dates for a pandas dataframe with multiple series
Group by Item and Category, then generate a time series from the min to the max date:
result = (
df.groupby(["Item", "Category"])["Date"]
.apply(lambda s: pd.date_range(s.min(), s.max()))
.explode()
.reset_index()
)
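The snippet above only produces the full grid of dates per group. One way to get the filled frame is to merge that grid back onto the original data; the sketch below does this on made-up sample data, where the Value column and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data: group (a, x) is missing 2021-01-02.
df = pd.DataFrame(
    {
        "Item": ["a", "a", "b"],
        "Category": ["x", "x", "y"],
        "Date": pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-02"]),
        "Value": [1, 3, 5],
    }
)

# Full daily grid per (Item, Category), as in the answer above.
grid = (
    df.groupby(["Item", "Category"])["Date"]
    .apply(lambda s: pd.date_range(s.min(), s.max()))
    .explode()
    .reset_index()
)
# explode() leaves an object column, so convert back to datetime before merging.
grid["Date"] = pd.to_datetime(grid["Date"])

# Left join: rows for dates that were missing get NaN in Value.
result = grid.merge(df, on=["Item", "Category", "Date"], how="left")
print(result)
```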
Insert and fill rows of missing dates with NA in List R
As vitor pointed out above, you can only join two data.frames, not a data.frame and a vector. dplyr also plays nicely with POSIXct but not POSIXlt (Hadley has a preference), so if you store your data as actual date-times, it will be easier to join usefully.
Further, within lapply, you need to use the argument of the function you create (x here), or you'll just be repeating the same thing. Don't subset the data.frames, either, if you want to join them; you need a column in each with the same name and data type.
All together, you need something like:
library(dplyr)
df$datetime <- as.POSIXct(df$datetime, tz = "GMT")
df <- tbl_df(df) # not necessary, but prints nicely
list_df <- list(df, df) # fake list of data.frames
# make a data.frame of sequence to join on
seq_df <- data_frame(datetime = seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = 'GMT'),
as.POSIXct("2015-11-30 23:59:59", tz = 'GMT'),
by="hour",tz="GMT"))
lapply(list_df, function(x){full_join(x, seq_df)})
# Joining by: "datetime"
# Joining by: "datetime"
# [[1]]
# Source: local data frame [5,857 x 3]
#
# datetime precip code
# (POSI) (lgl) (fctr)
# 1 2015-04-15 00:00:00 NA M
# 2 2015-04-15 01:00:00 NA M
# 3 2015-04-15 02:00:00 NA M
# 4 2015-04-15 03:00:00 NA M
# 5 2015-04-15 04:00:00 NA M
# 6 2015-04-15 05:00:00 NA M
# 7 2015-04-01 04:00:00 NA NA
# 8 2015-04-01 05:00:00 NA NA
# 9 2015-04-01 06:00:00 NA NA
# 10 2015-04-01 07:00:00 NA NA
# .. ... ... ...
#
# [[2]]
# Source: local data frame [5,857 x 3]
#
# datetime precip code
# (POSI) (lgl) (fctr)
# 1 2015-04-15 00:00:00 NA M
# 2 2015-04-15 01:00:00 NA M
# 3 2015-04-15 02:00:00 NA M
# 4 2015-04-15 03:00:00 NA M
# 5 2015-04-15 04:00:00 NA M
# 6 2015-04-15 05:00:00 NA M
# 7 2015-04-01 04:00:00 NA NA
# 8 2015-04-01 05:00:00 NA NA
# 9 2015-04-01 06:00:00 NA NA
# 10 2015-04-01 07:00:00 NA NA
# .. ... ... ...
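The same pattern (one sequence frame, full-joined to every frame in a list) can be sketched in pandas. The data here is a shortened, made-up version of the example: six hourly rows on 2015-04-15, joined to a single day of hours rather than the full April to November range.

```python
import pandas as pd

# Hypothetical frame: six hourly observations with a timezone-aware datetime column.
df = pd.DataFrame(
    {
        "datetime": pd.date_range("2015-04-15 00:00", periods=6, freq="h", tz="GMT"),
        "precip": [float("nan")] * 6,
        "code": ["M"] * 6,
    }
)
list_df = [df, df]  # stand-in list of data frames

# Frame holding the complete hourly sequence to join on (same name and dtype as in df).
seq_df = pd.DataFrame(
    {"datetime": pd.date_range("2015-04-15 00:00", "2015-04-15 23:00", freq="h", tz="GMT")}
)

# Outer join each frame in the list against the sequence; note that x, the loop
# variable, is what gets joined, not the original df.
joined = [x.merge(seq_df, on="datetime", how="outer") for x in list_df]
print(joined[0])
```

As in the dplyr version, the join key must have the same column name and the same type (including timezone) on both sides.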
Data:
df <- structure(list(datetime = structure(c(1429056000, 1429059600, 1429063200, 1429066800,
1429070400, 1429074000), class = c("POSIXct", "POSIXt"), tzone = "GMT"), precip = c(NA,
NA, NA, NA, NA, NA), code = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "M",
class = "factor")), .Names = c("datetime", "precip", "code"), row.names = c("1",
"2", "3", "4", "5", "6"), class = c("tbl_df", "tbl", "data.frame"))