Creating Regular 15-Minute Time-Series from Irregular Time-Series

xts extends zoo, and zoo has extensive examples for this in its vignettes and documentation.

Here is a worked example. I think I have done that more elegantly in the past, but this is all I am coming up with now:

R> library(xts)
R> twohours <- ISOdatetime(2012,05,02,9,0,0) + seq_len(8)*15*60
R> twohours
[1] "2012-05-02 09:15:00 GMT" "2012-05-02 09:30:00 GMT" 
[3] "2012-05-02 09:45:00 GMT" "2012-05-02 10:00:00 GMT" 
[5] "2012-05-02 10:15:00 GMT" "2012-05-02 10:30:00 GMT" 
[7] "2012-05-02 10:45:00 GMT" "2012-05-02 11:00:00 GMT"
R> set.seed(42)
R> observation <- xts(1:10, order.by=twohours[1]+cumsum(runif(10)*60*10))
R> observation
                           [,1]
2012-05-02 09:24:08.883625    1
2012-05-02 09:33:31.128874    2
2012-05-02 09:36:22.812594    3
2012-05-02 09:44:41.081170    4
2012-05-02 09:51:06.128481    5
2012-05-02 09:56:17.586051    6
2012-05-02 10:03:39.539040    7
2012-05-02 10:05:00.338998    8
2012-05-02 10:11:34.534372    9
2012-05-02 10:18:37.573243   10

That gives a two-hour time grid and some random observations, which leave some grid cells empty and others filled.

R> to.minutes15(observation)[,4]
                           observation.Close
2012-05-02 09:24:08.883625                 1
2012-05-02 09:44:41.081170                 4
2012-05-02 09:56:17.586051                 6
2012-05-02 10:11:34.534372                 9
2012-05-02 10:18:37.573243                10

That is a 15-minute aggregation, but each row carries the timestamp of the last observation in its bin rather than a point on our time grid.
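To see where those five rows come from, here is a minimal base-R sketch. It is not the xts implementation; it assumes that to.minutes15() bins on clock-aligned quarter hours and reports the last observation of each bin. The vectors obs_time and obs_value are copied from the output above, with fractional seconds dropped.

```r
# Observation times copied from the output above (fractional seconds dropped)
obs_time <- as.POSIXct(c(
  "2012-05-02 09:24:08", "2012-05-02 09:33:31", "2012-05-02 09:36:22",
  "2012-05-02 09:44:41", "2012-05-02 09:51:06", "2012-05-02 09:56:17",
  "2012-05-02 10:03:39", "2012-05-02 10:05:00", "2012-05-02 10:11:34",
  "2012-05-02 10:18:37"), tz = "GMT")
obs_value <- 1:10

# Assumption: bins are clock-aligned quarter hours (900 seconds each)
bin <- as.numeric(obs_time) %/% 900

# Last observation in each bin, which is what the "Close" column reports
last_in_bin <- tapply(obs_value, bin, function(v) v[length(v)])
as.vector(last_in_bin)
```

Under that assumption the result is 1 4 6 9 10, the same values as the Close column.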

R> twoh <- xts(rep(NA,8), order.by=twohours)
R> twoh
                    [,1]
2012-05-02 09:15:00   NA
2012-05-02 09:30:00   NA
2012-05-02 09:45:00   NA
2012-05-02 10:00:00   NA
2012-05-02 10:15:00   NA
2012-05-02 10:30:00   NA
2012-05-02 10:45:00   NA
2012-05-02 11:00:00   NA

R> merge(twoh, observation)
                           twoh observation
2012-05-02 09:15:00.000000   NA          NA
2012-05-02 09:24:08.883625   NA           1
2012-05-02 09:30:00.000000   NA          NA
2012-05-02 09:33:31.128874   NA           2
2012-05-02 09:36:22.812594   NA           3
2012-05-02 09:44:41.081170   NA           4
2012-05-02 09:45:00.000000   NA          NA
2012-05-02 09:51:06.128481   NA           5
2012-05-02 09:56:17.586051   NA           6
2012-05-02 10:00:00.000000   NA          NA
2012-05-02 10:03:39.539040   NA           7
2012-05-02 10:05:00.338998   NA           8
2012-05-02 10:11:34.534372   NA           9
2012-05-02 10:15:00.000000   NA          NA
2012-05-02 10:18:37.573243   NA          10
2012-05-02 10:30:00.000000   NA          NA
2012-05-02 10:45:00.000000   NA          NA
2012-05-02 11:00:00.000000   NA          NA

That is the new xts object holding the grid, and the merged object. Now use na.locf() to carry the observations forward:

R> na.locf(merge(twoh, observation)[,2])
                           observation
2012-05-02 09:15:00.000000          NA
2012-05-02 09:24:08.883625           1
2012-05-02 09:30:00.000000           1
2012-05-02 09:33:31.128874           2
2012-05-02 09:36:22.812594           3
2012-05-02 09:44:41.081170           4
2012-05-02 09:45:00.000000           4
2012-05-02 09:51:06.128481           5
2012-05-02 09:56:17.586051           6
2012-05-02 10:00:00.000000           6
2012-05-02 10:03:39.539040           7
2012-05-02 10:05:00.338998           8
2012-05-02 10:11:34.534372           9
2012-05-02 10:15:00.000000           9
2012-05-02 10:18:37.573243          10
2012-05-02 10:30:00.000000          10
2012-05-02 10:45:00.000000          10
2012-05-02 11:00:00.000000          10

And then we can merge again as an inner join on the time-grid xts twoh:

R> merge(twoh, na.locf(merge(twoh, observation)[,2]), join="inner")[,2]
                    observation
2012-05-02 09:15:00          NA
2012-05-02 09:30:00           1
2012-05-02 09:45:00           4
2012-05-02 10:00:00           6
2012-05-02 10:15:00           9
2012-05-02 10:30:00          10
2012-05-02 10:45:00          10
2012-05-02 11:00:00          10
R>
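As a cross-check on the merge / na.locf() / inner-join steps, the same "last observation at or before each grid point" result can be sketched in base R with findInterval(). This is an illustration only, not a replacement for the xts approach; grid, obs_time and obs_value are names chosen here, with values copied from the output above (fractional seconds dropped).

```r
# The 15-minute grid and the observations, copied from the output above
grid <- as.POSIXct("2012-05-02 09:00:00", tz = "GMT") + (1:8) * 15 * 60
obs_time <- as.POSIXct(c(
  "2012-05-02 09:24:08", "2012-05-02 09:33:31", "2012-05-02 09:36:22",
  "2012-05-02 09:44:41", "2012-05-02 09:51:06", "2012-05-02 09:56:17",
  "2012-05-02 10:03:39", "2012-05-02 10:05:00", "2012-05-02 10:11:34",
  "2012-05-02 10:18:37"), tz = "GMT")
obs_value <- 1:10

# For each grid point, index of the last observation at or before it
# (0 when no observation has happened yet)
idx <- findInterval(as.numeric(grid), as.numeric(obs_time))
idx[idx == 0] <- NA

# Carry that observation onto the grid; NA where nothing was observed yet
locf <- obs_value[idx]
data.frame(time = grid, observation = locf)
```

The observation column comes out as NA 1 4 6 9 10 10 10, which agrees with the final merge above.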

Converting values with irregular intervals into minute values with Excel

Because I have Excel 2007, the MAXIFS function is not available to me and I'm forced to use an array formula, but I'm pretty sure this can be done easily with MAXIFS.

My data set is like this:

[Image: sample data, with timestamps in column A and values in column B]

The formula I've used in E1 (and dragged down) is an array formula:

=INDEX($B$1:$B$4;MATCH(MAX(IF($A$1:$A$4<=E1;$A$1:$A$4));$A$1:$A$4;0))

(My Excel uses ; as the argument separator; if yours uses commas, replace each ; with a comma.)

Because it's an array formula, it must be entered by pressing CTRL+SHIFT+ENTER instead of just ENTER.

This is how it works:

1. MAX(IF($A$1:$A$4<=E1;$A$1:$A$4)) gets the latest date value that is <= the date in column E.
2. We combine the value from step 1 with a MATCH to get the row number of that specific date.
3. We combine the value from step 2 with an INDEX to get the target value assigned to that specific date, and return it.
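If the nested formula is hard to follow, the same three steps can be written out one at a time. This is a sketch in R rather than Excel, and the vectors a, b and target are hypothetical stand-ins for columns A, B and one cell of column E, not the data from the original sheet.

```r
# Hypothetical data standing in for column A (timestamps) and column B (values)
a <- as.POSIXct(c("2015-03-15 14:00:00", "2015-03-15 14:03:00",
                  "2015-03-15 14:07:00", "2015-03-15 14:12:00"), tz = "UTC")
b <- c(10, 20, 30, 40)

# Hypothetical stand-in for one cell of column E
target <- as.POSIXct("2015-03-15 14:06:00", tz = "UTC")

# Step 1: latest timestamp that is <= the target (the MAX(IF(...)) part)
latest <- max(a[a <= target])

# Step 2: position of that timestamp in column A (the MATCH part)
row <- match(as.numeric(latest), as.numeric(a))

# Step 3: value stored at that position in column B (the INDEX part)
value <- b[row]
value
```

With this made-up data the last timestamp at or before 14:06 is 14:03, so value is 20.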

Anyways, as I said, I'm pretty sure you could get this easily with a MAXIFS to get the date, and then combine it with MATCH and INDEX.

Hope this helps.

NOTE: Because you are working with dates that include minutes, and Excel stores dates and times as numbers, you are actually working with decimal numbers (the date is the integer part and the time is the fractional part). You must ensure that the values in both data sets are exactly the same, or it won't work!

Say two cells hold these two values:

42078,5874998264
42078,5875000000

If you apply a format like dd/mm/yyyy h:mm AM/PM, you will get this:

[Image: both cells display the same date and time]

They look like the same value, but they are not. The values of the two cells are different, and that can make formulas return the wrong results. So you have to be 100% sure the dates in both data sets are exactly the same, or it won't work!
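The pitfall is not specific to Excel. A minimal sketch in R, using the two serial values from above (written with a decimal point), shows the mismatch and one possible way around it: rounding both sides to whole minutes before comparing. The rounding step is a suggestion added here, not part of the original formula.

```r
# The two serial values from above (day number plus fractional time of day)
x <- 42078.5874998264
y <- 42078.5875

# Exact comparison fails even though both display as the same minute
same_raw <- (x == y)

# Round both to the nearest whole minute (1440 minutes per day) before comparing
x_min <- round(x * 1440)
y_min <- round(y * 1440)
same_rounded <- (x_min == y_min)

c(raw = same_raw, rounded = same_rounded)
```

Here raw is FALSE and rounded is TRUE. In Excel, an analogous fix would presumably be a helper column such as =ROUND(A1*1440;0)/1440 in both data sets, so that the lookup compares the rounded values.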

Convert multiple irregular time series into regular time series

If DF is your data frame then try this. Converting to ts in the last line makes it regular and then we convert back to zoo:

library(zoo)
z <- read.zoo(DF, split = 1, index = 2, format = "%d-%m-%Y")
z.ym <- aggregate(z, as.yearmon, identity) # convert to yearmon
zm <- aggregate(as.zoo(as.ts(z.ym)), as.yearmon, identity)

An alternative to the last line would be these two lines:

g <- zoo(, seq(start(z.ym), end(z.ym), deltat(z.ym))) # grid
zm <- merge(z.ym, g)

In either case, at this point coredata(zm) is the data part and time(zm) is the index although you might want to keep it as a zoo object so that you can use its other time series facilities and the many other packages which accept time series of that form.

Note: Here is a complete self-contained reproducible example:

DF <- structure(list(station = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L), Time = structure(c(1L, 2L, 3L, 5L, 7L, 1L, 2L, 4L, 6L, 8L
), .Label = c("01-01-1974", "01-02-1974", "01-03-1974", "01-04-1974", 
"01-05-1974", "01-06-1974", "01-07-1974", "01-08-1974"), class = "factor"), 
    WaterTemp = c(5, 5, 8.6000004, 8.1333332, 12.7999999, 5, 
    5, 8.6000004, 8.1333332, 12.7999999)), .Names = c("station", 
"Time", "WaterTemp"), class = "data.frame", row.names = c(NA, 
-10L))

library(zoo)
z <- read.zoo(DF, split = 1, index = 2, format = "%d-%m-%Y")
z.ym <- aggregate(z, as.yearmon, identity) # convert to yearmon
zm <- aggregate(as.zoo(as.ts(z.ym)), as.yearmon, identity)

giving:

> zm
                 1         2
Jan 1974  5.000000  5.000000
Feb 1974  5.000000  5.000000
Mar 1974  8.600000        NA
Apr 1974        NA  8.600000
May 1974  8.133333        NA
Jun 1974        NA  8.133333
Jul 1974 12.800000        NA
Aug 1974        NA 12.800000
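The grid-and-merge idea behind the alternative can also be sketched without zoo. The following base-R example is only an illustration: it takes station 1 from DF (dates already parsed), builds a complete monthly grid, and left-joins the data onto it. The names s1, grid and regular are chosen here.

```r
# Station 1 from DF above, with the dates already parsed
s1 <- data.frame(
  Time = as.Date(c("1974-01-01", "1974-02-01", "1974-03-01",
                   "1974-05-01", "1974-07-01")),
  WaterTemp = c(5, 5, 8.6000004, 8.1333332, 12.7999999)
)

# Complete monthly grid from the first to the last observed month
grid <- data.frame(Time = seq(min(s1$Time), max(s1$Time), by = "month"))

# Left join: every grid month is kept, months without data become NA
regular <- merge(grid, s1, by = "Time", all.x = TRUE)
regular
```

The result has seven rows, January to July 1974, with NA for April and June, matching column 1 of zm over that range.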

How to generate a regular panel from an irregular panel time series data

You can do:

library(tidyverse)

dt %>%
  mutate(timestamp = as.Date(timestamp)) %>%
  group_by(trackingid) %>%
  complete(timestamp = seq(min(timestamp), max(timestamp), by = "day"), fill = list(pageimp = 0))

Output:

# A tibble: 7 x 3
# Groups:   trackingid [1]
  trackingid          timestamp  pageimp
  <fct>               <date>       <dbl>
1 1470149111625446656 2018-06-17       8
2 1470149111625446656 2018-06-18       1
3 1470149111625446656 2018-06-19       3
4 1470149111625446656 2018-06-20       0
5 1470149111625446656 2018-06-21       4
6 1470149111625446656 2018-06-22       2
7 1470149111625446656 2018-06-23       3

Basically, you group by trackingid, expand the data by day from the minimum to the maximum timestamp within each group, and use the fill argument to populate any missing pageimp with 0.
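If tidyverse is not an option, roughly the same result can be sketched in base R. The input below is hypothetical (two made-up tracking ids, not the data from the question), and grid and panel are names chosen here.

```r
# Hypothetical input: two tracking ids with gaps in their daily records
dt <- data.frame(
  trackingid = c("a", "a", "a", "b", "b"),
  timestamp = as.Date(c("2018-06-17", "2018-06-18", "2018-06-20",
                        "2018-06-19", "2018-06-21")),
  pageimp = c(8, 1, 3, 4, 2)
)

# Full daily grid per id, from that id's first to last date
grid <- do.call(rbind, lapply(split(dt, dt$trackingid), function(d) {
  data.frame(trackingid = d$trackingid[1],
             timestamp = seq(min(d$timestamp), max(d$timestamp), by = "day"))
}))

# Left-join the observations onto the grid, then fill the gaps with 0
# (the same role as fill = list(pageimp = 0) in complete())
panel <- merge(grid, dt, by = c("trackingid", "timestamp"), all.x = TRUE)
panel$pageimp[is.na(panel$pageimp)] <- 0
panel
```

For this input the panel has seven rows: four days for id a and three for id b, with a zero on each of the two days that had no record.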
