Creating Regular 15-Minute Time-Series from Irregular Time-Series

xts extends zoo, and zoo has extensive examples for this in its vignettes and documentation.

Here is a worked example. I think I have done that more elegantly in the past, but this is all I am coming up with now:

R> library(xts)
R> twohours <- ISOdatetime(2012,05,02,9,0,0) + (1:8)*15*60   # 09:15, 09:30, ..., 11:00
R> twohours
[1] "2012-05-02 09:15:00 GMT" "2012-05-02 09:30:00 GMT"
[3] "2012-05-02 09:45:00 GMT" "2012-05-02 10:00:00 GMT"
[5] "2012-05-02 10:15:00 GMT" "2012-05-02 10:30:00 GMT"
[7] "2012-05-02 10:45:00 GMT" "2012-05-02 11:00:00 GMT"
R> set.seed(42)
R> observation <- xts(1:10, order.by=twohours[1]+cumsum(runif(10)*60*10))
R> observation
[,1]
2012-05-02 09:24:08.883625 1
2012-05-02 09:33:31.128874 2
2012-05-02 09:36:22.812594 3
2012-05-02 09:44:41.081170 4
2012-05-02 09:51:06.128481 5
2012-05-02 09:56:17.586051 6
2012-05-02 10:03:39.539040 7
2012-05-02 10:05:00.338998 8
2012-05-02 10:11:34.534372 9
2012-05-02 10:18:37.573243 10

That gives a two-hour time grid and some random observations that leave some grid
cells empty and others filled.

R> to.minutes15(observation)[,4]
observation.Close
2012-05-02 09:24:08.883625 1
2012-05-02 09:44:41.081170 4
2012-05-02 09:56:17.586051 6
2012-05-02 10:11:34.534372 9
2012-05-02 10:18:37.573243 10

That is a 15-minute aggregation, but each row keeps the timestamp of the last observation in its bucket, so the result is not on our time grid.
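If the only goal is to move those bucket timestamps onto clock boundaries, xts also provides align.time(), which rounds each index up to the next multiple of n seconds. What follows is a minimal sketch, assuming the same simulated data as above (the GMT time zone is set explicitly here as an assumption, for reproducibility). Note that 15-minute buckets without any observation are still absent from the result, which is why the merge approach that follows is needed for a complete grid:

```r
library(xts)

# Same simulated data as above, with the time zone fixed to GMT (assumption)
start <- ISOdatetime(2012, 5, 2, 9, 15, 0, tz = "GMT")
set.seed(42)
observation <- xts(1:10, order.by = start + cumsum(runif(10) * 60 * 10))

# Last observation in each 15-minute bucket (the "Close" column)
last15 <- to.minutes15(observation)[, 4]

# Round each timestamp up to the next 15-minute boundary
aligned <- align.time(last15, n = 15 * 60)
aligned
```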

R> twoh <- xts(rep(NA,8), order.by=twohours)
R> twoh
[,1]
2012-05-02 09:15:00 NA
2012-05-02 09:30:00 NA
2012-05-02 09:45:00 NA
2012-05-02 10:00:00 NA
2012-05-02 10:15:00 NA
2012-05-02 10:30:00 NA
2012-05-02 10:45:00 NA
2012-05-02 11:00:00 NA

R> merge(twoh, observation)
twoh observation
2012-05-02 09:15:00.000000 NA NA
2012-05-02 09:24:08.883625 NA 1
2012-05-02 09:30:00.000000 NA NA
2012-05-02 09:33:31.128874 NA 2
2012-05-02 09:36:22.812594 NA 3
2012-05-02 09:44:41.081170 NA 4
2012-05-02 09:45:00.000000 NA NA
2012-05-02 09:51:06.128481 NA 5
2012-05-02 09:56:17.586051 NA 6
2012-05-02 10:00:00.000000 NA NA
2012-05-02 10:03:39.539040 NA 7
2012-05-02 10:05:00.338998 NA 8
2012-05-02 10:11:34.534372 NA 9
2012-05-02 10:15:00.000000 NA NA
2012-05-02 10:18:37.573243 NA 10
2012-05-02 10:30:00.000000 NA NA
2012-05-02 10:45:00.000000 NA NA
2012-05-02 11:00:00.000000 NA NA

New xts object, and merged object. Now use na.locf() to carry the observations
forward:

R> na.locf(merge(twoh, observation)[,2])
observation
2012-05-02 09:15:00.000000 NA
2012-05-02 09:24:08.883625 1
2012-05-02 09:30:00.000000 1
2012-05-02 09:33:31.128874 2
2012-05-02 09:36:22.812594 3
2012-05-02 09:44:41.081170 4
2012-05-02 09:45:00.000000 4
2012-05-02 09:51:06.128481 5
2012-05-02 09:56:17.586051 6
2012-05-02 10:00:00.000000 6
2012-05-02 10:03:39.539040 7
2012-05-02 10:05:00.338998 8
2012-05-02 10:11:34.534372 9
2012-05-02 10:15:00.000000 9
2012-05-02 10:18:37.573243 10
2012-05-02 10:30:00.000000 10
2012-05-02 10:45:00.000000 10
2012-05-02 11:00:00.000000 10

And then we can merge again as an inner join on the time-grid xts twoh:

R> merge(twoh, na.locf(merge(twoh, observation)[,2]), join="inner")[,2]
observation
2012-05-02 09:15:00 NA
2012-05-02 09:30:00 1
2012-05-02 09:45:00 4
2012-05-02 10:00:00 6
2012-05-02 10:15:00 9
2012-05-02 10:30:00 10
2012-05-02 10:45:00 10
2012-05-02 11:00:00 10
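The steps above can be collected into one small function. This is only a sketch: align_to_grid is a hypothetical helper name, not part of xts or zoo, and it uses a zero-width xts object for the grid and a plain index filter in place of the NA-filled column and inner join shown above. The time zone is again fixed to GMT as an assumption.

```r
library(xts)

# Hypothetical helper (not part of xts): sample `obs` at the time points
# in `grid`, carrying the last observation forward.
align_to_grid <- function(obs, grid) {
  # Zero-width xts that carries only the grid's time index
  empty <- xts(, order.by = grid)
  # Union of both indexes; grid points start as NA rows and LOCF fills them
  filled <- na.locf(merge(obs, empty))
  # Keep only the rows whose timestamps sit on the grid
  filled[as.numeric(index(filled)) %in% as.numeric(grid), ]
}

# Same grid and simulated observations as in the worked example
grid <- ISOdatetime(2012, 5, 2, 9, 0, 0, tz = "GMT") + (1:8) * 15 * 60
set.seed(42)
observation <- xts(1:10, order.by = grid[1] + cumsum(runif(10) * 60 * 10))

regular <- align_to_grid(observation, grid)
regular
```

The first grid point stays NA because no observation precedes 09:15, exactly as in the step-by-step output.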

Converting values with irregular intervals into minute values with Excel

Because I have Excel 2007, the MAXIFS function is not available to me and I'm forced to use an array formula, but I'm pretty sure this can be done easily with MAXIFS.

My data set is like this (screenshot not preserved): judging from the formula below, the timestamps are in A1:A4 and their values are in B1:B4.

The formula I've used for the timestamp in E1 (and dragged down) is an array formula:

=INDEX($B$1:$B$4;MATCH(MAX(IF($A$1:$A$4<=E1;$A$1:$A$4));$A$1:$A$4;0))

(If your regional settings use commas as the argument separator, replace the semicolons with commas.)

Because it's an array formula, it must be entered by pressing CTRL+SHIFT+ENTER instead of just ENTER.

This is how it works:

  1. MAX(IF($A$1:$A$4<=E1;$A$1:$A$4)) returns the latest date value that is <= the date in column E
  2. We combine the value from step 1 with a MATCH to get the row number of that specific date
  3. We combine the value from step 2 with an INDEX to get the target value assigned to that specific date, and return it.
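For readers following along in R rather than Excel, the same three-step lookup can be sketched as below. All data here are hypothetical stand-ins: obs_time and obs_value play the role of columns A and B, and target plays the role of one cell in column E.

```r
# Hypothetical data standing in for columns A (timestamps) and B (values)
obs_time <- as.POSIXct(c("2015-03-15 14:00:10", "2015-03-15 14:01:40",
                         "2015-03-15 14:04:05", "2015-03-15 14:06:00"),
                       tz = "UTC")
obs_value <- c(10, 20, 30, 40)

# Hypothetical target minute standing in for one cell in column E
target <- as.POSIXct("2015-03-15 14:05:00", tz = "UTC")

# Step 1: latest timestamp that is <= the target (the MAX(IF(...)) part)
latest <- max(obs_time[obs_time <= target])

# Step 2: row number of that timestamp (the MATCH part)
row <- which(obs_time == latest)[1]

# Step 3: value stored in that row (the INDEX part)
result <- obs_value[row]
result
```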

Anyways, as I said, I'm pretty sure you could get this easily with a MAXIFS to get the date, and then combine it with MATCH and INDEX.

Hope this helps.

NOTE: Because you are working with dates that include minutes, and dates and times are numbers in Excel, you are actually working with decimal numbers (the date is the integer part and the time is the decimal part). You must ensure that the values in both datasets are exactly the same, or it won't work!

Say you have these 2 values in 2 cells:

42078,5874998264
42078,5875000000

If you apply a format like dd/mm/yyyy h:mm AM/PM, both cells look identical (screenshot not preserved).

They look like the same value, but they are not. The two cells hold different values, and that can make the formulas return wrong results. So you have to be 100% sure that the dates in both datasets are exactly the same, or it won't work!
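One way to make such values comparable is to round both to whole minutes before matching them. The sketch below illustrates the idea in R with the two serial numbers from the example; rounding to the minute is an assumption about the resolution you need, and the analogous step in Excel would be to round both columns the same way.

```r
# The two serial numbers from the example above (days since Excel's epoch)
a <- 42078.5874998264
b <- 42078.5875000000

# The raw values are not equal
raw_equal <- a == b

# Round both to whole minutes (1440 minutes per day) before comparing
minutes_a <- round(a * 1440)
minutes_b <- round(b * 1440)
rounded_equal <- minutes_a == minutes_b

c(raw_equal = raw_equal, rounded_equal = rounded_equal)
```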

Convert multiple irregular time series into regular time series

If DF is your data frame then try this. Converting to ts in the last line makes it regular and then we convert back to zoo:

library(zoo)
z <- read.zoo(DF, split = 1, index = 2, format = "%d-%m-%Y")
z.ym <- aggregate(z, as.yearmon, identity) # convert to yearmon
zm <- aggregate(as.zoo(as.ts(z.ym)), as.yearmon, identity)

An alternative to the last line would be these two lines:

g <- zoo(, seq(start(z.ym), end(z.ym), deltat(z.ym))) # grid
zm <- merge(z.ym, g)

In either case, at this point coredata(zm) is the data part and time(zm) is the index although you might want to keep it as a zoo object so that you can use its other time series facilities and the many other packages which accept time series of that form.

Note: Here is a complete self-contained reproducible example:

DF <- structure(list(station = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L), Time = structure(c(1L, 2L, 3L, 5L, 7L, 1L, 2L, 4L, 6L, 8L
), .Label = c("01-01-1974", "01-02-1974", "01-03-1974", "01-04-1974",
"01-05-1974", "01-06-1974", "01-07-1974", "01-08-1974"), class = "factor"),
WaterTemp = c(5, 5, 8.6000004, 8.1333332, 12.7999999, 5,
5, 8.6000004, 8.1333332, 12.7999999)), .Names = c("station",
"Time", "WaterTemp"), class = "data.frame", row.names = c(NA,
-10L))

library(zoo)
z <- read.zoo(DF, split = 1, index = 2, format = "%d-%m-%Y")
z.ym <- aggregate(z, as.yearmon, identity) # convert to yearmon
zm <- aggregate(as.zoo(as.ts(z.ym)), as.yearmon, identity)

giving:

> zm
1 2
Jan 1974 5.000000 5.000000
Feb 1974 5.000000 5.000000
Mar 1974 8.600000 NA
Apr 1974 NA 8.600000
May 1974 8.133333 NA
Jun 1974 NA 8.133333
Jul 1974 12.800000 NA
Aug 1974 NA 12.800000


How to generate a regular panel from an irregular panel time series data

You can do this:

library(tidyverse)

dt %>%
  mutate(timestamp = as.Date(timestamp)) %>%
  group_by(trackingid) %>%
  complete(timestamp = seq(min(timestamp), max(timestamp), by = "day"),
           fill = list(pageimp = 0))

Output:

# A tibble: 7 x 3
# Groups: trackingid [1]
trackingid timestamp pageimp
<fct> <date> <dbl>
1 1470149111625446656 2018-06-17 8
2 1470149111625446656 2018-06-18 1
3 1470149111625446656 2018-06-19 3
4 1470149111625446656 2018-06-20 0
5 1470149111625446656 2018-06-21 4
6 1470149111625446656 2018-06-22 2
7 1470149111625446656 2018-06-23 3

Basically, you group by trackingid, expand the data by day from the minimum to the maximum timestamp within each group, and use the fill argument to populate the missing pageimp values with 0.
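Since the original dt is not shown, here is a self-contained sketch of the same pipeline on a small hypothetical input (the tracking id "A", the dates, and the pageimp values are made up for illustration). It has a one-day gap on 2018-06-19 that complete() fills in:

```r
library(dplyr)
library(tidyr)

# Hypothetical input: one tracking id with no row for 2018-06-19
dt <- tibble(
  trackingid = factor(c("A", "A", "A")),
  timestamp  = c("2018-06-17", "2018-06-18", "2018-06-20"),
  pageimp    = c(8, 1, 4)
)

regular <- dt %>%
  mutate(timestamp = as.Date(timestamp)) %>%
  group_by(trackingid) %>%
  # One row per day between the group's first and last date;
  # days that were missing get pageimp = 0
  complete(timestamp = seq(min(timestamp), max(timestamp), by = "day"),
           fill = list(pageimp = 0)) %>%
  ungroup()

regular
```

The ungroup() at the end is optional; it only drops the grouping so later steps operate on the whole table.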


