Creating regular 15-minute time-series from irregular time-series
xts extends zoo, and zoo has extensive examples for this in its vignettes and documentation.
Here is a worked example. I think I have done that more elegantly in the past, but this is all I am coming up with now:
R> library(xts)
R> twohours <- ISOdatetime(2012,05,02,9,0,0) + seq_len(8)*15*60   # 09:15, 09:30, ..., 11:00
R> twohours
[1] "2012-05-02 09:15:00 GMT" "2012-05-02 09:30:00 GMT"
[3] "2012-05-02 09:45:00 GMT" "2012-05-02 10:00:00 GMT"
[5] "2012-05-02 10:15:00 GMT" "2012-05-02 10:30:00 GMT"
[7] "2012-05-02 10:45:00 GMT" "2012-05-02 11:00:00 GMT"
R> set.seed(42)
R> observation <- xts(1:10, order.by=twohours[1]+cumsum(runif(10)*60*10))
R> observation
[,1]
2012-05-02 09:24:08.883625 1
2012-05-02 09:33:31.128874 2
2012-05-02 09:36:22.812594 3
2012-05-02 09:44:41.081170 4
2012-05-02 09:51:06.128481 5
2012-05-02 09:56:17.586051 6
2012-05-02 10:03:39.539040 7
2012-05-02 10:05:00.338998 8
2012-05-02 10:11:34.534372 9
2012-05-02 10:18:37.573243 10
So we have a two-hour grid of 15-minute time stamps and ten randomly spaced observations, which leave some grid cells empty and some filled.
R> to.minutes15(observation)[,4]
observation.Close
2012-05-02 09:24:08.883625 1
2012-05-02 09:44:41.081170 4
2012-05-02 09:56:17.586051 6
2012-05-02 10:11:34.534372 9
2012-05-02 10:18:37.573243 10
That is a 15-minute aggregation, but it is not on our time grid: each row keeps the time stamp of the last observation in its bin, and empty bins are dropped.
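As a side note, if all you need is to move the non-empty bins onto 15-minute boundaries, something along these lines might do. It is only a sketch: the data is made up, `t0`, `obs`, `aligned` and `last_per_bin` are hypothetical names, and it assumes no observation sits exactly on a boundary, because `align.time()` always rounds up to the next one. Empty bins are still missing afterwards.

```r
library(xts)

# Hypothetical observations (not the data from the answer above).
t0  <- as.POSIXct("2012-05-02 09:15:00", tz = "UTC")
obs <- xts(1:3, order.by = t0 + c(100, 300, 1000))

# Round every time stamp up to the next 15-minute boundary ...
aligned <- align.time(obs, n = 15 * 60)

# ... then keep only the last observation that landed on each boundary.
last_per_bin <- aligned[!duplicated(index(aligned), fromLast = TRUE)]
last_per_bin
```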
R> twoh <- xts(rep(NA,8), order.by=twohours)
R> twoh
[,1]
2012-05-02 09:15:00 NA
2012-05-02 09:30:00 NA
2012-05-02 09:45:00 NA
2012-05-02 10:00:00 NA
2012-05-02 10:15:00 NA
2012-05-02 10:30:00 NA
2012-05-02 10:45:00 NA
2012-05-02 11:00:00 NA
R> merge(twoh, observation)
twoh observation
2012-05-02 09:15:00.000000 NA NA
2012-05-02 09:24:08.883625 NA 1
2012-05-02 09:30:00.000000 NA NA
2012-05-02 09:33:31.128874 NA 2
2012-05-02 09:36:22.812594 NA 3
2012-05-02 09:44:41.081170 NA 4
2012-05-02 09:45:00.000000 NA NA
2012-05-02 09:51:06.128481 NA 5
2012-05-02 09:56:17.586051 NA 6
2012-05-02 10:00:00.000000 NA NA
2012-05-02 10:03:39.539040 NA 7
2012-05-02 10:05:00.338998 NA 8
2012-05-02 10:11:34.534372 NA 9
2012-05-02 10:15:00.000000 NA NA
2012-05-02 10:18:37.573243 NA 10
2012-05-02 10:30:00.000000 NA NA
2012-05-02 10:45:00.000000 NA NA
2012-05-02 11:00:00.000000 NA NA
New xts object, and merged object. Now use na.locf() to carry the observations forward:
R> na.locf(merge(twoh, observation)[,2])
observation
2012-05-02 09:15:00.000000 NA
2012-05-02 09:24:08.883625 1
2012-05-02 09:30:00.000000 1
2012-05-02 09:33:31.128874 2
2012-05-02 09:36:22.812594 3
2012-05-02 09:44:41.081170 4
2012-05-02 09:45:00.000000 4
2012-05-02 09:51:06.128481 5
2012-05-02 09:56:17.586051 6
2012-05-02 10:00:00.000000 6
2012-05-02 10:03:39.539040 7
2012-05-02 10:05:00.338998 8
2012-05-02 10:11:34.534372 9
2012-05-02 10:15:00.000000 9
2012-05-02 10:18:37.573243 10
2012-05-02 10:30:00.000000 10
2012-05-02 10:45:00.000000 10
2012-05-02 11:00:00.000000 10
And then we can merge again as an inner join on the time-grid xts twoh:
R> merge(twoh, na.locf(merge(twoh, observation)[,2]), join="inner")[,2]
observation
2012-05-02 09:15:00 NA
2012-05-02 09:30:00 1
2012-05-02 09:45:00 4
2012-05-02 10:00:00 6
2012-05-02 10:15:00 9
2012-05-02 10:30:00 10
2012-05-02 10:45:00 10
2012-05-02 11:00:00 10
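The steps above could be wrapped up roughly like this. Treat it as a sketch rather than the canonical way: `locf_on_grid` is a hypothetical helper name, the example data is made up, and the grid is merged in as a zero-width xts instead of a column of NAs.

```r
library(xts)

# Hypothetical helper: carry the last observation forward onto a fixed grid.
locf_on_grid <- function(obs, grid) {
  empty  <- xts(order.by = grid)                   # zero-width xts, index only
  merged <- merge(obs, empty)                      # union of both sets of time stamps
  filled <- na.locf(merged, na.rm = FALSE)         # keep leading NAs, fill the rest
  filled[grid]                                     # keep only the grid time stamps
}

# Made-up example: a two-hour grid and three irregular observations.
t0   <- as.POSIXct("2012-05-02 09:15:00", tz = "UTC")
grid <- t0 + (0:7) * 15 * 60
obs  <- xts(1:3, order.by = t0 + c(100, 1000, 2500))

res <- locf_on_grid(obs, grid)
res
```

The first grid point stays NA because nothing has been observed before it, just as in the output above.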
Converting values with irregular intervals into minute values with Excel
Because I have Excel 2007, the MAXIFS function is not available to me and I'm forced to use an array formula, but I'm pretty sure this can be done easily with MAXIFS.
My data set is like this (screenshot not reproduced; the formula below implies dates in A1:A4 and values in B1:B4):
The formula I've used in E1 (and dragged down) is an array formula:
=INDEX($B$1:$B$4;MATCH(MAX(IF($A$1:$A$4<=E1;$A$1:$A$4));$A$1:$A$4;0))
Because it's an array formula, it must be entered by pressing CTRL+SHIFT+ENTER instead of only pressing ENTER.
This is how it works:
1. MAX(IF($A$1:$A$4<=E1;$A$1:$A$4)) gets the latest date value that is <= the date in column E.
2. We combine the value from step 1 with a MATCH to get the row number of that specific date.
3. We combine the value from step 2 with an INDEX to get the target value assigned to that specific date, and return it.
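For readers coming from the R answers on this page, the same three steps can be written out in R. This is only an illustration of the logic, not a replacement for the Excel formula; the vectors `dates` and `values` and the lookup time `target` are made-up stand-ins for columns A, B and E.

```r
# Made-up stand-ins for column A (dates), column B (values) and one cell of column E.
dates  <- as.POSIXct(c("2015-03-15 14:00", "2015-03-15 14:06", "2015-03-15 14:13"),
                     tz = "UTC")
values <- c(10, 20, 30)
target <- as.POSIXct("2015-03-15 14:10", tz = "UTC")

latest <- max(dates[dates <= target])   # step 1: latest date at or before the target
row    <- match(latest, dates)          # step 2: position of that date
result <- values[row]                   # step 3: value stored at that position
result
```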
Anyways, as I said, I'm pretty sure you could get this easily with a MAXIFS to get the date, and then combine it with MATCH and INDEX.
Hope this helps.
NOTE: Because you are working with dates that include minutes, and dates and times are numbers in Excel, you are actually working with decimal numbers (the date is the integer part and the time is the fractional part). You must ensure that the values in both datasets are exactly the same, or it won't work!
Say two cells hold these two values:
42078,5874998264
42078,5875000000
If you apply a format like dd/mm/yyyy h:mm AM/PM, both cells display the same date and time. They look like the same value, but they are not. The values of the two cells are different, and that can make formulas show the wrong results. So you have to be 100% sure that the dates in both datasets are exactly the same, or it won't work!
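The effect is easy to reproduce outside Excel. The sketch below uses R purely as a calculator on the two serial numbers above. Rounding to whole minutes before comparing is my own suggestion, not something from the answer, and `a`, `b` and `minutes_per_day` are just illustrative names.

```r
a <- 42078.5874998264
b <- 42078.5875000000

exact_match <- a == b                    # compares the raw serial numbers

# A day has 24 * 60 minutes, so serial * 1440 is "minutes since the origin".
minutes_per_day <- 24 * 60
minute_match <- round(a * minutes_per_day) == round(b * minutes_per_day)

gap_seconds <- (b - a) * 86400           # how far apart the two values really are

exact_match
minute_match
gap_seconds
```

The two serial numbers differ by only a few hundredths of a second, which is enough to break an exact match. The corresponding idea in Excel would be to round both date columns to whole minutes before matching them.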
Convert multiple irregular time series into regular time series
If DF is your data frame then try this. Converting to ts in the last line makes it regular and then we convert back to zoo:
library(zoo)
z <- read.zoo(DF, split = 1, index = 2, format = "%d-%m-%Y")
z.ym <- aggregate(z, as.yearmon, identity) # convert to yearmon
zm <- aggregate(as.zoo(as.ts(z.ym)), as.yearmon, identity)
An alternative to the last line would be these two lines:
g <- zoo(, seq(start(z.ym), end(z.ym), deltat(z.ym))) # grid
zm <- merge(z.ym, g)
In either case, at this point coredata(zm) is the data part and time(zm) is the index, although you might want to keep it as a zoo object so that you can use its other time series facilities and the many other packages which accept time series of that form.
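If you want to convince yourself that the grid merge really produces a regular series, `is.regular()` from zoo can check it. The following is a small sketch on made-up data (not the DF from the question). The grid is written out by hand with `as.yearmon()` instead of `seq()`, which is just a choice made here so the example does not depend on anything else.

```r
library(zoo)

# Made-up monthly series with March and April missing.
z <- zoo(c(5, 5, 8.1), as.yearmon(c("1974-01", "1974-02", "1974-05")))

# Zero-width series holding every month from January to May 1974.
grid <- zoo(, as.yearmon(1974 + (0:4) / 12))

zm <- merge(z, grid)                     # missing months show up as NA

before <- is.regular(z,  strict = TRUE)  # strictly equidistant index?
after  <- is.regular(zm, strict = TRUE)
zm
```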
Note: Here is a complete self-contained reproducible example:
DF <- structure(list(station = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), Time = structure(c(1L, 2L, 3L, 5L, 7L, 1L, 2L, 4L, 6L, 8L
), .Label = c("01-01-1974", "01-02-1974", "01-03-1974", "01-04-1974",
"01-05-1974", "01-06-1974", "01-07-1974", "01-08-1974"), class = "factor"),
WaterTemp = c(5, 5, 8.6000004, 8.1333332, 12.7999999, 5,
5, 8.6000004, 8.1333332, 12.7999999)), .Names = c("station",
"Time", "WaterTemp"), class = "data.frame", row.names = c(NA,
-10L))
library(zoo)
z <- read.zoo(DF, split = 1, index = 2, format = "%d-%m-%Y")
z.ym <- aggregate(z, as.yearmon, identity) # convert to yearmon
zm <- aggregate(as.zoo(as.ts(z.ym)), as.yearmon, identity)
giving:
> zm
1 2
Jan 1974 5.000000 5.000000
Feb 1974 5.000000 5.000000
Mar 1974 8.600000 NA
Apr 1974 NA 8.600000
May 1974 8.133333 NA
Jun 1974 NA 8.133333
Jul 1974 12.800000 NA
Aug 1974 NA 12.800000
How to generate a regular panel from an irregular panel time series data
Can do:
library(tidyverse)
dt %>%
  mutate(timestamp = as.Date(timestamp)) %>%
  group_by(trackingid) %>%
  complete(timestamp = seq(min(timestamp), max(timestamp), by = "day"),
           fill = list(pageimp = 0))
Output:
# A tibble: 7 x 3
# Groups: trackingid [1]
trackingid timestamp pageimp
<fct> <date> <dbl>
1 1470149111625446656 2018-06-17 8
2 1470149111625446656 2018-06-18 1
3 1470149111625446656 2018-06-19 3
4 1470149111625446656 2018-06-20 0
5 1470149111625446656 2018-06-21 4
6 1470149111625446656 2018-06-22 2
7 1470149111625446656 2018-06-23 3
Basically you group by trackingid, expand your data by day from the minimum to the maximum timestamp, and make use of the fill argument to populate anything missing with 0.
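Since the question's data is not shown here, this is a self-contained version on made-up data with two ids, to show that each group is expanded over its own date range. The ids "A" and "B" and all values are hypothetical, and the final `ungroup()` is an addition of mine.

```r
library(dplyr)
library(tidyr)

# Made-up input: id "A" is missing 2018-06-19 and 2018-06-20, id "B" has no gaps.
dt <- tibble(
  trackingid = c("A", "A", "B", "B"),
  timestamp  = c("2018-06-18", "2018-06-21", "2018-06-20", "2018-06-21"),
  pageimp    = c(1, 4, 2, 5)
)

regular <- dt %>%
  mutate(timestamp = as.Date(timestamp)) %>%
  group_by(trackingid) %>%
  complete(timestamp = seq(min(timestamp), max(timestamp), by = "day"),
           fill = list(pageimp = 0)) %>%
  ungroup()

regular
```

Only "A" gets new rows, because the sequence is built from the minimum and maximum timestamp within each group, not across the whole table.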