Create a Time Interval of 15 Minutes from Minutely Data in R

Create a time interval of 15 minutes from minutely data in R?

For data that's in POSIXct format, you can use the cut function to create 15-minute groupings, and then aggregate by those groups. The code below shows how to do this in base R and with the dplyr and data.table packages.

First, create some fake data:

set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
count=sample(1:50, 100, replace=TRUE))

Base R

cut the data into 15 minute groups:

dat$by15 = cut(dat$time, breaks="15 min")
                   time count                by15
1 2016-05-01 00:00:00 22 2016-05-01 00:00:00
2 2016-05-01 00:01:00 11 2016-05-01 00:00:00
3 2016-05-01 00:02:00 31 2016-05-01 00:00:00
...
98 2016-05-01 01:37:00 20 2016-05-01 01:30:00
99 2016-05-01 01:38:00 29 2016-05-01 01:30:00
100 2016-05-01 01:39:00 37 2016-05-01 01:30:00

Now aggregate by the new grouping column, using sum as the aggregation function:

dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)
                 by15 count
1 2016-05-01 00:00:00 312
2 2016-05-01 00:15:00 395
3 2016-05-01 00:30:00 341
4 2016-05-01 00:45:00 318
5 2016-05-01 01:00:00 349
6 2016-05-01 01:15:00 397
7 2016-05-01 01:30:00 341

dplyr

library(dplyr)

dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
summarise(count=sum(count))

data.table

library(data.table)

dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]

UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the endpoint of the grouping interval is 15 minutes minus one second from the start of the interval. We add 60*15 - 1 because POSIXct is denominated in seconds. The as.POSIXct(as.character(...)) is because cut returns a factor and this just converts it back to date-time so that we can do math on it.

If you want the end point to the nearest minute before the next interval (instead of the nearest second), you could to as.POSIXct(as.character(dat$by15)) + 60*14.

If you don't know the break interval, for example, because you chose the number of breaks and let R pick the interval, you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.

Create a time series with a row every 15 minutes

Little bit easier to read

library(lubridate)
seq(ymd_hm('2015-01-01 00:00'),ymd_hm('2016-12-31 23:45'), by = '15 mins')

Creating regular 15-minute time-series from irregular time-series

xts extends zoo, and zoo has extensive examples for this in its vignettes and documentation.

Here is a worked example. I think I have done that more elegantly in the past, but this is all I am coming up with now:

R> twohours <- ISOdatetime(2012,05,02,9,0,0) + seq(0:7)*15*60
R> twohours
[1] "2012-05-02 09:15:00 GMT" "2012-05-02 09:30:00 GMT"
[3] "2012-05-02 09:45:00 GMT" "2012-05-02 10:00:00 GMT"
[5] "2012-05-02 10:15:00 GMT" "2012-05-02 10:30:00 GMT"
[7] "2012-05-02 10:45:00 GMT" "2012-05-02 11:00:00 GMT"
R> set.seed(42)
R> observation <- xts(1:10, order.by=twohours[1]+cumsum(runif(10)*60*10))
R> observation
[,1]
2012-05-02 09:24:08.883625 1
2012-05-02 09:33:31.128874 2
2012-05-02 09:36:22.812594 3
2012-05-02 09:44:41.081170 4
2012-05-02 09:51:06.128481 5
2012-05-02 09:56:17.586051 6
2012-05-02 10:03:39.539040 7
2012-05-02 10:05:00.338998 8
2012-05-02 10:11:34.534372 9
2012-05-02 10:18:37.573243 10

A two hour time grid, and some random observations leaving some cells empty and some
filled.

R> to.minutes15(observation)[,4]
observation.Close
2012-05-02 09:24:08.883625 1
2012-05-02 09:44:41.081170 4
2012-05-02 09:56:17.586051 6
2012-05-02 10:11:34.534372 9
2012-05-02 10:18:37.573243 10

That is a 15 minutes grid aggregation but not on our time grid.

R> twoh <- xts(rep(NA,8), order.by=twohours)
R> twoh
[,1]
2012-05-02 09:15:00 NA
2012-05-02 09:30:00 NA
2012-05-02 09:45:00 NA
2012-05-02 10:00:00 NA
2012-05-02 10:15:00 NA
2012-05-02 10:30:00 NA
2012-05-02 10:45:00 NA
2012-05-02 11:00:00 NA

R> merge(twoh, observation)
twoh observation
2012-05-02 09:15:00.000000 NA NA
2012-05-02 09:24:08.883625 NA 1
2012-05-02 09:30:00.000000 NA NA
2012-05-02 09:33:31.128874 NA 2
2012-05-02 09:36:22.812594 NA 3
2012-05-02 09:44:41.081170 NA 4
2012-05-02 09:45:00.000000 NA NA
2012-05-02 09:51:06.128481 NA 5
2012-05-02 09:56:17.586051 NA 6
2012-05-02 10:00:00.000000 NA NA
2012-05-02 10:03:39.539040 NA 7
2012-05-02 10:05:00.338998 NA 8
2012-05-02 10:11:34.534372 NA 9
2012-05-02 10:15:00.000000 NA NA
2012-05-02 10:18:37.573243 NA 10
2012-05-02 10:30:00.000000 NA NA
2012-05-02 10:45:00.000000 NA NA
2012-05-02 11:00:00.000000 NA NA

New xts object, and merged object. Now use na.locf() to carry the observations
forward:

R> na.locf(merge(twoh, observation)[,2])
observation
2012-05-02 09:15:00.000000 NA
2012-05-02 09:24:08.883625 1
2012-05-02 09:30:00.000000 1
2012-05-02 09:33:31.128874 2
2012-05-02 09:36:22.812594 3
2012-05-02 09:44:41.081170 4
2012-05-02 09:45:00.000000 4
2012-05-02 09:51:06.128481 5
2012-05-02 09:56:17.586051 6
2012-05-02 10:00:00.000000 6
2012-05-02 10:03:39.539040 7
2012-05-02 10:05:00.338998 8
2012-05-02 10:11:34.534372 9
2012-05-02 10:15:00.000000 9
2012-05-02 10:18:37.573243 10
2012-05-02 10:30:00.000000 10
2012-05-02 10:45:00.000000 10
2012-05-02 11:00:00.000000 10

And then we can merge again as an inner join on the time-grid xts twoh:

R> merge(twoh, na.locf(merge(twoh, observation)[,2]), join="inner")[,2]
observation
2012-05-02 09:15:00 NA
2012-05-02 09:30:00 1
2012-05-02 09:45:00 4
2012-05-02 10:00:00 6
2012-05-02 10:15:00 9
2012-05-02 10:30:00 10
2012-05-02 10:45:00 10
2012-05-02 11:00:00 10
R>

How to extract time interval data from minute data in r

You can extract rows with a minute-mark ending in 0 or 5 with

df[substr(format(df$Time, '%M'), 2, 2) %in% c(0, 5),]
# or
df[as.numeric(format(df$Time, '%M')) %% 5 == 0,]
# or
df[grep('[0|5]$', format(df$Time, '%M')),]

With filter:

library(dplyr)
df %>%
filter(substr(format(df$Time, '%M'), 2, 2) %in% c(0, 5))

# or

df %>%
filter(as.numeric(format(df$Time, '%M')) %% 5 == 0)

How to create a time interval that count the rows in such time interval in R

Since there is not a reproducible example, I attempt the solution on a simulated data frame. First we create a log of calls with ID and time:

library(lubridate)
library(dplyr)
library(magrittr)
set.seed(123)

# Generate 100 random call times during a day
calls.df <- data.frame(id=seq(1,100,1), calltime=sample(seq(as.POSIXct('2019/10/01'),
as.POSIXct('2019/10/02'), by="min"), 100))

There may not be all intervals represented in your call data so generate a sequence of all 30 minute bins in case:

full.df <- data.frame(bin=seq(as.POSIXct('2019/10/01'), as.POSIXct('2019/10/02'), by="30 min"))

Next tally up counts of calls in represented bins:

calls.df %>% arrange(calltime) %>% mutate(diff=interval(lag(calltime),calltime)) %>% 
mutate(mins=diff@.Data/60) %>% select(-diff) %>%
mutate(bin=floor_date(calltime, unit="30 minutes")) %>%
group_by(bin) %>% tally() -> orig.counts

Now make sure there are zeroes for unrepresented bins:

right_join(orig.counts,full.df,by="bin") %>% mutate(count=ifelse(is.na(n), 0, n))

# A tibble: 49 x 3
bin n count
<dttm> <int> <dbl>
1 2019-10-01 00:00:00 2 2
2 2019-10-01 00:30:00 1 1
3 2019-10-01 01:00:00 2 2
4 2019-10-01 01:30:00 NA 0
5 2019-10-01 02:00:00 2 2
6 2019-10-01 02:30:00 4 4
7 2019-10-01 03:00:00 1 1
8 2019-10-01 03:30:00 1 1
9 2019-10-01 04:00:00 2 2
10 2019-10-01 04:30:00 1 1
# ... with 39 more rows

Hope this is helpful for you.

Create a 24 hour vector with 60 and 1 minutes time interval in R

Quite a few things have to be done to get the summaries data. One can use dplyr, tidyr and lubridate packages to transform data.

The approach:

  1. Create DateTime column by uniting date and hour and converting to
    ymd_hms
  2. Group on src_addres, dest_address, and Year-Month-Day Hour to
    calculate hourly occurrence
  3. Group on src_addres, dest_address, and Year-Month-Day Hour:Min to calculate > per min occurrence
  4. Group on src_addres, dest_address and summarize to get max of hourly and per min occurrence
library(dplyr)
library(tidyr)
library(lubridate)

df %>% unite("DateTime", c("date","hour"), sep=" ") %>%
mutate(DateTime = ymd_hms(DateTime)) %>%
group_by(src_addres, dest_address, YMD_H = format(DateTime, "%Y-%m-%d %H")) %>%
mutate(HourlyAppearance = n()) %>%
group_by(src_addres, dest_address, YMD_HM = format(DateTime, "%Y-%m-%d %H:%M")) %>%
mutate(PerMinAppearance = n()) %>%
group_by(src_addres, dest_address) %>%
summarise( 'max(per hour)' = max(HourlyAppearance),
'max(per min)' = max(PerMinAppearance)) %>%
as.data.frame()

# src_addres dest_address max(per hour) max(per min)
# 1 1.11.201.19 172.16.16.100 1 1
# 2 1.119.43.90 172.16.16.100 1 1
# 3 1.119.43.90 172.16.16.153 1 1
# 4 1.119.43.90 192.168.1.112 1 1
# 5 1.171.43.133 172.16.16.5 2 2
# 6 1.179.191.82 172.16.16.5 1 1
# 7 1.179.191.82 192.168.1.111 1 1
# 8 1.179.191.82 192.168.1.112 1 1
# 9 1.180.72.186 172.16.16.153 1 1
# 10 1.202.165.40 172.16.16.153 1 1
# 11 1.203.84.52 172.16.16.5 1 1
# 12 1.203.84.52 192.168.1.112 1 1
# 13 1.209.171.4 192.168.1.111 1 1
# 14 1.214.34.114 172.16.16.100 1 1
# 15 1.214.34.114 172.16.16.153 1 1
# 16 1.214.34.114 172.16.16.5 1 1
# 17 1.214.34.114 192.168.1.111 1 1
# 18 1.214.34.114 192.168.1.112 1 1
# 19 1.55.249.92 172.16.16.153 1 1
# 20 1.55.249.92 172.16.16.5 1 1
# 21 1.71.188.254 172.16.16.100 1 1
# 22 1.71.188.254 172.16.16.153 1 1
# 23 1.71.188.254 172.16.16.159 1 1
# 24 1.71.188.254 172.16.16.5 1 1
# 25 1.71.188.254 192.168.1.111 1 1
# 26 1.71.188.254 192.168.1.112 1 1
# 27 1.85.18.88 172.16.16.153 1 1

Data:

OP hasn't provided data in a pretty simple format. The inclusion of date and time columns has made it more difficult. Perhaps that be reason for low response to this question. I preferred to read date and time part separately and then unite those to get Date/Time.

strtext <- "Sl  date hour  src_addres  dest_address  Date_t   Time_t
1996 2018-04-14 08:24:01 1.11.201.19 172.16.16.100 2018-04-14 08:24:01
3702 2018-04-15 12:10:27 1.119.43.90 172.16.16.100 2018-04-15 12:10:27
1154 2018-04-14 00:59:27 1.119.43.90 172.16.16.153 2018-04-14 00:59:27
2414 2018-04-14 12:33:29 1.119.43.90 192.168.1.112 2018-04-14 12:33:29
18013 2018-04-28 18:49:05 1.171.43.133 172.16.16.5 2018-04-28 18:49:05
18015 2018-04-28 18:49:05 1.171.43.133 172.16.16.5 2018-04-28 18:49:05
6903 2018-04-25 21:31:52 1.179.191.82 172.16.16.5 2018-04-25 21:31:52
11741 2018-04-27 01:08:43 1.179.191.82 192.168.1.111 2018-04-27 01:08:43
11933 2018-04-27 02:00:10 1.179.191.82 192.168.1.111 2018-04-27 02:00:10
11023 2018-04-26 21:39:39 1.179.191.82 192.168.1.112 2018-04-26 21:39:39
11175 2018-04-26 22:31:01 1.179.191.82 192.168.1.112 2018-04-26 22:31:01
13073 2018-04-27 08:24:58 1.180.72.186 172.16.16.153 2018-04-27 08:24:58
13735 2018-04-27 12:07:34 1.180.72.186 172.16.16.153 2018-04-27 12:07:34
2752 2018-04-14 19:34:53 1.202.165.40 172.16.16.153 2018-04-14 19:34:53
4046 2018-04-15 18:16:40 1.203.84.52 172.16.16.5 2018-04-15 18:16:40
4048 2018-04-15 18:18:43 1.203.84.52 192.168.1.112 2018-04-15 18:18:43
3020 2018-04-15 01:35:40 1.209.171.4 192.168.1.111 2018-04-15 01:35:40
4870 2018-04-16 05:33:42 1.214.34.114 172.16.16.100 2018-04-16 05:33:42
7025 2018-04-25 22:28:06 1.214.34.114 172.16.16.100 2018-04-25 22:28:06
4262 2018-04-15 23:31:56 1.214.34.114 172.16.16.153 2018-04-15 23:31:56
9369 2018-04-26 10:32:50 1.214.34.114 172.16.16.153 2018-04-26 10:32:50
2716 2018-04-14 18:49:30 1.214.34.114 172.16.16.5 2018-04-14 18:49:30
9563 2018-04-26 12:34:58 1.214.34.114 172.16.16.5 2018-04-26 12:34:58
1110 2018-04-14 00:27:02 1.214.34.114 192.168.1.111 2018-04-14 00:27:02
4470 2018-04-16 01:27:32 1.214.34.114 192.168.1.112 2018-04-16 01:27:32
9581 2018-04-26 12:55:39 1.55.249.92 172.16.16.153 2018-04-26 12:55:39
2970 2018-04-15 00:01:18 1.55.249.92 172.16.16.5 2018-04-15 00:01:18
15329 2018-04-27 21:53:16 1.55.249.92 172.16.16.5 2018-04-27 21:53:16
15537 2018-04-28 00:02:30 1.55.249.92 172.16.16.5 2018-04-28 00:02:30
19249 2018-04-29 06:28:04 1.71.188.254 172.16.16.100 2018-04-29 06:28:04
19243 2018-04-29 06:28:04 1.71.188.254 172.16.16.153 2018-04-29 06:28:04
19241 2018-04-29 06:28:04 1.71.188.254 172.16.16.159 2018-04-29 06:28:04
19239 2018-04-29 06:28:04 1.71.188.254 172.16.16.5 2018-04-29 06:28:04
19247 2018-04-29 06:28:04 1.71.188.254 192.168.1.111 2018-04-29 06:28:04
19245 2018-04-29 06:28:04 1.71.188.254 192.168.1.112 2018-04-29 06:28:04
6315 2018-04-25 18:56:08 1.85.18.88 172.16.16.153 2018-04-25 18:56:08
14623 2018-04-27 16:41:00 1.85.18.88 172.16.16.153 2018-04-27 16:41:00"

df <- read.table(text = strtext,header = TRUE, stringsAsFactors = FALSE)

5-minutes interval time series in R

Do you just want a sequence of dates at 5 min intervals, with a time zone? If so you can use seq.PosiXt as described here

See my example below:

start_date <- as.POSIXct(paste0("2010/01/01", "00:00:00"), format= "%Y/%m/%d %H:%M:%S", tz = "Etc/GMT+2")
end_date <- as.POSIXct(paste0("2018/01/01","00:00:01"), format= "%Y/%m/%d %H:%M:%S", tz = "Etc/GMT+2")


output <- seq.POSIXt(start_date, end_date, by = "5 min")
formatted <- format_iso_8601(output)


Related Topics



Leave a reply



Submit