Binning Time Data in R

Use cut(), which has a method for POSIXt date-times (see ?cut.POSIXt). For example:

x <- as.POSIXct("2016-01-01 00:00:00", tz="UTC") + as.difftime(30*(0:47),units="mins")
cut(x, breaks="2 hours", labels=FALSE)
# or to show more clearly the results:
data.frame(x, cuts = cut(x, breaks="2 hours", labels=FALSE))

#                      x cuts
# 1  2016-01-01 00:00:00    1
# 2  2016-01-01 00:30:00    1
# 3  2016-01-01 01:00:00    1
# 4  2016-01-01 01:30:00    1
# 5  2016-01-01 02:00:00    2
# 6  2016-01-01 02:30:00    2
# 7  2016-01-01 03:00:00    2
# 8  2016-01-01 03:30:00    2
# 9  2016-01-01 04:00:00    3
# 10 2016-01-01 04:30:00    3
# ...

If your data are just time strings, you need to convert them first. If you don't also specify a date, the times are assigned to the current day:

as.POSIXct("17:23:54", format="%H:%M:%S", tz="UTC")
#[1] "2016-07-13 17:23:54 UTC"
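If the bins need to come out the same no matter which day the code is run, one option is to paste a fixed date in front of each time string before converting. This is a minimal sketch: the vector `times` and the date 1970-01-01 are arbitrary choices for illustration, not part of the original answer.

```r
# Hypothetical time-of-day strings
times <- c("00:10:00", "01:45:30", "17:23:54")

# Prefix an arbitrary fixed date so the result does not depend on today's date
x <- as.POSIXct(paste("1970-01-01", times),
                format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Bin into 2-hour intervals; labels = FALSE returns integer bin codes
bins <- cut(x, breaks = "2 hours", labels = FALSE)
bins
```

The first two times fall in the 00:00 to 02:00 bin and the third in the 16:00 to 18:00 bin.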

Binning data by time in R

You can use floor_date() from lubridate to round each Time down to the minute, then sum Data within each group.

library(dplyr)
library(lubridate)

df %>%
  mutate(Time = ymd_hms(Time)) %>%
  group_by(ID, Time = floor_date(Time, "1 min")) %>%
  summarise(Data = sum(Data))
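The snippet above assumes a data frame df with columns ID, Time and Data, which is not shown here. A small self-contained sketch with made-up values might look like this (the values are invented purely for illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later):

```r
library(dplyr)
library(lubridate)

# Made-up input; the column names ID, Time and Data follow the snippet above
df <- data.frame(
  ID   = c(1, 1, 1, 2),
  Time = c("2016-01-01 00:00:10", "2016-01-01 00:00:50",
           "2016-01-01 00:01:05", "2016-01-01 00:00:30"),
  Data = c(1, 2, 3, 4)
)

res <- df %>%
  mutate(Time = ymd_hms(Time)) %>%                      # parse strings to POSIXct (UTC)
  group_by(ID, Time = floor_date(Time, "1 min")) %>%    # round down to the minute
  summarise(Data = sum(Data), .groups = "drop")         # one row per ID and minute
res
```

The first two rows for ID 1 share the minute 00:00 and are summed together, so the result has three rows.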

Create time bins and assign data to correct bin

Here is a solution using data.table and lubridate, built around floor_date.

# load packages
library(data.table)
library(lubridate)

# define a vector of bin start times, evenly spaced every 30 minutes:
b <- data.table(dates = seq(as.POSIXct("2018-03-25", tz = "UTC"),
                            as.POSIXct("2018-03-26", tz = "UTC"),
                            by = "30 min"))

# reproduce data
dt <- data.table(detect_date = c("25/03/2018 00:09", "25/03/2018 01:17",
                                 "25/03/2018 14:37", "25/03/2018 23:43"),
                 Station = c("SS01", "SS03", "SS04", "SS04"),
                 Individual = c("A", "B", "C", "B"))

# convert detect_date from character to date-time (POSIXct, UTC)
dt[, detect_date := dmy_hm(detect_date)]

# floor each detection to its 30-minute bin, then join onto b so that empty bins appear as NA
dt[, .(Location = Station, Individual), by = .(dates = floor_date(detect_date, "30 minutes"))][b, on = "dates"]
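For comparison, the same bin assignment can be sketched in base R with findInterval(), which is a different technique from the floor_date join above. The vector `detect` holds hypothetical detection times chosen for illustration.

```r
# Half-hourly bin edges covering one day (same range as b above)
breaks <- seq(as.POSIXct("2018-03-25", tz = "UTC"),
              as.POSIXct("2018-03-26", tz = "UTC"),
              by = "30 min")

# Hypothetical detection times
detect <- as.POSIXct(c("2018-03-25 00:09", "2018-03-25 01:17", "2018-03-25 14:37"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")

# findInterval() returns, for each time, the index of the last break <= that time
idx <- findInterval(as.numeric(detect), as.numeric(breaks))

# Start time of the bin each detection falls into
bin_start <- breaks[idx]
format(bin_start, "%H:%M")
```

This only labels each detection with its bin; it does not produce the empty bins that the join onto b provides.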

Binning time series in R?

While you could convert to a formal time representation, in this case it might be easier to just use substr:

test <- c("00:00:01","02:07:01","22:30:15")
as.numeric(substr(test,1,2))
#[1] 0 2 22

Using a POSIXct time to deal with it would also work, and might be handy if you plan on further calculations (differences in time etc):

testtime <- as.POSIXct(test, format="%H:%M:%S")
testtime
#[1] "2013-12-09 00:00:01 EST" "2013-12-09 02:07:01 EST" "2013-12-09 22:30:15 EST"
as.numeric(format(testtime,"%H"))
#[1] 0 2 22
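Once the hour is available as a number, it can be grouped into coarser bins with cut(). This is a minimal sketch; the 6-hour bin edges and the label strings are arbitrary choices for illustration.

```r
test <- c("00:00:01", "02:07:01", "22:30:15")
hours <- as.numeric(substr(test, 1, 2))

# right = FALSE makes the intervals closed on the left: [0,6), [6,12), ...
part <- cut(hours, breaks = c(0, 6, 12, 18, 24), right = FALSE,
            labels = c("00-06", "06-12", "12-18", "18-24"))
as.character(part)
```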

Given time column, how can I create time bins in R?

One way to do this is to use strptime to parse your time column into POSIX objects, and then use format on those objects to round down to the hour, like so:

library(dplyr)

df$hour <- format(strptime(df$time, "%H:%M"), "%H:00")

df %>% group_by(hour) %>% summarize(respond = sum(respond))

# # A tibble: 3 x 2
#   hour  respond
#   <chr>   <int>
# 1 08:00       0
# 2 09:00       2
# 3 15:00       1
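The data frame df is not shown above. As a self-contained sketch, the same hourly rounding can be tried on made-up data; here the summing step uses base R's aggregate() instead of dplyr, and tz = "UTC" is added as an assumption so that no time is affected by a local daylight-saving change.

```r
# Made-up data with the column names used above
df <- data.frame(time    = c("08:15", "09:05", "09:40", "15:30"),
                 respond = c(0L, 1L, 1L, 1L))

# Parse the strings, then format back with the minutes fixed at "00"
df$hour <- format(strptime(df$time, "%H:%M", tz = "UTC"), "%H:00")

# Sum respond within each hour
res <- aggregate(respond ~ hour, data = df, FUN = sum)
res
```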

How to bin times from different days into time bins

If you want to bin by time-of-day, regardless of date, then it might be easier to extract just the time-of-day and work with that.

# t (date-times) and q (values) are assumed to be the vectors from the question
dat = data.frame(time=t, q=q)

library(lubridate)
library(plyr)

# Extract time of day from each date-time
dat$hour = hour(dat$time) + minute(dat$time)/60 + second(dat$time)/3600

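# Create bin labels: "00:15", "00:30", ..., "23:45", "24:00" (upper edge of each 15-minute bin)
bins = c(paste0(rep(c(paste0(0, 0:9), 10:23), each=4), ":", c("00", 15, 30, 45))[-1], "24:00")

# Bin the data; include.lowest=TRUE keeps a time of exactly 00:00:00 in the first bin instead of NA
dat$bins = cut(dat$hour, breaks=seq(0, 24, 0.25), labels=bins, include.lowest=TRUE)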

And here's the result of summarizing by time bin:

ddply(dat, .(bins), summarise, q_sum = sum(q), .drop=FALSE)

bins q_sum
1 00:15 0
2 00:30 0
3 00:45 0
4 01:00 0
5 01:15 100
6 01:30 0
...
10 02:30 0
11 02:45 100
12 03:00 0
...
27 06:45 0
28 07:00 100
29 07:15 0
30 07:30 0
31 07:45 0
32 08:00 0
33 08:15 100
34 08:30 0
...
52 13:00 0
53 13:15 100
54 13:30 0
55 13:45 0
...
72 18:00 0
73 18:15 0
74 18:30 200
75 18:45 0
...
82 20:30 0
83 20:45 0
84 21:00 100
85 21:15 0
86 21:30 0
...
95 23:45 0
96 24:00 0
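The bin labels can also be derived directly from the break values, which may be easier to adapt to other bin widths. This is a sketch in base R only; the date-times in `tm` are hypothetical, and `include.lowest = TRUE` is used so that a time of exactly midnight lands in the first bin.

```r
# Bin edges in decimal hours, every 15 minutes
breaks <- seq(0, 24, by = 0.25)

# Label each bin by its upper edge, formatted as HH:MM
upper  <- breaks[-1]
labels <- sprintf("%02d:%02d",
                  as.integer(floor(upper)),
                  as.integer(round((upper %% 1) * 60)))

# Hypothetical date-times on different days
tm <- as.POSIXct(c("2016-01-01 01:05:00", "2016-01-02 01:10:00",
                   "2016-01-03 18:20:00"), tz = "UTC")

# Time of day in decimal hours, ignoring the date
hour_dec <- as.numeric(format(tm, "%H")) + as.numeric(format(tm, "%M")) / 60

binned <- cut(hour_dec, breaks = breaks, labels = labels, include.lowest = TRUE)
counts <- table(binned)
counts[counts > 0]
```

The two early times share the "01:15" bin even though they fall on different days, and the third lands in "18:30".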

How to create time bins in R and group data

This routine can be implemented with {dplyr}'s group_by, mutate and summarize. I split it into two result objects, res1 and res2:

dat <- read.table(text="trial   event   time_start  time_end    time_duration   region
1 A 36403 36504 101 none
1 B 36506 36516 10 none
1 A 36518 36700 182 top
1 B 36702 36708 6 none
1 A 36710 37054 344 top
1 B 37056 37088 32 none
1 A 37090 37640 550 right
1 B 37642 37678 36 none
1 A 37680 37812 132 left
2 A 41278 41318 40 top
2 B 41320 41336 16 none
2 A 41338 41490 152 top
2 B 41492 41498 6 none
2 A 41500 41994 494 top
2 B 41996 42032 36 none
2 A 42034 42492 458 left", header=TRUE)

library(dplyr, warn.conflicts = FALSE)

res1 <- dat %>%
  group_by(trial) %>%
  mutate(duration = time_end - time_start,
         total_duration = sum(duration),
         cml_duration = cumsum(duration),
         fractime = cml_duration / total_duration,
         bin = floor(fractime / 0.25 + 0.99))
# adding 0.99 (< 1) before floor() is a fudge factor that yields bins 1:4 rather than 0:4 or 1:5
res2 <- res1 %>%
  group_by(trial, bin) %>%
  summarize(total_event_a = sum(event == "A"),
            total_event_a_right = sum(event == "A" & region == "right"))
#> `summarise()` regrouping output by 'trial' (override with `.groups` argument)

res2
#> # A tibble: 6 x 4
#> # Groups:   trial [2]
#>   trial   bin total_event_a total_event_a_right
#>   <int> <dbl>         <int>               <int>
#> 1     1     1             2                   0
#> 2     1     2             1                   0
#> 3     1     4             2                   1
#> 4     2     1             2                   0
#> 5     2     3             1                   0
#> 6     2     4             1                   0

Created on 2020-12-06 by the reprex package (v0.3.0)
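One thing to be aware of with the 0.99 fudge factor is that a cumulative fraction below 0.0025 would end up in bin 0. A possible alternative, assuming every fraction is strictly positive, is ceiling(). The sketch below compares the two on a few hand-picked fractions; the vector `fractime` here is made up for illustration and is not taken from the data above.

```r
# Hand-picked cumulative fractions, including a very small one
fractime <- c(0.002, 0.0725, 0.25, 0.26, 0.8794, 1)

# Original approach: floor() with a 0.99 fudge factor
bin_fudge <- floor(fractime / 0.25 + 0.99)

# Alternative: ceiling() maps (0, 0.25] to 1, (0.25, 0.5] to 2, and so on
bin_ceiling <- ceiling(fractime / 0.25)

data.frame(fractime, bin_fudge, bin_ceiling)
```

The two columns differ only for the 0.002 entry, where the fudge-factor version gives 0 and ceiling() gives 1. On the example data above, both approaches assign the same bins.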

R - Split time series into time-only bins

Drop the date and deal only with the time component?

format(tt, "%H:%M:%S")

extracts the time component as a string, which you can convert further into whatever format your binning code handles. Alternatively, set every date to the same day before binning.
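The second option, putting every observation on the same day, could be sketched as follows. The vector `tt` and the date 1970-01-01 are hypothetical choices for illustration.

```r
# Hypothetical date-times spread over several days
tt <- as.POSIXct(c("2016-01-01 08:10:00", "2016-02-15 08:40:00",
                   "2016-03-20 21:05:00"), tz = "UTC")

# Keep only the time of day and attach one arbitrary, common date
same_day <- as.POSIXct(paste("1970-01-01", format(tt, "%H:%M:%S")), tz = "UTC")

# Hourly bins; labels = FALSE returns integer bin codes counted from the earliest hour
bins <- cut(same_day, breaks = "1 hour", labels = FALSE)
bins
```

The first two observations share the 08:00 bin despite being weeks apart, which is the time-only behaviour asked for.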


