Split Time Series Data into Time Intervals (Say an Hour) and Then Plot the Count

It sounds like you want to use cut to figure out how many values occur within an hour.

It's generally helpful if you can provide some sample data. Here's some:

set.seed(1) # So you can get the same numbers as I do
MyDates <- ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT") + sample(1:27000, 500)
head(MyDates)
# [1] "2012-01-01 01:59:29 GMT" "2012-01-01 02:47:27 GMT" "2012-01-01 04:17:46 GMT"
# [4] "2012-01-01 06:48:39 GMT" "2012-01-01 01:30:45 GMT" "2012-01-01 06:44:13 GMT"

You can use table and cut (with the argument breaks = "hour"; see ?cut.POSIXt for more info) to find the frequencies per hour.

MyDatesTable <- table(cut(MyDates, breaks="hour"))
MyDatesTable
#
# 2012-01-01 00:00:00 2012-01-01 01:00:00 2012-01-01 02:00:00 2012-01-01 03:00:00
#                  59                  73                  74                  83
# 2012-01-01 04:00:00 2012-01-01 05:00:00 2012-01-01 06:00:00 2012-01-01 07:00:00
#                  52                  62                  64                  33
# Or a data.frame if you prefer
data.frame(MyDatesTable)
# Var1 Freq
# 1 2012-01-01 00:00:00 59
# 2 2012-01-01 01:00:00 73
# 3 2012-01-01 02:00:00 74
# 4 2012-01-01 03:00:00 83
# 5 2012-01-01 04:00:00 52
# 6 2012-01-01 05:00:00 62
# 7 2012-01-01 06:00:00 64
# 8 2012-01-01 07:00:00 33

Finally, here's a line plot of the MyDatesTable object:

plot(MyDatesTable, type="l", xlab="Time", ylab="Freq")

[Plot: line chart of hourly counts]


cut can handle a range of time intervals. For example, if you wanted to tabulate for every 30 minutes, you can easily adapt the breaks argument to handle that:

data.frame(table(cut(MyDates, breaks = "30 mins")))
# Var1 Freq
# 1 2012-01-01 00:00:00 22
# 2 2012-01-01 00:30:00 37
# 3 2012-01-01 01:00:00 38
# 4 2012-01-01 01:30:00 35
# 5 2012-01-01 02:00:00 32
# 6 2012-01-01 02:30:00 42
# 7 2012-01-01 03:00:00 39
# 8 2012-01-01 03:30:00 44
# 9 2012-01-01 04:00:00 25
# 10 2012-01-01 04:30:00 27
# 11 2012-01-01 05:00:00 33
# 12 2012-01-01 05:30:00 29
# 13 2012-01-01 06:00:00 29
# 14 2012-01-01 06:30:00 35
# 15 2012-01-01 07:00:00 33
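The same pattern extends to other widths, e.g. two-hour bins (a quick sketch; output omitted):

data.frame(table(cut(MyDates, breaks = "2 hours")))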

Update

Since you were trying to plot with ggplot2, here's one approach (not sure it's the best, since I usually use base R's graphics when I need to plot).

Create a data.frame of the table (as demonstrated above), add a dummy "group" variable, and plot it as follows:

library(ggplot2)
MyDatesDF <- data.frame(MyDatesTable, grp = 1)
ggplot(MyDatesDF, aes(Var1, Freq)) + geom_line(aes(group = grp))

[Plot: ggplot2 line chart of hourly counts]
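If you'd rather have ggplot2 treat the x-axis as real timestamps (so it can space and label the ticks itself), you can convert the factor column back to POSIXct. A small sketch, assuming the MyDatesDF built above:

MyDatesDF$Var1 <- as.POSIXct(as.character(MyDatesDF$Var1), tz = "GMT")
ggplot(MyDatesDF, aes(Var1, Freq)) +
  geom_line() +
  scale_x_datetime(date_labels = "%H:%M")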

How to Count and Plot Interval Time Series Data (Hourly) in Python?

You need to group your data into two-hour bins and set proper ticks for your plot.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.DataFrame({'date': ['18.05.2015 09:00:00','18.05.2015 15:00:00', '18.05.2015 14:14:00', '18.05.2015 11:00:00']})

# convert to datetime (day-first format)
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
# group into two-hour bins
df = df.groupby(df.date.dt.floor('2H')).count()

# plot data
fig, ax1 = plt.subplots()
ax1.set_xticks(np.arange(8, 15, 2))
ax1.set_xticklabels(['8-10', '10-12', '12-14', '14-16'])
plt.plot(df.index.hour, df.date, '.')
plt.show()

Output:

[Plot: counts per two-hour bin]
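As a side note, pandas can produce the same two-hour counts with resample. A minimal sketch, rebuilding the sample frame from above:

import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(
    ['18.05.2015 09:00:00', '18.05.2015 15:00:00',
     '18.05.2015 14:14:00', '18.05.2015 11:00:00'], dayfirst=True)})

# resample needs a DatetimeIndex; size() counts rows per two-hour bin
counts = df.set_index('date').resample('2H').size()
print(counts)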

Counts for time intervals

You can use the lubridate package to extract the hour, minutes, etc. from a date object. If you know the minute of the event time, you know in which half hour the event took place. I sampled some data with two machines and added a variable holding the "whole" half hour in which each event took place; then you can just count these. Hope this is what you are after, good luck.

require(lubridate)
set.seed(1234)
example.dat <- data.frame(
  machine = paste("M", sample(1:2, 100, replace = TRUE), sep = ""),
  dates = sort(as.POSIXct(sample(1377100000:1377110000, 100), origin = "1960-01-01"))
)
example.dat <- example.dat[order(example.dat$machine), ]

halfHours <- ifelse(minute(example.dat$dates) < 30, 0, 30)
example.dat$datesHH <- example.dat$dates
minute(example.dat$datesHH) <- halfHours
second(example.dat$datesHH) <- 0

data.frame(table(example.dat[, c(1, 3)]))
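As an aside, lubridate's floor_date can do the half-hour snapping in one step (same example.dat as above; output omitted):

# equivalent: floor each timestamp to its half hour, then tabulate
example.dat$datesHH2 <- floor_date(example.dat$dates, "30 minutes")
table(example.dat$machine, example.dat$datesHH2)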

R: Split observation values by and aggregate to time intervals

Here's a data.table approach which allows you to use SQL-type queries to sort/filter data and perform operations.

DATA

> p
obs name start end diff_corr
1: C2 C2 2017-05-22 04:25:00 2017-05-22 04:26:30 1.4673913
2: C2 C2 2017-05-22 04:26:30 2017-05-22 04:27:30 0.9782609
3: C2 C2 2017-05-22 04:27:30 2017-05-22 04:28:00 0.4891304
4: C2 C2 2017-05-22 04:28:00 2017-05-22 04:30:00 1.9565217
5: C2 C2 2017-05-22 06:03:00 2017-05-22 06:03:30 0.4891304
6: C2 C2 2017-05-22 06:03:30 2017-05-22 06:05:30 1.9565217
7: C2 C2 2017-05-22 06:05:30 2017-05-22 06:06:00 0.4891304
8: C2 C2 2017-05-22 06:06:00 2017-05-22 06:06:20 0.3260870
9: C2 C2 2017-05-22 06:06:20 2017-05-22 06:07:00 0.6521739
10: b b 2017-06-09 04:23:00 2017-06-09 04:26:00 2.9670330
11: b 981 2017-06-09 04:23:00 2017-06-09 04:26:00 2.9670330
12: b 1627 2017-06-09 04:23:00 2017-06-09 04:26:00 2.9670330
13: b b 2017-06-09 04:26:00 2017-06-09 04:27:00 0.9890110
14: b b 2017-06-09 04:27:00 2017-06-09 04:33:00 5.9340659
15: b 981 2017-06-09 04:27:00 2017-06-09 04:33:00 5.9340659
16: b 1627 2017-06-09 04:27:00 2017-06-09 04:33:00 5.9340659
17: b b 2017-06-09 04:33:00 2017-06-09 04:35:00 1.9780220
18: b b 2017-06-09 04:35:00 2017-06-09 04:37:00 1.9780220
19: b b 2017-06-09 04:37:00 2017-06-09 04:39:00 1.9780220
20: b b 2017-06-09 04:51:00 2017-06-09 04:52:00 0.9890110

CODE

library(data.table)
library(lubridate)
p <- as.data.table(p)
p[, .(new_diff = mean(diff_corr)), .(tme_start = round_date(start, unit = "15min"))]

OUTPUT

> p[, .(new_diff = mean(diff_corr)), .(tme_start = round_date(start, unit = "15min"))]
tme_start new_diff
1: 2017-05-22 04:30:00 1.2228261
2: 2017-05-22 06:00:00 0.7826087
3: 2017-06-09 04:30:00 3.3626374
4: 2017-06-09 04:45:00 0.9890110

What is Data.Table doing?

Since you aren't familiar with data.table, here's a very quick, elementary description of what is happening. The general form of a data.table call is:

DT[select rows, perform operations, group by] 

Where DT is the data.table name. "Select rows" is a logical operation; say you want only the observations for C2 (name), the call would be DT[name == "C2", ]. Here no operation is performed and there is no grouping. If you want the sum of the diff_corr column for all name == "C2", the call becomes DT[name == "C2", list(sum(diff_corr))]. Instead of writing list() you can use .(). The output will now have only one row and one column, called V1, which is the sum of all diff_corr where name == "C2". That column name doesn't carry much information, so we assign one (it can be the same as the old name): DT[name == "C2", .(diff_corr_sum = sum(diff_corr))].

Suppose you had another column called "mood", which reported the mood of the person making the observation and can take three values ("happy", "sad", "sleepy"). You could "group by" the mood: DT[name == "C2", .(diff_corr_new = sum(diff_corr)), by = .(mood)]. The output would be three rows, one per mood, with a single column diff_corr_new. To understand this better, try playing around with a sample dataset like mtcars (see the sketch below); your sample data doesn't have enough complexity to let you explore all of these functions.
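A quick sketch of those three parts on mtcars (using only columns from the built-in dataset):

library(data.table)
DT <- as.data.table(mtcars)

DT[cyl == 6, ]                                    # select rows only
DT[cyl == 6, .(mpg_sum = sum(mpg))]               # operate on a column
DT[, .(mpg_mean = mean(mpg)), by = .(cyl, gear)]  # group by two columns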

Back to the answer - other variations

It's not clear from the question or comments whether you want to round based on start or end. I used the former, but you can change that. The example above uses mean, but you can perform any other operation you may need. The other columns seem more or less redundant, since they are strings and you can't do much with them, but you can use them to further split the results in the by entry (the last field in the call). Below are two examples, using obs and name respectively. You can also combine all of them together (see the sketch after these examples).

> p[, .(new_diff = mean(diff_corr)), .(tme_start = round_date(start, unit = "15min"), obs)]
tme_start obs new_diff
1: 2017-05-22 04:30:00 C2 1.2228261
2: 2017-05-22 06:00:00 C2 0.7826087
3: 2017-06-09 04:30:00 b 3.3626374
4: 2017-06-09 04:45:00 b 0.9890110


> p[, .(new_diff = mean(diff_corr)), .(tme_start = round_date(start, unit = "15min"), name)]
tme_start name new_diff
1: 2017-05-22 04:30:00 C2 1.2228261
2: 2017-05-22 06:00:00 C2 0.7826087
3: 2017-06-09 04:30:00 b 2.6373626
4: 2017-06-09 04:30:00 981 4.4505495
5: 2017-06-09 04:30:00 1627 4.4505495
6: 2017-06-09 04:45:00 b 0.9890110
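For instance, combining the time bucket with both obs and name (a sketch; output omitted):

p[, .(new_diff = mean(diff_corr)),
  .(tme_start = round_date(start, unit = "15min"), obs, name)]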

How to plot multiple daily time series, aligned at specified trigger times?

Assuming the index has already been converted with to_datetime, create an IntervalArray spanning -2H to +8H of the index:

dl, dr = -2, 8
left = df.index + pd.Timedelta(f'{dl}H')
right = df.index + pd.Timedelta(f'{dr}H')

df['interval'] = pd.arrays.IntervalArray.from_arrays(left, right)

Then for each ANNOUNCEMENT, plot the window from interval.left to interval.right:

  • Set the x-axis as seconds since ANNOUNCEMENT
  • Set the labels as hours since ANNOUNCEMENT
fig, ax = plt.subplots()
for ann in df.loc[df['msg_type'] == 'ANNOUNCEMENT'].itertuples():
    window = df.loc[ann.interval.left:ann.interval.right]  # extract interval.left to interval.right
    window.index -= ann.Index                               # compute time since announcement
    window.index = window.index.total_seconds()             # convert to seconds since announcement

    window.plot(ax=ax, y='value', label=ann.Index.date())

deltas = np.arange(dl, dr + 1)
ax.set(xticks=deltas * 3600, xticklabels=deltas)  # set tick labels to hours since announcement

ax.legend()

Here is the output with a smaller window -1H to +2H just so we can see the small sample data more clearly (full code below):

Full code:

import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

s = '''
date,value,msg_type
2022-03-15 08:15:10+00:00,122,None
2022-03-15 08:25:10+00:00,125,None
2022-03-15 08:30:10+00:00,126,None
2022-03-15 08:30:26.542134+00:00,127,ANNOUNCEMENT
2022-03-15 08:35:10+00:00,128,None
2022-03-15 08:40:10+00:00,122,None
2022-03-15 08:45:09+00:00,127,None
2022-03-15 08:50:09+00:00,133,None
2022-03-15 08:55:09+00:00,134,None
2022-03-16 09:30:09+00:00,132,None
2022-03-16 09:30:13.234425+00:00,135,ANNOUNCEMENT
2022-03-16 09:35:09+00:00,130,None
2022-03-16 09:40:09+00:00,134,None
2022-03-16 09:45:09+00:00,135,None
2022-03-16 09:50:09+00:00,134,None
'''
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=['date'])

# create intervals from -1H to +2H of the index
dl, dr = -1, 2
left = df.index + pd.Timedelta(f'{dl}H')
right = df.index + pd.Timedelta(f'{dr}H')
df['interval'] = pd.arrays.IntervalArray.from_arrays(left, right)

# plot each announcement's interval.left to interval.right
fig, ax = plt.subplots()
for ann in df.loc[df['msg_type'] == 'ANNOUNCEMENT'].itertuples():
    window = df.loc[ann.interval.left:ann.interval.right]  # extract interval.left to interval.right
    window.index -= ann.Index                               # compute time since announcement
    window.index = window.index.total_seconds()             # convert to seconds since announcement

    window.plot(ax=ax, y='value', label=ann.Index.date())

deltas = np.arange(dl, dr + 1)
ax.set(xticks=deltas * 3600, xticklabels=deltas)  # set tick labels to hours since announcement

ax.grid()
ax.legend()

R Time Intervals: Grouping by hour of day when sample goes over the hour mark

A more elegant solution than Dario's can be built with the tidyverse:

Reading data

library(tidyverse)
library(lubridate)

a <- read.csv(
  header = FALSE, sep = ";",
  col.names = c("Event", "DT.event", "off.bout.ID", "on.bout.ID",
                "off.time.diff", "on.time.diff"),
  text = gsub(pattern = "\\s{2,}", replacement = ";",
              x = "off  4/27/12 17:25:13  1  0  NA  NA
on  4/27/12 17:25:39  1  1  26  NA
off  4/27/12 18:03:29  2  1  NA  2270
on  4/27/12 18:03:57  2  2  28  NA
off  4/27/12 19:41:16  3  2  NA  5839
on  4/27/12 19:43:50  3  3  154  NA
off  4/28/12 6:23:57  4  3  NA  38407
on  4/28/12 6:32:13  4  4  496  NA
off  4/28/12 6:40:20  5  4  NA  487
on  4/28/12 6:40:48  5  5  28  NA
off  4/28/12 8:16:07  6  5  NA  5719")
)

a$DT.event <- mdy_hms(a$DT.event)

Adding new rows containing the hours of interest

b <- a %>%
  select(DT.event) %>%
  mutate(DT.event = floor_date(DT.event, "hours")) %>%
  group_by(DT.event) %>%
  summarise() %>%
  full_join(a) %>%
  arrange(DT.event)

Finding differences

c <- b %>%
  fill(Event, .direction = "up") %>%
  mutate(on.time.diff.hour = ifelse(Event == "off",
                                    difftime(DT.event, lag(DT.event), units = "secs"),
                                    NA))

You just need to pay attention to check whether you got an extra value in the second line (as there was no "on" before it); see the sketch below.
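A minimal cleanup might look like this (a sketch; assumes the tibble c from above, where the spurious difference lands on the first real "off" row, so adjust the index to your data):

# hypothetical guard: the first "off" has no preceding "on",
# so its computed difference is not a real duration
c$on.time.diff.hour[2] <- NA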

The results

# A tibble: 16 x 7
DT.event Event off.bout.ID on.bout.ID off.time.diff on.time.diff on.time.diff.hour
<dttm> <fct> <int> <int> <int> <int> <dbl>
1 2012-04-27 17:00:00 off NA NA NA NA NA
2 2012-04-27 17:25:13 off 1 0 NA NA 1513
3 2012-04-27 17:25:39 on 1 1 26 NA NA
4 2012-04-27 18:00:00 off NA NA NA NA 2061
5 2012-04-27 18:03:29 off 2 1 NA 2270 209
6 2012-04-27 18:03:57 on 2 2 28 NA NA
7 2012-04-27 19:00:00 off NA NA NA NA 3363
8 2012-04-27 19:41:16 off 3 2 NA 5839 2476
9 2012-04-27 19:43:50 on 3 3 154 NA NA
10 2012-04-28 06:00:00 off NA NA NA NA 36970
11 2012-04-28 06:23:57 off 4 3 NA 38407 1437
12 2012-04-28 06:32:13 on 4 4 496 NA NA
13 2012-04-28 06:40:20 off 5 4 NA 487 487
14 2012-04-28 06:40:48 on 5 5 28 NA NA
15 2012-04-28 08:00:00 off NA NA NA NA 4752
16 2012-04-28 08:16:07 off 6 5 NA 5719 967

