Aggregating Time Series in R

aggregating time series in R

I'll explain your error and tell you how to fix it, but there's a better way to do what you're doing. So make sure you read my entire answer!

From the error message, your by argument is not the same length as Vo(tickmin).
You have to generate by so that it has one value (the day) for each corresponding value in tickmin.

As an example here I generate an xts object:

library(xts)
# generate a set of times from 2010-06-30 onwards at 20-minute intervals
tms <- as.POSIXct(seq(0, 3600 * 24 * 30, by = 60 * 20), origin = "2010-06-30")
n <- length(tms)
# generate volumes for those intervals, random 1 -- 100, turn into an xts object
xts.ts <- xts(sample.int(100, n, replace = TRUE), tms)
colnames(xts.ts) <- 'Volume'

which yields:

> head(xts.ts)
                    Volume
2010-06-30 00:00:00     97
2010-06-30 00:20:00     78
2010-06-30 00:40:00     38
2010-06-30 01:00:00     86
2010-06-30 01:20:00     79
2010-06-30 01:40:00     55

To access the dates of xts.ts you use index(xts.ts), which returns the timestamps (POSIXct date-times), e.g. "2010-07-30 00:00:00 EST".

To reduce these to just the day, you can use as.Date:

> as.Date(index(xts.ts))
[1] "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29"
....
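One thing to watch for: as.Date() on a POSIXct converts in a timezone, UTC by default, which is why the dates above can come out a day off from the locally printed times. A small base-R illustration (the timezone name here is just an example):

```r
# a late-evening local time crosses midnight in UTC
t <- as.POSIXct("2010-06-30 22:00:00", tz = "America/New_York")
as.Date(t)                           # converted in UTC (the default): "2010-07-01"
as.Date(t, tz = "America/New_York")  # converted in local time: "2010-06-30"
```

If the off-by-one matters for your grouping, pass your session's timezone to as.Date explicitly.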

Solution to your problem

Then to use aggregate you do:

> aggregate(Vo(xts.ts), as.Date(index(xts.ts)), sum)

2010-06-29 1858
2010-06-30 3733
2010-07-01 3906
2010-07-02 3359
2010-07-03 3838
...

Better solution to your problem

The xts package has functions apply.daily, apply.monthly, etc (use ls('package:xts') to see what functions it has -- there may be ones you're interested in).
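For example, a quick way to list just those helpers (a small sketch; it assumes xts is installed and attached):

```r
library(xts)
# list the apply.* period helpers that ship with xts
grep("^apply\\.", ls("package:xts"), value = TRUE)
# e.g. "apply.daily" "apply.monthly" "apply.quarterly" "apply.weekly" "apply.yearly"
```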

apply.daily(x,FUN,...) does exactly what you want. See ?apply.daily.
To use it you can do:

> apply.daily(xts.ts,sum)

Volume
2010-06-30 23:40:00 4005
2010-07-01 23:40:00 4093
2010-07-02 23:40:00 3419
2010-07-03 23:40:00 3737
...

Or if your xts object has other columns like Open, Close etc, you can do apply.daily(xts.ts, function(x) sum(Vo(x))).

Note that the answers are slightly different between apply.daily and the aggregate ... as.Date method. That's because apply.daily goes daily from start(xts.ts) to end(xts.ts) (more or less), whereas aggregate went by calendar day, from midnight to midnight.

Looking at your question, apply.daily seems to match most closely what you want to do (and it is provided with xts anyway, so why not use it?).

aggregating 15-minute time series data to daily

Use apply.daily from the xts package after converting your data to an xts object. Something like this should work:

x2 = read.table(header=TRUE, text='     "Index" "temp" "m"
1 "2012-02-07 18:15:13" "4297"
2 "2012-02-07 18:30:04" "4296"
3 "2012-02-07 18:45:10" "4297"
4 "2012-02-07 19:00:01" "4297"
5 "2012-02-07 19:15:07" "4298"
6 "2012-02-07 19:30:13" "4299"
7 "2012-02-07 19:45:04" "4299"
8 "2012-02-07 20:00:10" "4299"
9 "2012-02-07 20:15:01" "4300"
10 "2012-02-07 20:30:07" "4301"')

x2$temp = as.POSIXct(strptime(x2$temp, "%Y-%m-%d %H:%M:%S"))
require(xts)
x2 = xts(x = x2$m, order.by = x2$temp)
apply.daily(x2, mean)
## [,1]
## 2012-02-07 20:30:07 4298.3
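For what it's worth, apply.daily() is a thin convenience wrapper around endpoints() and period.apply(), so the same call can be spelled out explicitly (a sketch reusing the x2 object built above):

```r
library(xts)
# endpoints() finds the last row of each day; period.apply() applies the
# function over the chunks between those endpoints
period.apply(x2, INDEX = endpoints(x2, on = "days"), FUN = mean)
```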

Update: Your problem in a reproducible format (with fake data)

We don't always need the actual dataset to be able to help troubleshoot...

set.seed(1) # So you can get the same numbers as I do
x = data.frame(datetime = seq(ISOdatetime(1970, 1, 1, 0, 0, 0),
                              length = 384, by = 900),
               m = sample(2000:4000, 384, replace = TRUE))
head(x)
# datetime m
# 1 1970-01-01 00:00:00 2531
# 2 1970-01-01 00:15:00 2744
# 3 1970-01-01 00:30:00 3146
# 4 1970-01-01 00:45:00 3817
# 5 1970-01-01 01:00:00 2403
# 6 1970-01-01 01:15:00 3797
require(xts)
x2 = xts(x$m, x$datetime)
head(x2)
# [,1]
# 1970-01-01 00:00:00 2531
# 1970-01-01 00:15:00 2744
# 1970-01-01 00:30:00 3146
# 1970-01-01 00:45:00 3817
# 1970-01-01 01:00:00 2403
# 1970-01-01 01:15:00 3797
apply.daily(x2, mean)
# [,1]
# 1970-01-01 23:45:00 3031.302
# 1970-01-02 23:45:00 3043.250
# 1970-01-03 23:45:00 2896.771
# 1970-01-04 23:45:00 2996.479

Update 2: A workaround alternative

(Using the fake data I've provided in the above update.)

data.frame(time = x[seq(96, nrow(x), by = 96), 1],
           mean = aggregate(ts(x[, 2], freq = 96), 1, mean))
# time mean
# 1 1970-01-01 23:45 3031.302
# 2 1970-01-02 23:45 3043.250
# 3 1970-01-03 23:45 2896.771
# 4 1970-01-04 23:45 2996.479
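The two magic numbers come from the sampling rate: 15-minute data has 24 * 60 / 15 = 96 observations per day, so ts(x[, 2], freq = 96) makes aggregate(..., 1, mean) average each complete day, and x[seq(96, nrow(x), by = 96), 1] picks out the last timestamp of each day as its label. A quick base-R sanity check of the aggregate step:

```r
# 15-minute sampling gives 96 observations per day
24 * 60 / 15
# [1] 96

# aggregate() on a ts with frequency 96 collapses each block of 96 values
z <- ts(rep(c(1, 3), each = 96), frequency = 96)  # "day" 1 all 1s, "day" 2 all 3s
as.numeric(aggregate(z, 1, mean))
# [1] 1 3
```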

Aggregate time series object by month R

We could get the mean for each 'month' using tapply and then plot

meanVal <- tapply(anom_tsUNAD, cycle(anom_tsUNAD), FUN = mean)
plot(meanVal)

cycle() gives the numeric position in the cycle for each observation: for 'Jan' it is 1 and for 'Dec' it is 12. We use that as the grouping variable in tapply to calculate the mean.

data

anom_tsUNAD <- ts(1:40, start = c(1922, 1), frequency = 12)
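As a quick check with this toy series (base R only): cycle() labels the 40 observations 1 through 12 repeatedly, so the group for month 1 contains observations 1, 13, 25 and 37:

```r
anom_tsUNAD <- ts(1:40, start = c(1922, 1), frequency = 12)
meanVal <- tapply(anom_tsUNAD, cycle(anom_tsUNAD), FUN = mean)
meanVal[["1"]]  # mean(c(1, 13, 25, 37))
# [1] 19
```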

Aggregate (Summarize) multiple Time Series Data by Month in R

I think you're already headed in the right direction. My suggestion would be to define the function prior to running the purrr::map call.

Therefore, the code should look something like this:

# Load packages
library(tidyverse)
library(dplyr)
library(purrr)

# Setting working directory
workingdirectory <- "D:/Directory"
setwd(workingdirectory)

# Listing the files in the folder with .txt extension
FilesList <- list.files(workingdirectory, pattern = "\\.txt$", full.names = TRUE)
columnNames <- c("year", "month", "day", "pcp_day")
# define the per-file processing function
processing <- function(x) {
  x %>%
    read.csv(sep = "", header = FALSE, stringsAsFactors = FALSE) %>%
    rename_at(c(1, 2, 3, 7), ~columnNames) %>%
    filter(month != 2 | day != 29) %>%
    group_by(month, year) %>%
    summarise(monthly = sum(pcp_day))
}
# Loop over the files and write the results back
purrr::map(FilesList,
           ~ processing(.x) %>%
             write.csv(paste0('Result_', basename(.x)), row.names = FALSE))

If it runs successfully, you will find the outputs in the working directory.
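To see what the pipeline does to a single file, here is a base-R sketch on a fake data frame laid out the way the .txt files are assumed to be (columns 1-3 hold year/month/day, column 7 the daily precipitation):

```r
# fake one-file input: two months of three days each
x <- data.frame(year = 2000, month = rep(1:2, each = 3), day = rep(1:3, 2),
                V4 = NA, V5 = NA, V6 = NA, pcp_day = 1:6)
# drop any Feb 29 rows, then sum the daily precipitation per month and year
x <- x[!(x$month == 2 & x$day == 29), ]
aggregate(pcp_day ~ month + year, data = x, FUN = sum)
#   month year pcp_day
# 1     1 2000       6
# 2     2 2000      15
```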

Aggregate timeseries to length/N points

cut.POSIXt can be used like this, allowing an arbitrary number of seconds:

secs <- 7200
as.POSIXct(cut(data$datetime, paste(secs, "secs"))) + secs

Checking, we have:

identical(cut(data$datetime, "7200 secs"), cut(data$datetime, "2 hours"))
## [1] TRUE

As you have undoubtedly noticed, this unfortunately does not work the same way with lubridate's ceiling_date:

identical(ceiling_date(data$datetime, "2 hours"),
          ceiling_date(data$datetime, "7200 secs"))
## [1] FALSE

Example

secs <- 3750
agg_period <- paste(secs, "secs")

agg_data <- data %>%
  group_by(across(-c(Value, datetime)),
           datetime = as.POSIXct(cut(datetime, agg_period)) + secs) %>%
  summarise(Value = median(Value), .groups = "drop")

dim(agg_data)
## [1] 402 3
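Since data above comes from the question, here is a self-contained base-R illustration of the same bin-and-shift idea, using integer division instead of cut() so the made-up example stays fully reproducible:

```r
# made-up datetimes at 30-minute spacing over six hours
dt <- as.POSIXct("2021-01-01 00:00:00", tz = "UTC") + seq(0, 3600 * 6, by = 1800)
secs <- 7200
# floor each timestamp into its secs-wide bin, then label it by the bin's end
origin <- min(dt)
ends <- origin + (floor(as.numeric(dt - origin, units = "secs") / secs) + 1) * secs
table(format(ends, "%H:%M", tz = "UTC"))
#
# 02:00 04:00 06:00 08:00
#     4     4     4     1
```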

