aggregating time series in R
I'll explain your error and tell you how to fix it, but there's a better way to do what you're doing. So make sure you read my entire answer!
From the error message, the length of your by argument is not the same as the length of Vo(tickmin).
You have to generate by so that it holds one value (the day) for each corresponding value in tickmin.
As an example, here I generate an xts object:
library(xts)
library(quantmod)  # provides Vo(), used below
# generate a set of times from 2010-06-30 onwards at 20 minute intervals
tms <- as.POSIXct(seq(0, 3600*24*30, by = 60*20), origin = "2010-06-30")
n <- length(tms)
# generate volumes for those intervals, random 1 to 100, turn into xts object
xts.ts <- xts(sample.int(100, n, replace = TRUE), tms)
colnames(xts.ts) <- 'Volume'
which yields:
> head(xts.ts)
Volume
2010-06-30 00:00:00 97
2010-06-30 00:20:00 78
2010-06-30 00:40:00 38
2010-06-30 01:00:00 86
2010-06-30 01:20:00 79
2010-06-30 01:40:00 55
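To access the dates of xts.ts you use index(xts.ts), which gives a vector of date-times (POSIXct values, not strings), e.g. "2010-07-30 00:00:00 EST".
To reduce these to calendar dates you can use as.Date. It truncates rather than rounds, and for POSIXct input it converts in UTC by default, so a date can shift by a day relative to local time: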
> as.Date(index(xts.ts))
[1] "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29"
....
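The 2010-06-29 dates above look like a time zone effect. Here is a minimal sketch of it, assuming the series was created in a zone ahead of UTC. Australia/Sydney is an assumption chosen for illustration, not something taken from the question. Passing tz to as.Date controls which calendar day each timestamp falls on.

```r
# Six timestamps starting at local midnight, 20 minutes apart.
# The time zone is an assumption chosen for illustration.
tms <- as.POSIXct("2010-06-30 00:00:00", tz = "Australia/Sydney") +
  seq(0, by = 20 * 60, length.out = 6)

# Calendar date as seen in UTC: local midnight is still the previous day there
dates_utc <- as.Date(tms, tz = "UTC")

# Calendar date as seen in the series' own zone
dates_local <- as.Date(tms, tz = "Australia/Sydney")

print(unique(dates_utc))
print(unique(dates_local))
```

If you group with as.Date, passing the zone of your data explicitly keeps the groups aligned with local days.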
Solution to your problem
Then to use aggregate you do:
> aggregate(Vo(xts.ts),as.Date(index(xts.ts)),sum)
2010-06-29 1858
2010-06-30 3733
2010-07-01 3906
2010-07-02 3359
2010-07-03 3838
...
Better solution to your problem
The xts package has functions apply.daily, apply.monthly, etc. (use ls('package:xts') to see what functions it has; there may be ones you're interested in).
apply.daily(x,FUN,...) does exactly what you want. See ?apply.daily.
To use it you can do:
> apply.daily(xts.ts,sum)
Volume
2010-06-30 23:40:00 4005
2010-07-01 23:40:00 4093
2010-07-02 23:40:00 3419
2010-07-03 23:40:00 3737
...
Or if your xts object has other columns like Open, Close, etc., you can do apply.daily(xts.ts, function(x) sum(Vo(x))).
Note that the answers from apply.daily differ slightly from those of the aggregate ... as.Date method. That's most likely because apply.daily splits the series at day boundaries in the index's own time zone (and labels each day with its last timestamp), whereas as.Date converted the index in UTC, so aggregate summed from midnight to midnight UTC.
Looking at your question, apply.daily seems to match most closely what you want to do (and is provided with xts anyway, so why not use it?)
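One way to check where the difference between the two methods comes from is to group by the calendar date in the index's own time zone and compare the totals. This is a sketch under the assumption that the mismatch is a time zone effect. The zone, the seed and the three-day length are made up for illustration.

```r
library(xts)

set.seed(1)
tz <- "Australia/Sydney"  # assumed zone, for illustration only

# Three full local days of 20-minute observations (72 per day)
tms <- seq(as.POSIXct("2010-06-30 00:00:00", tz = tz),
           by = "20 min", length.out = 72 * 3)
x <- xts(sample.int(100, length(tms), replace = TRUE),
         order.by = tms, tzone = tz)
colnames(x) <- "Volume"

# Group by the calendar date in the index's own zone rather than UTC
by_local_day <- aggregate(x, as.Date(index(x), tz = tz), sum)
daily <- apply.daily(x, sum)

# Compare the daily totals; the labels differ (Date vs. last timestamp of the day)
same_totals <- isTRUE(all.equal(as.numeric(coredata(by_local_day)),
                                as.numeric(coredata(daily))))
print(same_totals)
```

If same_totals is TRUE for your data as well, the two approaches agree once the dates are taken in the same zone.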
aggregating 15minute time series data to daily
Use apply.daily from the xts package after converting your data to an xts object.
Something like this should work:
x2 = read.table(header=TRUE, text=' "Index" "temp" "m"
1 "2012-02-07 18:15:13" "4297"
2 "2012-02-07 18:30:04" "4296"
3 "2012-02-07 18:45:10" "4297"
4 "2012-02-07 19:00:01" "4297"
5 "2012-02-07 19:15:07" "4298"
6 "2012-02-07 19:30:13" "4299"
7 "2012-02-07 19:45:04" "4299"
8 "2012-02-07 20:00:10" "4299"
9 "2012-02-07 20:15:01" "4300"
10 "2012-02-07 20:30:07" "4301"')
x2$temp = as.POSIXct(strptime(x2$temp, "%Y-%m-%d %H:%M:%S"))
require(xts)
x2 = xts(x = x2$m, order.by = x2$temp)
apply.daily(x2, mean)
## [,1]
## 2012-02-07 20:30:07 4298.3
Update: Your problem in a reproducible format (with fake data)
We don't always need the actual dataset to be able to help troubleshoot....
set.seed(1) # So you can get the same numbers as I do
x = data.frame(datetime = seq(ISOdatetime(1970, 1, 1, 0, 0, 0),
length = 384, by = 900),
m = sample(2000:4000, 384, replace = TRUE))
head(x)
# datetime m
# 1 1970-01-01 00:00:00 2531
# 2 1970-01-01 00:15:00 2744
# 3 1970-01-01 00:30:00 3146
# 4 1970-01-01 00:45:00 3817
# 5 1970-01-01 01:00:00 2403
# 6 1970-01-01 01:15:00 3797
require(xts)
x2 = xts(x$m, x$datetime)
head(x2)
# [,1]
# 1970-01-01 00:00:00 2531
# 1970-01-01 00:15:00 2744
# 1970-01-01 00:30:00 3146
# 1970-01-01 00:45:00 3817
# 1970-01-01 01:00:00 2403
# 1970-01-01 01:15:00 3797
apply.daily(x2, mean)
# [,1]
# 1970-01-01 23:45:00 3031.302
# 1970-01-02 23:45:00 3043.250
# 1970-01-03 23:45:00 2896.771
# 1970-01-04 23:45:00 2996.479
Update 2: A workaround alternative
(Using the fake data I've provided in the above update.)
data.frame(time = x[seq(96, nrow(x), by=96), 1],
mean = aggregate(ts(x[, 2], freq = 96), 1, mean))
# time mean
# 1 1970-01-01 23:45 3031.302
# 2 1970-01-02 23:45 3043.250
# 3 1970-01-03 23:45 2896.771
# 4 1970-01-04 23:45 2996.479
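The block-of-96 workaround assumes exactly 96 observations per day with none missing. Here is a base R sketch that groups on the calendar date instead. It uses small deterministic data (made up for illustration, not the sampled values above) so the result is easy to check by eye.

```r
# 4 days of 15-minute timestamps; values alternate 1, 3 so every daily mean is 2
x <- data.frame(
  datetime = seq(ISOdatetime(1970, 1, 1, 0, 0, 0, tz = "UTC"),
                 by = 900, length.out = 384),
  m = rep(c(1, 3), times = 192)
)

# Derive the grouping key explicitly, in the same zone as the timestamps
x$day <- as.Date(x$datetime, tz = "UTC")

# One row per calendar day with the mean of m
daily <- aggregate(m ~ day, data = x, FUN = mean)
print(daily)
```

Because the key is the date itself, days with missing or extra rows still end up in the right group.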
Aggregate time series object by month R
We could get the mean for each 'month' using tapply and then plot:
meanVal <- tapply(anom_tsUNAD, cycle(anom_tsUNAD), FUN=mean)
plot(meanVal)
cycle gives the numeric position in the cycle for each observation: 1 for 'Jan' through 12 for 'Dec'. We use that as a grouping variable in tapply to calculate the mean.
data
anom_tsUNAD <- ts(1:40, start=c(1922,1), frequency=12)
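To make the grouping concrete, here is a small sketch on the sample data that keeps the cycle positions in their own variable and labels the result with month abbreviations. The names pos and month_means are only illustrative.

```r
anom_tsUNAD <- ts(1:40, start = c(1922, 1), frequency = 12)

# Position of each observation within the year: 1 = Jan, ..., 12 = Dec
pos <- as.integer(cycle(anom_tsUNAD))

# Mean per calendar month across all years, labelled with month abbreviations
month_means <- tapply(as.numeric(anom_tsUNAD), pos, mean)
names(month_means) <- month.abb

print(month_means)
```

With 40 observations starting in January 1922, January to April are averaged over four years and the remaining months over three.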
Aggregate (Summarize) multiple Time Series Data by Month in .r
I think you're already heading in the right direction. My suggestion is to define the processing function before calling purrr::map.
The code should then look something like this:
# Load packages
library(tidyverse)
library(dplyr)
library(purrr)
# Setting working directory
workingdirectory <- "D:/Directory"
setwd(workingdirectory)
# Listing the files in the folder with .txt extension
FilesList <- list.files(workingdirectory, pattern = "\\.txt$", full.names = TRUE)
columnNames <- c("year", "month", "day", "pcp_day")
# define function
processing <- function(x){
  x %>%
    read.csv(sep = "", header = FALSE, stringsAsFactors = FALSE) %>%
    rename_at(c(1,2,3,7), ~columnNames) %>%
    filter(month != 2 | day != 29) %>%
    group_by(month, year) %>%
    summarise(monthly = sum(pcp_day))
}
# Loop over the files and write the results back
purrr::map(FilesList, ~processing(.x) %>% write.csv(paste0('Result_', basename(.x)), row.names = FALSE))
If it runs successfully, you will find the output files in the working directory.
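To check the filtering and grouping logic without touching any files, the same pipeline steps can be run on a small in-memory data frame. The values below are made up for illustration. The column names match the ones assigned in the function above.

```r
library(dplyr)

# Made-up daily precipitation, including a leap day
daily <- data.frame(
  year    = c(2000, 2000, 2000, 2000),
  month   = c(2, 2, 3, 3),
  day     = c(28, 29, 1, 2),
  pcp_day = c(1.0, 5.0, 2.0, 3.0)
)

monthly <- daily %>%
  filter(month != 2 | day != 29) %>%   # drop 29 February
  group_by(month, year) %>%
  summarise(monthly = sum(pcp_day), .groups = "drop")

print(monthly)
```

Here the 29 February row is dropped before summing, so February totals 1 and March totals 5.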
Aggregate timeseries to length/N points
cut.POSIXt can be used like this, allowing an arbitrary number of seconds.
secs <- 7200
as.POSIXct(cut(data$datetime, paste(secs, "secs"))) + secs
Checking we have:
identical(cut(data$datetime, "7200 secs"), cut(data$datetime, "2 hours"))
## [1] TRUE
As you have undoubtedly noticed, unfortunately this does not work with ceiling_date:
identical(ceiling_date(data$datetime, "2 hours"),
ceiling_date(data$datetime, "7200 secs"))
## [1] FALSE
Example
secs <- 3750
agg_period <- paste(secs, "secs")
agg_data <- data %>%
group_by(across(-c(Value, datetime)),
datetime = as.POSIXct(cut(datetime, agg_period)) + secs) %>%
summarise(Value = median(Value), .groups = "drop")
dim(agg_data)
## [1] 402 3
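To see how cut assigns timestamps to fixed-width bins, here is a small base R sketch with made-up timestamps. As a variation on the code above, it computes each bin's end from the factor codes instead of parsing the factor labels back into date-times. That step is an assumption of this sketch, not part of the original approach.

```r
secs <- 3750
start <- as.POSIXct("2021-01-01 00:00:00", tz = "UTC")

# Made-up timestamps, given as offsets in seconds from the first one
datetime <- start + c(0, 100, 3749, 3750, 7499, 7500)

# Bins are left-closed and anchored at the earliest timestamp
bins <- cut(datetime, paste(secs, "secs"))
counts <- as.integer(table(bins))

# End of each observation's bin, computed from the factor codes
bin_end <- min(datetime) + as.integer(bins) * secs

print(counts)
print(bin_end)
```

An observation exactly on a boundary (offsets 3750 and 7500 here) falls into the bin that starts at that boundary.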