Annual, Monthly or Daily Mean for Irregular Time Series

Annual, monthly or daily mean for irregular time series

Convert your data to an xts object, then use apply.daily et al to calculate whatever values you want.

library(xts)
d <- structure(list(dates = c("12/03/2012 11:26", "12/03/2012 11:56",
"12/03/2012 12:26"), temperature = c(9.7533, 9.6673, 9.6673),
depth = c(0.48073, 0.33281, 0.33281), salinity = c(37.607,
37.662, 37.672)), .Names = c("dates", "temperature", "depth",
"salinity"), row.names = c(NA, -3L), class = "data.frame")
x <- xts(d[,-1], as.POSIXct(d[,1], format="%m/%d/%Y %H:%M"))
apply.daily(x, colMeans)
# temperature depth salinity
# 2012-12-03 12:26:00 9.695967 0.3821167 37.647

Splitting irregular time series into regular monthly averages - R

Here's a start using data.table :

billdata <- read.table(text=" acct amount begin end days
1 2242 11349 2009-10-06 2009-11-04 29
2 2242 12252 2009-11-04 2009-12-04 30
3 2242 21774 2009-12-04 2010-01-08 35
4 2242 18293 2010-01-08 2010-02-05 28
5 2243 27217 2009-10-06 2009-11-04 29
6 2243 117 2009-11-04 2009-12-04 30
7 2243 14543 2009-12-04 2010-01-08 35", sep=" ", header=TRUE, row.names=1)

require(data.table)
DT = as.data.table(billdata)

First, change type of columns begin and end to dates. Unlike data.frame, this doesn't copy the entire dataset.

DT[,begin:=as.Date(begin)]
DT[,end:=as.Date(end)]

Then find the time span, find the prevailing bill for each day, and aggregate.

alldays = DT[,seq(min(begin),max(end),by="day")]

setkey(DT, acct, begin)

DT[CJ(unique(acct),alldays),
mean(amount/days,na.rm=TRUE),
by=list(acct,month=format(begin,"%Y-%m")), roll=TRUE]

acct month V1
1: 2242 2009-10 391.34483
2: 2242 2009-11 406.69448
3: 2242 2009-12 601.43226
4: 2242 2010-01 646.27465
5: 2242 2010-02 653.32143
6: 2243 2009-10 938.51724
7: 2243 2009-11 97.36172
8: 2243 2009-12 375.68065
9: 2243 2010-01 415.51429
10: 2243 2010-02 415.51429

I think you'll find the prevailing join logic quite cumbersome in SQL, and slower.

I say it's a hint because it's not quite correct. Notice row 10 is repeated because account 2243 doesn't stretch into 2010-02 unlike account 2242. To finish it off you could rbind in the last row for each account and use rolltolast instead of roll. Or perhaps create alldays by account rather than across all accounts.

See if speed is acceptable on the above, and we can go from there.

It's likely you will hit a bug in 1.8.2 that has been fixed in 1.8.3. I'm using v1.8.3.

"Internal" error message when combining join containing missing groups and group by
is fixed, #2162. For example :
X[Y,.N,by=NonJoinColumn]
where Y contains some rows that don't match to X. This bug could also result in a seg
fault.

Let me know and we can either work around, or upgrade to 1.8.3 from R-Forge.

Btw, nice example data. That made it quicker to answer.


Here's the full answer alluded to above. It's a bit tricky I have to admit, as it combines together several features of data.table. This should work in 1.8.2 as it happens, but I've only tested in 1.8.3.

DT[ setkey(DT[,seq(begin[1],last(end),by="day"),by=acct]),
mean(amount/days,na.rm=TRUE),
by=list(acct,month=format(begin,"%Y-%m")), roll=TRUE]

acct month V1
1: 2242 2009-10 391.34483
2: 2242 2009-11 406.69448
3: 2242 2009-12 601.43226
4: 2242 2010-01 646.27465
5: 2242 2010-02 653.32143
6: 2243 2009-10 938.51724
7: 2243 2009-11 97.36172
8: 2243 2009-12 375.68065
9: 2243 2010-01 415.51429

R : Converting Daily time series to monthly

The possible solution for OP issue can be based on:

#Code
y.mon<-aggregate(property_damages_in_dollars~format(as.Date(date_decision_made),
format="%Y/%m"),data=final_data, FUN=sum)

More variants can be explored around the format() options.

R: Daily data to monthly

alternatively you could "tell" R that you are using dates of a certain format with the as.Date() function and then use format() to change it to the format you desire.
Like this:

dates=c("2011-06-28","2011-06-29","2011-06-30","2011-07-1") #test string with dates in original format

dates2 <- format(as.Date(dates,"%Y-%m-%d"), format="%Y-%m") #changing the "%Y-%m-%d" format to the desired "%Y-%m"

print(dates2)

Edit: If you only want to change the index of a xts:
indexFormat(xts_object) <- "%Y-%m"

Cheers
Chris

How do I calculate a monthly rate of change from a daily time series in R?

Here's one way to do it using plyr and ddply.
I use ddply sequentially, first to get the first and last rows of each month, and again to calculate the monthlyReturn.
(Perhaps using xts or zoo might be easier, I am not sure.)

#Using plyr and the data in df
df$Date <- as.POSIXlt(as.Date(df$Date, "%m/%d/%Y"))
df$Month <- (df$Date$mon + 1) #0 = January

sdf <- df[,-1] #drop the Date Column, ddply doesn't like it

library("plyr")
#this function is called with 2 row data frames
monthlyReturn<- function(df) {
(df$Value[2] - df$Value[1])/(df$Value[1])
}

adf <- ddply(sdf, .(Month), function(x) x[c(1, nrow(x)), ]) #get first and last values for each Month
mon.returns <- ddply(adf, .(Month), monthlyReturn)

Here's the data I used to test it out:

> df
Date Value
1 1/1/2000 10.0
2 1/31/2000 10.1
3 2/1/2000 10.2
4 2/28/2000 11.0
5 3/1/2000 10.0
6 3/31/2000 24.1
7 5/10/2000 510.0
8 5/22/2000 522.0
9 6/04/2000 604.0
10 7/03/2000 10.1
11 7/30/2000 7.2
12 12/28/2000 11.0
13 12/30/2000 3.0

> mon.returns
Month V1
1 1 0.01000000
2 2 0.07843137
3 3 1.41000000
4 5 0.02352941
5 6 0.00000000
6 7 -0.28712871
7 12 -0.72727273

Hope that helps.

Calculating daily average from irregular time series using pandas

You want dayfirst=True, one of the many tweaks listed in the read_csv docs.

Selecting regular intervals from time series

Like this?

sec <- as.integer(format(index(CO2xts),"%S"))
CO2xts[sec > 24 & sec < 57]
# [,1]
# 2014-12-01 00:00:27 433.6
# 2014-12-01 00:00:30 434.3
# 2014-12-01 00:00:33 434.4
# 2014-12-01 00:00:36 435.4
# 2014-12-01 00:00:39 434.3
# 2014-12-01 00:00:42 434.0
# 2014-12-01 00:00:45 434.1
# 2014-12-01 00:00:48 434.4
# 2014-12-01 00:00:51 434.5
# 2014-12-01 00:00:54 434.9

Aggregating time series to yearly data

"yearmon" and "yearqtr" classes represent dates as year + fraction so:

as.year <- function(x) as.integer(as.yearmon(x))

Also note this construct: diff(x, arithmetic = FALSE) - 1



Related Topics



Leave a reply



Submit