Annual, monthly or daily mean for irregular time series
Convert your data to an xts object, then use apply.daily
et al to calculate whatever values you want.
library(xts)
d <- structure(list(dates = c("12/03/2012 11:26", "12/03/2012 11:56",
"12/03/2012 12:26"), temperature = c(9.7533, 9.6673, 9.6673),
depth = c(0.48073, 0.33281, 0.33281), salinity = c(37.607,
37.662, 37.672)), .Names = c("dates", "temperature", "depth",
"salinity"), row.names = c(NA, -3L), class = "data.frame")
x <- xts(d[,-1], as.POSIXct(d[,1], format="%m/%d/%Y %H:%M"))
apply.daily(x, colMeans)
# temperature depth salinity
# 2012-12-03 12:26:00 9.695967 0.3821167 37.647
Splitting irregular time series into regular monthly averages - R
Here's a start using data.table
:
billdata <- read.table(text=" acct amount begin end days
1 2242 11349 2009-10-06 2009-11-04 29
2 2242 12252 2009-11-04 2009-12-04 30
3 2242 21774 2009-12-04 2010-01-08 35
4 2242 18293 2010-01-08 2010-02-05 28
5 2243 27217 2009-10-06 2009-11-04 29
6 2243 117 2009-11-04 2009-12-04 30
7 2243 14543 2009-12-04 2010-01-08 35", sep=" ", header=TRUE, row.names=1)
require(data.table)
DT = as.data.table(billdata)
First, change type of columns begin
and end
to dates. Unlike data.frame, this doesn't copy the entire dataset.
DT[,begin:=as.Date(begin)]
DT[,end:=as.Date(end)]
Then find the time span, find the prevailing bill for each day, and aggregate.
alldays = DT[,seq(min(begin),max(end),by="day")]
setkey(DT, acct, begin)
DT[CJ(unique(acct),alldays),
mean(amount/days,na.rm=TRUE),
by=list(acct,month=format(begin,"%Y-%m")), roll=TRUE]
acct month V1
1: 2242 2009-10 391.34483
2: 2242 2009-11 406.69448
3: 2242 2009-12 601.43226
4: 2242 2010-01 646.27465
5: 2242 2010-02 653.32143
6: 2243 2009-10 938.51724
7: 2243 2009-11 97.36172
8: 2243 2009-12 375.68065
9: 2243 2010-01 415.51429
10: 2243 2010-02 415.51429
I think you'll find the prevailing join logic quite cumbersome in SQL, and slower.
I say it's a hint because it's not quite correct. Notice row 10 is repeated because account 2243 doesn't stretch into 2010-02 unlike account 2242. To finish it off you could rbind
in the last row for each account and use rolltolast
instead of roll
. Or perhaps create alldays
by account rather than across all accounts.
See if speed is acceptable on the above, and we can go from there.
It's likely you will hit a bug in 1.8.2 that has been fixed in 1.8.3. I'm using v1.8.3.
"Internal" error message when combining join containing missing groups and group by
is fixed, #2162. For example :
X[Y,.N,by=NonJoinColumn]
where Y contains some rows that don't match to X. This bug could also result in a seg
fault.
Let me know and we can either work around, or upgrade to 1.8.3 from R-Forge.
Btw, nice example data. That made it quicker to answer.
Here's the full answer alluded to above. It's a bit tricky I have to admit, as it combines together several features of data.table
. This should work in 1.8.2 as it happens, but I've only tested in 1.8.3.
DT[ setkey(DT[,seq(begin[1],last(end),by="day"),by=acct]),
mean(amount/days,na.rm=TRUE),
by=list(acct,month=format(begin,"%Y-%m")), roll=TRUE]
acct month V1
1: 2242 2009-10 391.34483
2: 2242 2009-11 406.69448
3: 2242 2009-12 601.43226
4: 2242 2010-01 646.27465
5: 2242 2010-02 653.32143
6: 2243 2009-10 938.51724
7: 2243 2009-11 97.36172
8: 2243 2009-12 375.68065
9: 2243 2010-01 415.51429
R : Converting Daily time series to monthly
The possible solution for OP issue can be based on:
#Code
y.mon<-aggregate(property_damages_in_dollars~format(as.Date(date_decision_made),
format="%Y/%m"),data=final_data, FUN=sum)
More variants can be explored around the format()
options.
R: Daily data to monthly
alternatively you could "tell" R that you are using dates of a certain format with the as.Date()
function and then use format()
to change it to the format you desire.
Like this:
dates=c("2011-06-28","2011-06-29","2011-06-30","2011-07-1") #test string with dates in original format
dates2 <- format(as.Date(dates,"%Y-%m-%d"), format="%Y-%m") #changing the "%Y-%m-%d" format to the desired "%Y-%m"
print(dates2)
Edit: If you only want to change the index of a xts:indexFormat(xts_object) <- "%Y-%m"
Cheers
Chris
How do I calculate a monthly rate of change from a daily time series in R?
Here's one way to do it using plyr
and ddply
.
I use ddply sequentially, first to get the first and last rows of each month, and again to calculate the monthlyReturn.
(Perhaps using xts or zoo might be easier, I am not sure.)
#Using plyr and the data in df
df$Date <- as.POSIXlt(as.Date(df$Date, "%m/%d/%Y"))
df$Month <- (df$Date$mon + 1) #0 = January
sdf <- df[,-1] #drop the Date Column, ddply doesn't like it
library("plyr")
#this function is called with 2 row data frames
monthlyReturn<- function(df) {
(df$Value[2] - df$Value[1])/(df$Value[1])
}
adf <- ddply(sdf, .(Month), function(x) x[c(1, nrow(x)), ]) #get first and last values for each Month
mon.returns <- ddply(adf, .(Month), monthlyReturn)
Here's the data I used to test it out:
> df
Date Value
1 1/1/2000 10.0
2 1/31/2000 10.1
3 2/1/2000 10.2
4 2/28/2000 11.0
5 3/1/2000 10.0
6 3/31/2000 24.1
7 5/10/2000 510.0
8 5/22/2000 522.0
9 6/04/2000 604.0
10 7/03/2000 10.1
11 7/30/2000 7.2
12 12/28/2000 11.0
13 12/30/2000 3.0
> mon.returns
Month V1
1 1 0.01000000
2 2 0.07843137
3 3 1.41000000
4 5 0.02352941
5 6 0.00000000
6 7 -0.28712871
7 12 -0.72727273
Hope that helps.
Calculating daily average from irregular time series using pandas
You want dayfirst=True
, one of the many tweaks listed in the read_csv docs.
Selecting regular intervals from time series
Like this?
sec <- as.integer(format(index(CO2xts),"%S"))
CO2xts[sec > 24 & sec < 57]
# [,1]
# 2014-12-01 00:00:27 433.6
# 2014-12-01 00:00:30 434.3
# 2014-12-01 00:00:33 434.4
# 2014-12-01 00:00:36 435.4
# 2014-12-01 00:00:39 434.3
# 2014-12-01 00:00:42 434.0
# 2014-12-01 00:00:45 434.1
# 2014-12-01 00:00:48 434.4
# 2014-12-01 00:00:51 434.5
# 2014-12-01 00:00:54 434.9
Aggregating time series to yearly data
"yearmon"
and "yearqtr"
classes represent dates as year + fraction so:
as.year <- function(x) as.integer(as.yearmon(x))
Also note this construct: diff(x, arithmetic = FALSE) - 1
Related Topics
Faster Way to Compare Rows in a Data Frame
Format Latitude and Longitude Axis Labels in Ggplot
Replace Numbers in Data Frame Column in R
How to Turn the Numeric Output of Boxplot (With Plot=False) into Something Usable
Xpath to Extract Text After Br Tags in R
Adding R^2 on Graph with Facets
How to Prevent Objects from Automatically Loading When I Open Rstudio
How to Show Corpus Text in R Tm Package
Plotting Dose Response Curves with Ggplot2 and Drc
R: Serialize Objects to Text File and Back Again
How to Save Interactive Charts from Dygraph
Use a Custom Icon in Plotly's Pie Chart in R
How to Use Aggregate Function in R