Interpolate / Extend quarterly to monthly series
I'm not quite clear on your comment about the undesired column format, but if you're trying to get interpolated values using cubic interpolation, you might consider something like the code below:
ger <- data.frame(DATE = as.Date(c("1991-01-01", "1991-04-01", "1991-07-01", "1991-10-01", "1992-01-01")),
                  VALUE = c(470780, 468834, 466332, 472949, 480359))
DateSeq <- seq(ger$DATE[1],tail(ger$DATE,1),by="1 month")
gerMonthly <- data.frame(DATE=DateSeq, Interp.Value=spline(ger, method="natural", xout=DateSeq)$y)
merge(ger, gerMonthly, by = 'DATE', all.y = TRUE)
The DATE column needs to be in Date format so the interpolation can work with numeric values.
I've usually used "natural" cubic splines but other options are available.
This format shows both the input values and the results, so you can check that the interpolation looks reasonable; use gerMonthly on its own if you just want the interpolated results.
Form a monthly series from a quarterly series
Convert to zoo with a "yearmon" class index, assuming the values are at the ends of the quarters. Then perform the rolling mean, giving z.mu. Now merge that with a zero-width zoo object containing all the months and use na.spline to fill in the missing values (or use na.locf or na.approx for different forms of interpolation). Optionally use fortify.zoo to convert back to a data.frame.
library(zoo)
z <- zoo(coredata(DF), as.yearmon(as.yearqtr(rownames(DF)), frac = 1))
z.mu <- rollmeanr(z, 2, partial = TRUE)
ym <- seq(floor(start(z.mu)), floor(end(z.mu)) + 11/12, 1/12)
z.ym <- na.spline(merge(z.mu, zoo(, ym)))
fortify.zoo(z.ym)
giving:
Index Country
1 Jan 1999 -0.065000000
2 Feb 1999 -0.052222222
3 Mar 1999 -0.040555556
4 Apr 1999 -0.030000000
5 May 1999 -0.020555556
6 Jun 1999 -0.012222222
7 Jul 1999 -0.005000000
8 Aug 1999 0.001111111
9 Sep 1999 0.006111111
10 Oct 1999 0.010000000
11 Nov 1999 0.012777778
12 Dec 1999 0.014444444
13 Jan 2000 0.015000000
14 Feb 2000 0.014444444
15 Mar 2000 0.012777778
16 Apr 2000 0.010000000
17 May 2000 0.006111111
18 Jun 2000 0.001111111
19 Jul 2000 -0.005000000
20 Aug 2000 -0.012222222
21 Sep 2000 -0.020555556
22 Oct 2000 -0.030000000
23 Nov 2000 -0.040555556
24 Dec 2000 -0.052222222
Note: The input DF used, in reproducible form, is:
Lines <- " Country
1999Q3 0.01
1999Q4 0.01
2000Q1 0.02
2000Q2 0.00
2000Q3 -0.01"
DF <- read.table(text = Lines)
Update: The question originally asked to carry the last value forward but was changed to ask for spline interpolation, so the answer has been changed accordingly. The series now also starts in Jan and ends in Dec, and the data are assumed to be for quarter ends.
Interpolating yearly time series data in R with quarterly values
Here's an example using dplyr:
library(dplyr)
annual_data <- data.frame(
  person = c(1, 1, 1, 2, 2),
  year = c(2010, 2011, 2012, 2010, 2012),
  y = c(1, 2, 3, 1, 3)
)
# build a full year x quarter grid and place each annual value in quarter 1
expand_data <- function(x) {
  years <- min(x$year):max(x$year)
  quarters <- 1:4
  grid <- expand.grid(quarter = quarters, year = years)
  x$quarter <- 1
  merged <- grid %>% left_join(x, by = c('year', 'quarter'))
  merged$person <- x$person[1]
  return(merged)
}
# linearly interpolate y over the row positions; approx does not extrapolate
interpolate_data <- function(data) {
  xout <- 1:nrow(data)
  y <- data$y
  interpolation <- approx(x = xout[!is.na(y)], y = y[!is.na(y)], xout = xout)
  data$yhat <- interpolation$y
  return(data)
}
expand_and_interpolate <- function(x) interpolate_data(expand_data(x))
quarterly_data <- annual_data %>% group_by(person) %>% do(expand_and_interpolate(.))
print(as.data.frame(quarterly_data))
The output from this approach is:
quarter year person y yhat
1 1 2010 1 1 1.00
2 2 2010 1 NA 1.25
3 3 2010 1 NA 1.50
4 4 2010 1 NA 1.75
5 1 2011 1 2 2.00
6 2 2011 1 NA 2.25
7 3 2011 1 NA 2.50
8 4 2011 1 NA 2.75
9 1 2012 1 3 3.00
10 2 2012 1 NA NA
11 3 2012 1 NA NA
12 4 2012 1 NA NA
13 1 2010 2 1 1.00
14 2 2010 2 NA 1.25
15 3 2010 2 NA 1.50
16 4 2010 2 NA 1.75
17 1 2011 2 NA 2.00
18 2 2011 2 NA 2.25
19 3 2011 2 NA 2.50
20 4 2011 2 NA 2.75
21 1 2012 2 3 3.00
22 2 2012 2 NA NA
23 3 2012 2 NA NA
24 4 2012 2 NA NA
There are probably a bunch of ways to clean this up. The key functions being used are expand.grid, approx, and dplyr::group_by. The approx function is a little tricky. Looking at the implementation of zoo::na.approx.default was quite helpful in figuring out how to work with approx.
Going from monthly average dataframe to an interpolated daily timeseries
Here is one way to do it:
import pandas as pd
import numpy as np
# monthly averages, note these should be cast to float
month = np.array(['1.527013956', '1.899169054', '1.669356146',
'1.44920871', '1.188557788', '1.017035727',
'0.950243755', '1.022453993', '1.203913739',
'1.369545041', '1.441827406', '1.48621651'], dtype='float')
# expand this to 51 years, with the same monthly averages repeating each year
# (obviously not very efficient, probably there are better ways to attack the problem,
# but this was the question)
month = np.tile(month, 51)
# create DataFrame with these values
m_avg = pd.DataFrame({'Month': month})
# set the date index to the desired time period
m_avg.index = pd.date_range(start='1/1/1950', end='12/1/2000', freq='MS')
# shift the index by 14 days to get the 15th of each month
# (DataFrame.tshift was removed in pandas 2.0; shift with freq moves the index the same way)
m_avg = m_avg.shift(14, freq='D')
# expand the index to daily frequency
daily = m_avg.asfreq(freq='D')
# interpolate (linearly) the missing values
daily = daily.interpolate()
# show result (display is available in Jupyter/IPython; use print(daily) elsewhere)
display(daily)
Output:
Month
1950-01-15 1.527014
1950-01-16 1.539019
1950-01-17 1.551024
1950-01-18 1.563029
1950-01-19 1.575034
... ...
2000-12-11 1.480298
2000-12-12 1.481778
2000-12-13 1.483257
2000-12-14 1.484737
2000-12-15 1.486217
18598 rows × 1 columns
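A reduced version of the same pipeline, using only the first three monthly averages, makes the interpolation step easy to check by hand. This is a minimal sketch rather than the answer's own code: the names mid_month and step are introduced here for illustration, and the index is moved with shift(..., freq='D').

```python
import pandas as pd

# first three monthly averages from the answer, placed on month-start dates
m_avg = pd.DataFrame(
    {"Month": [1.527013956, 1.899169054, 1.669356146]},
    index=pd.date_range(start="1950-01-01", periods=3, freq="MS"),
)

# move the index labels (not the values) forward 14 days, to the 15th
mid_month = m_avg.shift(14, freq="D")

# upsample to daily rows (NaN between known points), then fill linearly
daily = mid_month.asfreq("D").interpolate()

# hand check: Jan 15 to Feb 15 spans 31 days, so each day adds 1/31 of the gap
step = (1.899169054 - 1.527013956) / 31
jan16 = daily.loc[pd.Timestamp("1950-01-16"), "Month"]
print(jan16, 1.527013956 + step)
```

The two printed numbers should agree, and they round to the 1.539019 shown for 1950-01-16 in the output above.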
Pandas: Convert Quarterly Data into Monthly Data
You can use resample:
# convert to period
df['Date'] = pd.to_datetime(df['Date']).dt.to_period('M')
# set Date as index and resample
df.set_index('Date').resample('M').interpolate()
Output:
Value
Date
2010-01 100.0
2010-02 110.0
2010-03 120.0
2010-04 130.0
2010-05 140.0
2010-06 150.0
2010-07 160.0
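The answer does not show the input df, so here is a self-contained sketch under assumptions: the quarterly dates and values below are hypothetical, chosen to match the output above, and the sketch keeps a timestamp index with asfreq('MS') instead of converting to periods.

```python
import pandas as pd

# hypothetical quarterly input; dates and values are assumed for illustration
df = pd.DataFrame({
    "Date": ["2010-01-01", "2010-04-01", "2010-07-01"],
    "Value": [100, 130, 160],
})
df["Date"] = pd.to_datetime(df["Date"])

monthly = (
    df.set_index("Date")
      .asfreq("MS")      # add the missing month starts as NaN rows
      .interpolate()     # linear fill between the quarterly values
)
print(monthly)
```

The index here holds month-start timestamps rather than monthly periods; calling monthly.to_period('M') afterwards gives labels like 2010-01 if that form is preferred.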