Interpolate/Extend Quarterly to Monthly Series

I'm not quite clear on your comment about the undesired column format, but if you're trying to get interpolated values using cubic interpolation, you might consider something like the code below:

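# quarterly input; DATE must be of class Date
ger <- data.frame(DATE = as.Date(c("1991-01-01", "1991-04-01", "1991-07-01", "1991-10-01", "1992-01-01")),
                  VALUE = c(470780, 468834, 466332, 472949, 480359))
# monthly dates spanning the quarterly range
DateSeq <- seq(ger$DATE[1], tail(ger$DATE, 1), by = "1 month")
# natural cubic spline evaluated at the monthly dates
gerMonthly <- data.frame(DATE = DateSeq, Interp.Value = spline(ger, method = "natural", xout = DateSeq)$y)
# show the quarterly inputs next to the interpolated values
merge(ger, gerMonthly, by = "DATE", all.y = TRUE)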

The DATE column needs to be of class Date so that the interpolation can treat the dates as numeric values.
I've usually used "natural" cubic splines, but other methods are available (see ?spline; the default is "fmm").
The merged output shows both the input values and the results so that you can check that the interpolation looks reasonable; use gerMonthly on its own if you just want the interpolated results.
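For readers who want to see what a natural cubic spline does to these five quarterly points, here is a minimal NumPy sketch. It is not R's implementation: natural_cubic_spline is a hypothetical helper written for illustration. It solves for the second derivatives at the knots (forced to zero at both ends, which is what "natural" means) and then evaluates the piecewise cubic at the monthly dates. Dates are converted to day counts with toordinal(), which is an assumption about how to put them on a numeric axis.

```python
from datetime import date

import numpy as np


def natural_cubic_spline(x, y, xout):
    """Evaluate a natural cubic spline through (x, y) at the points xout."""
    x, y, xout = (np.asarray(a, dtype=float) for a in (x, y, xout))
    n = len(x)
    h = np.diff(x)                # knot spacings
    slopes = np.diff(y) / h       # secant slopes between knots

    # Linear system for the second derivatives M at the knots.
    # Natural boundary conditions: M[0] = M[-1] = 0.
    A = np.zeros((n, n))
    b = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        b[i] = 6.0 * (slopes[i] - slopes[i - 1])
    M = np.linalg.solve(A, b)

    # Index of the interval [x[i], x[i+1]] that contains each output point.
    i = np.clip(np.searchsorted(x, xout, side="right") - 1, 0, n - 2)
    left = x[i + 1] - xout
    right = xout - x[i]
    return ((M[i] * left**3 + M[i + 1] * right**3) / (6.0 * h[i])
            + (y[i] / h[i] - M[i] * h[i] / 6.0) * left
            + (y[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * right)


# Same quarterly data as in the R example above.
quarter_dates = [date(1991, 1, 1), date(1991, 4, 1), date(1991, 7, 1),
                 date(1991, 10, 1), date(1992, 1, 1)]
values = [470780, 468834, 466332, 472949, 480359]

# First day of every month from Jan 1991 to Jan 1992.
month_dates = [date(1991 + m // 12, m % 12 + 1, 1) for m in range(13)]

monthly = natural_cubic_spline([d.toordinal() for d in quarter_dates],
                               values,
                               [d.toordinal() for d in month_dates])

for d, v in zip(month_dates, monthly):
    print(d, round(float(v), 1))
```

Every third monthly value coincides with a quarterly input, which gives the same kind of sanity check as the merged output in the R code.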

Form a monthly series from a quarterly series

Convert to zoo with a "yearmon" class index, assuming the values are at the ends of the quarters. Then take the rolling mean, giving z.mu. Next, merge that with a zero-width zoo object containing all the months and use na.spline to fill in the missing values (or use na.locf or na.approx for other forms of interpolation). Optionally, use fortify.zoo to convert back to a data.frame.

library(zoo)

z <- zoo(coredata(DF), as.yearmon(as.yearqtr(rownames(DF)), frac = 1))
z.mu <- rollmeanr(z, 2, partial = TRUE)
ym <- seq(floor(start(z.mu)), floor(end(z.mu)) + 11/12, 1/12)
z.ym <- na.spline(merge(z.mu, zoo(, ym)))

fortify.zoo(z.ym)

giving:

      Index      Country
1  Jan 1999 -0.065000000
2  Feb 1999 -0.052222222
3  Mar 1999 -0.040555556
4  Apr 1999 -0.030000000
5  May 1999 -0.020555556
6  Jun 1999 -0.012222222
7  Jul 1999 -0.005000000
8  Aug 1999  0.001111111
9  Sep 1999  0.006111111
10 Oct 1999  0.010000000
11 Nov 1999  0.012777778
12 Dec 1999  0.014444444
13 Jan 2000  0.015000000
14 Feb 2000  0.014444444
15 Mar 2000  0.012777778
16 Apr 2000  0.010000000
17 May 2000  0.006111111
18 Jun 2000  0.001111111
19 Jul 2000 -0.005000000
20 Aug 2000 -0.012222222
21 Sep 2000 -0.020555556
22 Oct 2000 -0.030000000
23 Nov 2000 -0.040555556
24 Dec 2000 -0.052222222

Note: The input DF in reproducible form used is:

Lines <- "         Country
1999Q3 0.01
1999Q4 0.01
2000Q1 0.02
2000Q2 0.00
2000Q3 -0.01"

DF <- read.table(text = Lines)

Update: The question originally asked to carry the last value forward but was changed to ask for spline interpolation, so the answer has been changed accordingly. The result now also starts in January and ends in December, and the data are assumed to be for quarter ends.
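For comparison, the same sequence of steps (quarter-end index, trailing mean, reindex onto all months, fill the gaps) can be sketched in pandas. This is only an illustration under stated assumptions: it swaps in linear interpolation for na.spline, so its numbers will not match the output above, and rolling(2, min_periods=1) is used as a rough stand-in for the partial rolling mean.

```python
import pandas as pd

# Quarterly input from the answer, indexed by quarter.
quarters = pd.PeriodIndex(["1999Q3", "1999Q4", "2000Q1", "2000Q2", "2000Q3"],
                          freq="Q")
s = pd.Series([0.01, 0.01, 0.02, 0.00, -0.01], index=quarters)

# Place each value at the last month of its quarter.
s.index = s.index.asfreq("M", how="end")

# Trailing two-point mean; min_periods=1 keeps the first observation.
mu = s.rolling(2, min_periods=1).mean()

# Reindex onto every month of the covered years (gaps become NaN).
all_months = pd.period_range("1999-01", "2000-12", freq="M")
monthly = mu.reindex(all_months)

# Linear fill between known points only; months outside the data stay NaN.
filled = monthly.interpolate(limit_area="inside")
print(filled)
```

Using limit_area="inside" is a deliberate choice here: without it, pandas would carry the last known value forward past September 2000 instead of leaving those months empty.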

Interpolating yearly time series data to quarterly values in R

Here's an example using dplyr:

library(dplyr)

annual_data <- data.frame(
person=c(1, 1, 1, 2, 2),
year=c(2010, 2011, 2012, 2010, 2012),
y=c(1, 2, 3, 1, 3)
)

expand_data <- function(x) {
years <- min(x$year):max(x$year)
quarters <- 1:4
grid <- expand.grid(quarter=quarters, year=years)
x$quarter <- 1
merged <- grid %>% left_join(x, by=c('year', 'quarter'))
merged$person <- x$person[1]
return(merged)
}

interpolate_data <- function(data) {
xout <- 1:nrow(data)
y <- data$y
interpolation <- approx(x=xout[!is.na(y)], y=y[!is.na(y)], xout=xout)
data$yhat <- interpolation$y
return(data)
}

expand_and_interpolate <- function(x) interpolate_data(expand_data(x))

quarterly_data <- annual_data %>% group_by(person) %>% do(expand_and_interpolate(.))

print(as.data.frame(quarterly_data))

The output from this approach is:

   quarter year person  y yhat
1        1 2010      1  1 1.00
2        2 2010      1 NA 1.25
3        3 2010      1 NA 1.50
4        4 2010      1 NA 1.75
5        1 2011      1  2 2.00
6        2 2011      1 NA 2.25
7        3 2011      1 NA 2.50
8        4 2011      1 NA 2.75
9        1 2012      1  3 3.00
10       2 2012      1 NA   NA
11       3 2012      1 NA   NA
12       4 2012      1 NA   NA
13       1 2010      2  1 1.00
14       2 2010      2 NA 1.25
15       3 2010      2 NA 1.50
16       4 2010      2 NA 1.75
17       1 2011      2 NA 2.00
18       2 2011      2 NA 2.25
19       3 2011      2 NA 2.50
20       4 2011      2 NA 2.75
21       1 2012      2  3 3.00
22       2 2012      2 NA   NA
23       3 2012      2 NA   NA
24       4 2012      2 NA   NA

There are probably a bunch of ways to clean this up. The key functions being used are expand.grid, approx, and dplyr::group_by. The approx function is a little tricky. Looking at the implementation of zoo::na.approx.default was quite helpful in figuring out how to work with approx.
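The tricky part of approx is that it must be given only the non-missing points and then asked for values at every row position. A minimal NumPy sketch of the same idea is shown below for person 1's expanded series; np.interp stands in for approx here, and left/right are set to NaN as an assumption to mimic approx's default of returning NA outside the range of known points.

```python
import numpy as np

nan = np.nan
# Person 1 after expansion: one known value per year, in quarter 1.
y = np.array([1, nan, nan, nan, 2, nan, nan, nan, 3, nan, nan, nan])

pos = np.arange(len(y))     # row positions, like xout <- 1:nrow(data)
known = ~np.isnan(y)        # rows that carry an observed value

# Interpolate linearly over the known rows only; positions beyond the
# last known row get NaN instead of being clamped to the edge value.
yhat = np.interp(pos, pos[known], y[known], left=nan, right=nan)
print(yhat)
```

The first nine values rise in steps of 0.25 from 1 to 3, and the last three stay missing, as in the yhat column above.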

Going from monthly average dataframe to an interpolated daily timeseries

Here is one way to do it:

import pandas as pd
import numpy as np

# monthly averages, note these should be cast to float
month = np.array(['1.527013956', '1.899169054', '1.669356146',
'1.44920871', '1.188557788', '1.017035727',
'0.950243755', '1.022453993', '1.203913739',
'1.369545041', '1.441827406', '1.48621651'], dtype='float')

# expand this to 51 years, with the same monthly averages repeating each year
# (obviously not very efficient, probably there are better ways to attack the problem,
# but this was the question)
month = np.tile(month, 51)

# create DataFrame with these values
m_avg = pd.DataFrame({'Month': month})

# set the date index to the desired time period
m_avg.index = pd.date_range(start='1/1/1950', end='12/1/2000', freq='MS')

# shift the index by 14 days to get the 15th of each month
# (shift with freq moves the index; tshift was removed in pandas 2.0)
m_avg = m_avg.shift(14, freq='D')

# expand the index to daily frequency
daily = m_avg.asfreq(freq='D')

# interpolate (linearly) the missing values
daily = daily.interpolate()

# show result (in a Jupyter notebook, display(daily) gives a rendered table)
print(daily)

Output:

            Month
1950-01-15 1.527014
1950-01-16 1.539019
1950-01-17 1.551024
1950-01-18 1.563029
1950-01-19 1.575034
... ...
2000-12-11 1.480298
2000-12-12 1.481778
2000-12-13 1.483257
2000-12-14 1.484737
2000-12-15 1.486217

18598 rows × 1 columns
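To check the mechanics on a small scale, the sketch below applies the same asfreq and interpolate steps to just the first two mid-month values from the example. The variable names are illustrative only. With 31 days between 15 January and 15 February, each day should advance by one 31st of the difference between the two monthly averages.

```python
import pandas as pd

# First two monthly averages from the example, placed on the 15th.
mid_month = pd.Series(
    [1.527013956, 1.899169054],
    index=pd.to_datetime(["1950-01-15", "1950-02-15"]),
)

# Expand to daily frequency (new days are NaN), then fill linearly.
daily = mid_month.asfreq("D").interpolate()

# Expected daily increment between the two anchor points.
step = (1.899169054 - 1.527013956) / 31

print(daily.head())
print("daily step:", step)
```

The value for 16 January comes out at about 1.539019, matching the second row of the output above.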

Pandas: Convert Quarterly Data into Monthly Data

You can use resample:

# convert to period
df['Date'] = pd.to_datetime(df['Date']).dt.to_period('M')

# set Date as index and resample
df.set_index('Date').resample('M').interpolate()

Output:

         Value
Date
2010-01 100.0
2010-02 110.0
2010-03 120.0
2010-04 130.0
2010-05 140.0
2010-06 150.0
2010-07 160.0
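The input df is not shown in the answer, so here is a self-contained sketch with an assumed three-quarter input (100, 130, 160) chosen to be consistent with the output above. As a variation, it keeps a DatetimeIndex and resamples to month starts ('MS') rather than converting to periods, so the index prints as full dates instead of year-month periods.

```python
import pandas as pd

# Assumed quarterly input; not taken from the original question.
df = pd.DataFrame({
    "Date": ["2010-01-01", "2010-04-01", "2010-07-01"],
    "Value": [100, 130, 160],
})
df["Date"] = pd.to_datetime(df["Date"])

# Upsample to month starts and fill the new months linearly.
monthly = df.set_index("Date").resample("MS").interpolate()
print(monthly)
```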

