How to Make scipy.interpolate Give an Extrapolated Result Beyond the Input Range

How to make scipy.interpolate give an extrapolated result beyond the input range?

1. Constant extrapolation

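You can use NumPy's interp function, which extrapolates the left and right values as constants beyond the range. (Older SciPy versions re-exported it as scipy.interp, but that alias is deprecated and has been removed in recent releases.)

>>> from numpy import interp, arange, exp
>>> x = arange(0, 10)
>>> y = exp(-x/3.0)
>>> interp([9, 10], x, y)
array([0.04978707, 0.04978707])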
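`numpy.interp` also accepts `left` and `right` arguments that replace the constant edge values. A short sketch (the fill values 1.0 and 0.0 are arbitrary choices for illustration):

```python
import numpy as np

x = np.arange(0, 10)
y = np.exp(-x / 3.0)

# Default: points below x[0] get y[0], points above x[-1] get y[-1].
clamped = np.interp([-1, 9, 10], x, y)

# left/right override the value used for x < x[0] and x > x[-1].
# x = 9 equals x[-1], so it is interpolated normally, not replaced.
custom = np.interp([-1, 9, 10], x, y, left=1.0, right=0.0)

print(clamped)
print(custom)
```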

2. Linear (or other custom) extrapolation

You can write a wrapper around an interpolation function that takes care of linear extrapolation. For example:

import numpy as np
from scipy.interpolate import interp1d

def extrap1d(interpolator):
    xs = interpolator.x
    ys = interpolator.y

    def pointwise(x):
        if x < xs[0]:
            # extend the first segment to the left
            return ys[0] + (x - xs[0]) * (ys[1] - ys[0]) / (xs[1] - xs[0])
        elif x > xs[-1]:
            # extend the last segment to the right
            return ys[-1] + (x - xs[-1]) * (ys[-1] - ys[-2]) / (xs[-1] - xs[-2])
        else:
            return interpolator(x)

    def ufunclike(x_new):
        # evaluate pointwise on each element and collect into an array
        return np.array(list(map(pointwise, np.array(x_new))))

    return ufunclike

extrap1d takes an interpolation function and returns a function that can also extrapolate. You can use it like this:

x = np.arange(0, 10)
y = np.exp(-x/3.0)
f_i = interp1d(x, y)
f_x = extrap1d(f_i)

print(f_x([9, 10]))

Output:

[ 0.04978707  0.03009069]
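The wrapper above calls the interpolator one point at a time through `map`. A vectorized variant is sketched below; it swaps `interp1d` for `numpy.interp`, uses a hypothetical helper name `linear_extrap`, and assumes `xs` is sorted ascending with at least two points:

```python
import numpy as np

def linear_extrap(x_new, xs, ys):
    """Hypothetical helper: linear interpolation inside [xs[0], xs[-1]],
    straight-line extrapolation outside it, using the slope of the
    first/last segment. Always returns a 1-D float array."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)

    result = np.interp(x_new, xs, ys)  # interior values (ends clamped for now)
    left_slope = (ys[1] - ys[0]) / (xs[1] - xs[0])
    right_slope = (ys[-1] - ys[-2]) / (xs[-1] - xs[-2])

    below = x_new < xs[0]
    above = x_new > xs[-1]
    result[below] = ys[0] + (x_new[below] - xs[0]) * left_slope
    result[above] = ys[-1] + (x_new[above] - xs[-1]) * right_slope
    return result

x = np.arange(0, 10)
y = np.exp(-x / 3.0)
print(linear_extrap([9, 10], x, y))
```

Boolean masks replace the Python-level `if` branches, so the whole array is handled in a few NumPy operations.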

extrapolating data with numpy/python

After discussing this with you in the Python chat: you're fitting your data to an exponential. This should give a reasonably good indication, since you're not looking for long-term extrapolation.

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def exponential_fit(x, a, b, c):
    return a*np.exp(-b*x) + c

if __name__ == "__main__":
    x = np.array([0, 1, 2, 3, 4, 5])
    y = np.array([30, 50, 80, 160, 300, 580])
    # initial guess (an assumption): b < 0 because the data grow; helps curve_fit converge
    fitting_parameters, covariance = curve_fit(exponential_fit, x, y, p0=(10, -0.5, 0))
    a, b, c = fitting_parameters

    next_x = 6
    next_y = exponential_fit(next_x, a, b, c)

    plt.plot(y)
    plt.plot(np.append(y, next_y), 'ro')
    plt.show()

The red dot on the far right of the plot shows the next "predicted" point.
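If the data look like pure exponential growth with no offset `c`, an alternative that needs no initial guess is a straight-line fit to `log(y)` with `numpy.polyfit`. This is a different, simpler model than the three-parameter `curve_fit` above (it assumes all y > 0 and c = 0), so treat it as a sketch:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([30, 50, 80, 160, 300, 580])

# Fit log(y) = log(A) + B*x, i.e. y = A*exp(B*x); requires all y > 0.
# polyfit returns coefficients highest degree first: [slope, intercept].
B, logA = np.polyfit(x, np.log(y), 1)
A = np.exp(logA)

next_x = 6
next_y = A * np.exp(B * next_x)
print(A, B, next_y)
```

The fit minimizes squared error in log space, which weights the small early values more heavily than `curve_fit` on the raw data would.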

Interpolation out of the range in Python

For this kind of question, I generally recommend first looking at the documentation of the method/class/function/etc. that you are using. In this case, the documentation for scipy.interpolate.interp1d tells us all we need to know:

bounds_error: bool, optional

If True, a ValueError is raised any time interpolation is attempted on a value outside of the range of x (where extrapolation is necessary). If False, out of bounds values are assigned fill_value. By default, an error is raised unless fill_value="extrapolate".

and:

fill_value: array-like or (array-like, array_like) or “extrapolate”, optional

  • if a ndarray (or float), this value will be used to fill in for requested points outside of the data range. If not provided, then the default is NaN. The array-like must broadcast properly to the dimensions of the non-interpolation axes.

Since you want values outside the value range to default to 0, this is all you need:

from scipy import interpolate

testList = [
    [
        [(0.0, -0.9960135495794032), (0.5, -1.0)],
        [(0.5, -1.0), (2.0, -0.16138322487766676), (2.5, 1.0849272141417852)]
    ],
    [
        [(4.0, 3.3149805356833015), (4.5, 0.1649293864654484), (5.0, -1.0)],
        [(5.0, -1.0), (5.5, 0.33841349597101744), (6.0, 4.702347949145297)]
    ],
]

results = []
for subset in testList:
    sub_result = []
    for dataset in subset:
        x = [coord[0] for coord in dataset]
        y = [coord[1] for coord in dataset]
        # Enter y to find x: f(y) = x
        f = interpolate.interp1d(y, x, bounds_error=False, fill_value=0)
        interpolate_result = (f(0), f(-1))
        sub_result.append(interpolate_result)
    results.append(sub_result)

print(results)
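A tiny self-contained comparison on made-up data: the default `interp1d` raises a ValueError for an out-of-range query, while `bounds_error=False` with `fill_value=0` returns 0 there instead:

```python
import numpy as np
from scipy.interpolate import interp1d

x = [0.0, 1.0, 2.0]   # made-up sample data
y = [0.0, 10.0, 20.0]

strict = interp1d(x, y)  # default: raises on out-of-range x
raised = False
try:
    strict(3.0)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Out-of-range queries now return fill_value instead of raising.
lenient = interp1d(x, y, bounds_error=False, fill_value=0)
print(lenient([-1.0, 0.5, 3.0]))
```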

If the documentation doesn't tell you what you are looking for, simply searching the internet for the error message can produce helpful results, because chances are somebody else has already asked about a similar problem. In your case, when searching for ValueError: A value in x_new is above the interpolation range., this is the first result. Searching for the error message usually takes much less effort (for everybody involved) than asking the question yourself.

scipy interp1d extrapolation method fill_value = tuple not working

According to the documentation, interp1d defaults to raising ValueError on extrapolation except when fill_value='extrapolate' or when you specify bounds_error=False. A tuple fill_value is therefore only used once you also pass bounds_error=False:

In [1]: f1 = interp1d(t, x, kind='linear', fill_value=(0.5, 0.6), bounds_error=False)

In [2]: f1(0)
Out[2]: array(0.5)
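The session above assumes `t` and `x` already exist. A self-contained sketch with made-up `t` and `x` arrays; the two-element tuple is read as (value for t < t[0], value for t > t[-1]):

```python
import numpy as np
from scipy.interpolate import interp1d

t = np.array([1.0, 2.0, 3.0])    # hypothetical sample data
x = np.array([0.5, 0.55, 0.6])

# (below, above): first value for t < t[0], second for t > t[-1].
f1 = interp1d(t, x, kind='linear', fill_value=(0.5, 0.6), bounds_error=False)
print(f1([0.0, 2.5, 10.0]))
```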

Extrapolate with LinearNDInterpolator

I propose a method; the code is awful, but I hope it helps. The idea is that if you know in advance the bounds within which you will have to extrapolate, you can add extra columns/rows at the edges of your arrays containing linearly extrapolated values, and then interpolate on the new arrays. Here is an example with some data that will be extrapolated out to x = ±50 and y = ±40:

import numpy as np
x,y=np.meshgrid(np.linspace(0,6,7),np.linspace(0,8,9)) # create x,y grid
z=x**2*y # and z values
# create larger versions with two more columns/rows
xlarge=np.zeros((x.shape[0]+2,x.shape[1]+2))
ylarge=np.zeros((x.shape[0]+2,x.shape[1]+2))
zlarge=np.zeros((x.shape[0]+2,x.shape[1]+2))
xlarge[1:-1,1:-1]=x # copy data on centre
ylarge[1:-1,1:-1]=y
zlarge[1:-1,1:-1]=z
# fill extra columns/rows
xmin,xmax=-50,50
ymin,ymax=-40,40
xlarge[:,0]=xmin;xlarge[:,-1]=xmax # fill first/last column
xlarge[0,:]=xlarge[1,:];xlarge[-1,:]=xlarge[-2,:] # copy first/last row
ylarge[0,:]=ymin;ylarge[-1,:]=ymax
ylarge[:,0]=ylarge[:,1];ylarge[:,-1]=ylarge[:,-2]
# for speed gain: store factor of first/last column/row
first_column_factor=(xlarge[:,0]-xlarge[:,1])/(xlarge[:,1]-xlarge[:,2])
last_column_factor=(xlarge[:,-1]-xlarge[:,-2])/(xlarge[:,-2]-xlarge[:,-3])
first_row_factor=(ylarge[0,:]-ylarge[1,:])/(ylarge[1,:]-ylarge[2,:])
last_row_factor=(ylarge[-1,:]-ylarge[-2,:])/(ylarge[-2,:]-ylarge[-3,:])
# extrapolate z; this operation only needs to be repeated when zlarge[1:-1,1:-1] is updated
zlarge[:,0]=zlarge[:,1]+first_column_factor*(zlarge[:,1]-zlarge[:,2]) # extrapolate first column
zlarge[:,-1]=zlarge[:,-2]+last_column_factor*(zlarge[:,-2]-zlarge[:,-3]) # extrapolate last column
zlarge[0,:]=zlarge[1,:]+first_row_factor*(zlarge[1,:]-zlarge[2,:]) # extrapolate first row
zlarge[-1,:]=zlarge[-2,:]+last_row_factor*(zlarge[-2,:]-zlarge[-3,:]) #extrapolate last row

Then you can interpolate on (xlarge, ylarge, zlarge). Since all operations are NumPy slice operations, I hope it will be fast enough for you. When the z data are updated, copy them into zlarge[1:-1,1:-1] and re-execute the last 4 lines.
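When the data sit on a regular (rectilinear) grid, as in this example, a swapped-in alternative to padding the arrays by hand is `scipy.interpolate.RegularGridInterpolator`, which extrapolates linearly from the edge cells when `fill_value=None`. It is not `LinearNDInterpolator` (that class works on scattered points and returns `fill_value`, NaN by default, outside the convex hull), so treat this as a sketch under the assumption that your points form a grid. The plane used for `Z` is hypothetical test data chosen so the extrapolated values are easy to check:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

xs = np.linspace(0, 6, 7)
ys = np.linspace(0, 8, 9)
X, Y = np.meshgrid(xs, ys, indexing='ij')  # shape (7, 9): axis 0 = x, axis 1 = y
Z = 3.0 * X - 2.0 * Y + 1.0               # a plane, so linear extrapolation is exact

# bounds_error=False plus fill_value=None makes out-of-grid points extrapolate.
interp = RegularGridInterpolator((xs, ys), Z, method='linear',
                                 bounds_error=False, fill_value=None)

pts = np.array([[-50.0, -40.0], [50.0, 40.0], [3.5, 2.5]])
print(interp(pts))
```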

Is there easy way in python to extrapolate data points to the future?

It's all too easy for extrapolation to generate garbage; try this.
Many different extrapolations are of course possible;
some produce obvious garbage, some non-obvious garbage, and many are ill-defined.


""" extrapolate y,m,d data with scipy UnivariateSpline """
import numpy as np
from scipy.interpolate import UnivariateSpline
# pydoc scipy.interpolate.UnivariateSpline -- fitpack, unclear
from datetime import date
from pylab import * # ipython -pylab

__version__ = "denis 23oct"

def daynumber( y,m,d ):
""" 2005,1,1 -> 0 2006,1,1 -> 365 ... """
return date( y,m,d ).toordinal() - date( 2005,1,1 ).toordinal()

days, values = np.array([
(daynumber(2005,1,1), 1.2 ),
(daynumber(2005,4,1), 1.8 ),
(daynumber(2005,9,1), 5.3 ),
(daynumber(2005,10,1), 5.3 )
]).T
dayswanted = np.array([ daynumber( year, month, 1 )
for year in range( 2005, 2006+1 )
for month in range( 1, 12+1 )])

np.set_printoptions( 1 ) # .1f
print "days:", days
print "values:", values
print "dayswanted:", dayswanted

title( "extrapolation with scipy.interpolate.UnivariateSpline" )
plot( days, values, "o" )
for k in (1,2,3): # line parabola cubicspline
extrapolator = UnivariateSpline( days, values, k=k )
y = extrapolator( dayswanted )
label = "k=%d" % k
print label, y
plot( dayswanted, y, label=label ) # pylab

legend( loc="lower left" )
grid(True)
savefig( "extrapolate-UnivariateSpline.png", dpi=50 )
show()

Added: a SciPy ticket says, "The behavior of the FITPACK classes in scipy.interpolate is much more complex than the docs would lead one to believe." IMHO that is true of other software documentation too.
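UnivariateSpline extrapolates by default; its `ext` parameter selects other behaviours outside the data range (0 extrapolate, 1 return zeros, 2 raise ValueError, 3 return the boundary value). A small sketch reusing the four day/value pairs from the script above as plain numbers (0, 90, 243 and 273 are the day numbers for 2005-01-01, 04-01, 09-01 and 10-01):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

days = np.array([0.0, 90.0, 243.0, 273.0])
values = np.array([1.2, 1.8, 5.3, 5.3])
queries = np.array([300.0, 365.0])  # both beyond the last data day

results = {}
for ext in (0, 1, 3):  # extrapolate, zeros, const
    spline = UnivariateSpline(days, values, k=1, ext=ext)
    results[ext] = spline(queries)
    print(ext, results[ext])

# ext=2 refuses to extrapolate at all.
raising = UnivariateSpline(days, values, k=1, ext=2)
raised = False
try:
    raising(queries)
except ValueError as err:
    raised = True
    print("ext=2:", err)
```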

How to interpolate/extrapolate within partly empty regular grid?

Since scipy.interpolate.interp2d doesn't handle NaNs, the solution is to fill the NaNs in the DataFrame before using interp2d. This can be done with the pandas DataFrame.interpolate method.

In the previous example, the following provides the desired output:

In [1]: from scipy.interpolate import interp2d

In [2]: df = df.interpolate(limit_direction='both', axis=1)
In [3]: myInterp = interp2d(df.index,df.columns,df.values.T)

In [4]: myInterp(1.5,2.5)
Out[4]: array([5.])

In [5]: myInterp(1.5,4.0)
Out[5]: array([3.])

In [6]: myInterp(0.0,2.0)
Out[6]: array([1.5])

In [7]: myInterp(5.0,2.5)
Out[7]: array([2.])
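`interp2d` was deprecated in SciPy 1.10 and removed in SciPy 1.14, so the session above needs an older SciPy. Below is a sketch of the same two steps (fill the gaps row by row, then interpolate on the grid) using only NumPy and `RegularGridInterpolator`. The grid values and the helper `fill_rows` are hypothetical, and `np.interp` holds edge values constant, which is not guaranteed to match pandas' `limit_direction='both'` in every case:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

index = np.array([0.0, 1.0, 2.0, 3.0])    # hypothetical row coordinates
columns = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical column coordinates
values = np.array([
    [1.0, np.nan, 3.0, np.nan],
    [np.nan, 2.0, 4.0, 6.0],
    [2.0, 3.0, np.nan, 5.0],
    [1.0, 1.0, 1.0, 1.0],
])

def fill_rows(a, coords):
    """Hypothetical helper: fill NaNs in each row by linear interpolation
    along `coords`; np.interp holds the nearest valid value beyond the ends.
    Assumes every row has at least one non-NaN value."""
    out = a.copy()
    for row in out:  # each row is a view, so assignment modifies `out`
        ok = ~np.isnan(row)
        row[~ok] = np.interp(coords[~ok], coords[ok], row[ok])
    return out

filled = fill_rows(values, columns)
interp = RegularGridInterpolator((index, columns), filled)  # linear by default
print(interp([[1.5, 2.5]]))
```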

Extrapolating with a single data point

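scipy.interpolate.interp1d allows extrapolation when you pass fill_value='extrapolate':

import numpy as np
from scipy import interpolate

x = np.arange(1, 8, 1)
y = np.array((10, 20, 30, 40, 50, 60, 70))
f = interpolate.interp1d(x, y, fill_value='extrapolate')
print(f([0, 8]))  # both points lie outside [1, 7] and are extrapolated linearly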

Hope this answers your question.


