How to Implement Linear Interpolation

How to implement linear interpolation?

As I understand your question, you want to write some function y = interpolate(x_values, y_values, x), which will give you the y value at some x? The basic idea then follows these steps:

  1. Find the indices of the values in x_values which define an interval containing x. For instance, for x=3 with your example lists, the containing interval would be [x1,x2]=[2.5,3.4], and the indices would be i1=1, i2=2.
  2. Calculate the slope on this interval by (y_values[i2]-y_values[i1])/(x_values[i2]-x_values[i1]) (ie dy/dx).
  3. The value at x is now the value at x1 plus the slope multiplied by the distance from x1.

You will additionally need to decide what happens if x is outside the interval of x_values: either treat it as an error, or extrapolate, assuming the slope is the same as that of the first/last interval.
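
Here is a minimal Python sketch of those steps (the function name, the error-raising policy, and the example y list are my own choices, not something from your code):

def interpolate(x_values, y_values, x):
    # x_values must be sorted in ascending order
    if not (x_values[0] <= x <= x_values[-1]):
        # out-of-range policy: raise an error (extrapolation is the alternative)
        raise ValueError("x is outside the range of x_values")
    # step 1: find the interval [x_values[i], x_values[i+1]] containing x
    i = 0
    while x > x_values[i + 1]:
        i += 1
    # step 2: slope on this interval (dy/dx)
    slope = (y_values[i + 1] - y_values[i]) / (x_values[i + 1] - x_values[i])
    # step 3: value at x1 plus the slope multiplied by the distance from x1
    return y_values[i] + slope * (x - x_values[i])

interpolate([1.4, 2.5, 3.4, 4.0], [10.0, 20.0, 30.0, 40.0], 3.0)  # ~25.56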

Did this help, or did you need more specific advice?

How to implement linear interpolation method in java array?

If you want to interpolate an interval into a different count of numbers, you can simply pass the desired count of output numbers as a function parameter.
Example:

/**
 * Interpolating method
 * @param start start of the interval
 * @param end end of the interval
 * @param count count of subintervals (the result holds count + 1 numbers)
 * @return array of interpolated numbers, including both endpoints
 */
public static double[] interpolate(double start, double end, int count) {
    if (count < 2) {
        throw new IllegalArgumentException("interpolate: illegal count!");
    }
    double[] array = new double[count + 1];
    for (int i = 0; i <= count; ++i) {
        array[i] = start + i * (end - start) / count;
    }
    return array;
}

Then you can just call interpolate(0, 6, 6); or interpolate(6, 12, 6); or interpolate(6, 12, 12); or whatever you want.
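
For example, interpolate(0, 6, 6) returns {0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0}, i.e. count + 1 evenly spaced values including both endpoints.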

Linear Interpolation. How to implement this algorithm in C ? (Python version is given)

Interpolation in the sense of "signal sample rate increase"

... or, as I call it, "upsampling" (probably the wrong term; disclaimer: I have not read Lyons'). I just had to understand what the code does and then re-write it for readability. As given, it has a couple of problems:

a) it is inefficient - two loops are ok, but it does a multiplication for every single output item; it also uses an intermediary list (hold) and generates the result with append (small beer)

b) it interpolates the first interval incorrectly; it generates fake data in front of the first element. Say we have multiplier=5 and seq=[20,30] - it will generate [0,4,8,12,16,20,22,24,26,28,30] instead of [20,22,24,26,28,30].

So here is the algorithm in form of a generator:

def upsampler(seq, multiplier):
    if seq:
        step = 1.0 / multiplier
        y0 = seq[0]
        yield y0
        for y in seq[1:]:
            dY = (y - y0) * step
            for i in range(multiplier - 1):
                y0 += dY
                yield y0
            y0 = y
            yield y0

Ok and now for some tests:

>>> list(upsampler([], 3))  # this is just the same as [Y for Y in upsampler([], 3)]
[]
>>> list(upsampler([1], 3))
[1]
>>> list(upsampler([1,2], 3))
[1, 1.3333333333333333, 1.6666666666666665, 2]
>>> from math import sin, pi
>>> seq = [sin(2.0*pi * i/10) for i in range(20)]
>>> seq
[0.0, 0.58778525229247314, 0.95105651629515353, 0.95105651629515364, 0.58778525229247325, 1.2246063538223773e-016, -0.58778525229247303, -0.95105651629515353, -0.95105651629515364, -0.58778525229247336, -2.4492127076447545e-016, 0.58778525229247214, 0.95105651629515353, 0.95105651629515364, 0.58778525229247336, 3.6738190614671318e-016, -0.5877852522924728, -0.95105651629515342, -0.95105651629515375, -0.58778525229247347]
>>> list(upsampler(seq, 2))
[0.0, 0.29389262614623657, 0.58778525229247314, 0.76942088429381328, 0.95105651629515353, 0.95105651629515364, 0.95105651629515364, 0.7694208842938135, 0.58778525229247325, 0.29389262614623668, 1.2246063538223773e-016, -0.29389262614623646, -0.58778525229247303, -0.76942088429381328, -0.95105651629515353, -0.95105651629515364, -0.95105651629515364, -0.7694208842938135, -0.58778525229247336, -0.29389262614623679, -2.4492127076447545e-016, 0.29389262614623596, 0.58778525229247214, 0.76942088429381283, 0.95105651629515353, 0.95105651629515364, 0.95105651629515364, 0.7694208842938135, 0.58778525229247336, 0.29389262614623685, 3.6738190614671318e-016, -0.29389262614623618, -0.5877852522924728, -0.76942088429381306, -0.95105651629515342, -0.95105651629515364, -0.95105651629515375, -0.76942088429381361, -0.58778525229247347]

And here is my translation to C, fit into Kratz's fn template:

/**
 *
 * @param src caller supplied array with data
 * @param src_len len of src
 * @param steps to interpolate
 * @param dst output param will be filled with (src_len - 1) * steps + 1 samples
 */
float* linearInterpolation(float* src, int src_len, int steps, float* dst)
{
    float step, y0, dY;
    float *src_end;
    float *dst_start = dst;   /* remember the start of dst so it can be returned */
    if (src_len > 0) {
        step = 1.0f / steps;
        /* copy each source sample, then emit steps - 1 interpolated samples */
        for (src_end = src + src_len; *dst++ = y0 = *src++, src < src_end; ) {
            dY = (*src - y0) * step;
            for (int i = steps - 1; i > 0; i--) {
                *dst++ = y0 += dY;
            }
        }
    }
    return dst_start;
}

Please note the C snippet is "typed but never compiled or run", so there might be syntax errors, off-by-1 errors etc. But overall the idea is there.

Floating point linear interpolation

As Jason C points out in the comments, the version you posted is most likely the best choice, due to its superior precision near the edge cases:

float lerp(float a, float b, float f)
{
    return a * (1.0 - f) + (b * f);
}

If we disregard precision for a while, we can simplify the expression as follows:

    a(1 − f) + bf

 = a − af + bf

 = a + f(b − a)

Which means we could write it like this:

float lerp(float a, float b, float f)
{
return a + f * (b - a);
}

In this version we've gotten rid of one multiplication, but lost some precision.
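
To see the difference in practice, here is a quick Python check (my own, not part of the answer): at f = 1 the two-multiplication form hits b exactly, while the simplified form can lose b entirely when a is much larger in magnitude.

def lerp_precise(a, b, f):
    # two multiplications; exact at f == 0 and f == 1
    return a * (1.0 - f) + b * f

def lerp_fast(a, b, f):
    # one multiplication, but (b - a) may round badly
    return a + f * (b - a)

a, b = 1e30, 1.0
print(lerp_precise(a, b, 1.0))   # 1.0 -> endpoint reproduced exactly
print(lerp_fast(a, b, 1.0))      # 0.0 -> (b - a) rounds to -1e30, losing b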

Having trouble conceptualising how to implement a linear interpolation in R

Not the most elegant solution, but this uses dplyr and magrittr. First, I define your data frame.

# Data frame called df
#                  Date Irradiance    AirMass0
# 1 2013-01-04 10:43:00    1055.64  0.02798423
# 2 2013-01-12 20:16:00       0.00  0.01952277
# 3 2013-01-12 11:48:00     975.22  0.01946854
# 4 2013-01-08 07:19:00       0.25  0.03845099
# 5 2013-01-04 08:19:00     953.33 -0.14285513
# 6 2013-01-14 10:11:00    1017.62  0.03227589

Next, I load the relevant libraries.

# Load libraries 
library(dplyr)
library(magrittr)

Here, I create a function that takes an air mass and a given x value (i.e., 0.9804), creates a reference data frame (x equal to 0.5, 0.75, and 1 with the corresponding y values), and then builds a function that estimates y at that x through linear interpolation.

# Calculate pressure correction based on air mass and x
pres_cor <- function(m, x){
  # Create reference data frame
  ref_df <- data.frame(x_ref = c(0.5, 0.75, 1),
                       y_ref = c(1.68219 - 0.03059 * m + 0.000890 * m^2,
                                 1.248274 - 0.011997 * m + 0.000370 * m^2,
                                 1))

  # Create function for interpolation
  int_fun <- with(ref_df, approxfun(x_ref, y_ref))

  # Return value at given x value
  int_fun(x)
}

Finally, I apply this to each row of your data frame using the pipe operator (%>%), rowwise, and mutate from dplyr, and the compound assignment pipe (%<>%) from magrittr.

# Use function for each row
df %<>%
  rowwise %>%
  mutate(y = pres_cor(AirMass0, 0.9804))

This gives the following:

# # A tibble: 6 x 4
#   Date                Irradiance AirMass0     y
#   <dttm>                   <dbl>    <dbl> <dbl>
# 1 2013-01-04 10:43:00      1056.   0.0280  1.02
# 2 2013-01-12 20:16:00         0    0.0195  1.02
# 3 2013-01-12 11:48:00       975.   0.0195  1.02
# 4 2013-01-08 07:19:00         0.25 0.0385  1.02
# 5 2013-01-04 08:19:00       953.  -0.143   1.02
# 6 2013-01-14 10:11:00      1018.   0.0323  1.02

Note that the y values look the same due to rounding, but they are not identical upon closer inspection.

# df$y
# [1] 1.019438 1.019446 1.019446 1.019429 1.019600 1.019434

How is linear interpolation of data sets usually implemented?

The way I usually implement O(1) interpolation is by means of an additional data structure, which I call an IntervalSelector, that in O(1) time gives the two surrounding values of the sequence between which to interpolate.

An IntervalSelector is a class that, when given a sequence of n abscissas, builds and remembers a table that maps any given value x to the index i such that sequence[i] <= x < sequence[i+1] in O(1) time.

Note: In what follows arrays are 1 based.

The algorithm that builds the table proceeds as follows:

  1. Find delta to be the minimum distance between two consecutive elements in the input sequence of abscissas.
  2. Set count := (b-a)/delta + 1, where a and b are respectively the first and last of the (ascending) sequence and / stands for the integer quotient of the division.
  3. Define table to be an Array of count elements.
  4. For j between 1 and n, set table[(sequence[j]-a)/delta + 1] := j.
  5. Copy every entry of table visited in step 4 into the unvisited positions that come right after it.

On output, table maps j to i if (j-1)*delta <= sequence[i] - a < j*delta.

Here is an example:

[Image: an example ascending sequence divided into delta-length subintervals, where the table records, for each subinterval, the position of the sequence element at its left end]

Since the 3rd and 4th elements are the closest ones, we divide the interval into subintervals of this smallest length. Now, we remember in the table the positions of the left end of each of these delta-intervals. Later on, when an input x is given, we compute the delta-interval of such an x as (x-a)/delta + 1 and use the table to deduce the corresponding interval in the sequence. If x falls to the left of the ith sequence element, we choose the (i-1)th.

More precisely:

Given any input x between a and b, calculate j := (x-a)/delta + 1 and i := table[j]. If x < sequence[i], put i := i - 1. Then the index i satisfies sequence[i] <= x < sequence[i+1]; otherwise the distance between these two consecutive elements would be smaller than delta, which is impossible by the choice of delta.

Remark: Be aware that if the minimum distance delta between consecutive elements in sequence is too small the table will have too many entries. The simple description I've presented here ignores these pathological cases, which require additional work.
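
For concreteness, here is a rough Python sketch of the same idea using 0-based indexing (the class and method names are mine, and the pathological cases mentioned in the remark are ignored):

class IntervalSelector:
    # Answers find(x) -> i with xs[i] <= x < xs[i + 1] in O(1); xs must be ascending.
    def __init__(self, xs):
        self.xs = xs
        self.a = xs[0]
        # 1. delta = minimum distance between consecutive abscissas
        self.delta = min(x2 - x1 for x1, x2 in zip(xs, xs[1:]))
        # 2. number of delta-sized slots covering [a, b] (one extra guards rounding)
        count = int((xs[-1] - self.a) / self.delta) + 1
        self.table = [0] * (count + 1)
        # 3./4. record, for each abscissa, which slot it falls into
        for i, x in enumerate(xs):
            self.table[int((x - self.a) / self.delta)] = i
        # 5. propagate each recorded index forward into the unvisited slots after it
        for j in range(1, len(self.table)):
            self.table[j] = max(self.table[j], self.table[j - 1])

    def find(self, x):
        j = int((x - self.a) / self.delta)         # which delta-slot x falls into
        i = min(self.table[j], len(self.xs) - 2)   # clamp so that xs[i + 1] exists
        if x < self.xs[i]:                         # x lies left of that element
            i -= 1
        return i

sel = IntervalSelector([0.0, 2.5, 3.4, 5.0])
sel.find(3.0)   # -> 1, because 2.5 <= 3.0 < 3.4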

Pandas linear interpolation for geometrical X-Y data seems to ignore points

Most likely the problem is that the timestamps in the original and resampled DataFrames are not aligned, so when resampling we need to specify how to deal with that.

Since the original is at 50 Hz and the resampled is at 2500 Hz, simply taking the mean should fix it:

upsampled = new_df.resample('0.4ms').mean().interpolate(method='linear')

Unfortunately, without any sample data I cannot verify that it works. Please let me know if it helps.
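
For illustration, here is a self-contained sketch with made-up 50 Hz data (the DataFrame name, index, and columns are assumptions standing in for the question's new_df):

import numpy as np
import pandas as pd

# Hypothetical 50 Hz signal: one sample every 20 ms.
idx = pd.date_range("2021-01-01", periods=5, freq="20ms")
new_df = pd.DataFrame({"x": np.linspace(0.0, 1.0, 5),
                       "y": np.linspace(0.0, 2.0, 5)}, index=idx)

# Upsample to 2500 Hz (0.4 ms bins): empty bins become NaN after mean(),
# and linear interpolation then fills them in.
upsampled = new_df.resample('0.4ms').mean().interpolate(method='linear')
print(upsampled.head())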


