Best Way to Interpolate Values in SQL

Best way to interpolate values in SQL

Something like this (corrected):

SELECT CASE WHEN next.Date IS NULL  THEN prev.Rate
            WHEN prev.Date IS NULL  THEN next.Rate
            WHEN next.Date = prev.Date  THEN prev.Rate
              ELSE ( DATEDIFF(d, prev.Date, @InputDate) * next.Rate 
                   + DATEDIFF(d, @InputDate, next.Date) * prev.Rate
                   ) / DATEDIFF(d, prev.Date, next.Date)
       END AS interpolationRate 
FROM
  ( SELECT TOP 1 
        Date, Rate 
    FROM Rates
    WHERE Date <= @InputDate
    ORDER BY Date DESC
  ) AS prev
  FULL JOIN
  ( SELECT TOP 1 
        Date, Rate 
    FROM Rates
    WHERE Date >= @InputDate
    ORDER BY Date ASC
  ) AS next
    ON 1 = 1   -- unlike CROSS JOIN, keeps a row when one side is empty, so the IS NULL branches can fire
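
As a quick sanity check of the weighted-average arithmetic in the ELSE branch, here is a minimal sketch that plugs in made-up values. The variables @PrevDate, @PrevRate, @NextDate and @NextRate are hypothetical stand-ins for the rows the prev and next subqueries would return; they are not part of the original query.

```sql
-- Hypothetical neighbouring rates, ten days apart (illustrative values only).
DECLARE @PrevDate date = '2020-01-01', @PrevRate decimal(9, 4) = 1.0000;
DECLARE @NextDate date = '2020-01-11', @NextRate decimal(9, 4) = 2.0000;
DECLARE @InputDate date = '2020-01-04';

-- Each rate is weighted by the distance to the *other* endpoint,
-- so the nearer date contributes more to the result.
SELECT ( DATEDIFF(d, @PrevDate, @InputDate) * @NextRate
       + DATEDIFF(d, @InputDate, @NextDate) * @PrevRate
       ) / DATEDIFF(d, @PrevDate, @NextDate) AS interpolationRate;
-- (3 * 2.0 + 7 * 1.0) / 10 = 1.3
```

The rates are declared as decimal here on purpose: if Rate were an integer column, the final division would be integer division and the fraction would be lost.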

SQL query to interpolate between values

This can probably be simplified a bit but gets the answer you wanted, I believe. The slightly tricky bit is getting both the number of days between not-null values (i.e. the size of the gap you're filling) and then the position within that gap:

-- CTE for sample data
with your_table (emp, test_date, value) as (
            select 'A', date '2001-01-01', null from dual
  union all select 'A', date '2001-01-02', 100 from dual
  union all select 'A', date '2001-01-03', null from dual
  union all select 'A', date '2001-01-04', 80 from dual
  union all select 'A', date '2001-01-05', null from dual
  union all select 'A', date '2001-01-06', null from dual
  union all select 'A', date '2001-01-07', 75 from dual
)
-- actual query
select emp, test_date, value,
  coalesce(value,
    (next_value - prev_value) -- v3-v1
    / (count(*) over (partition by grp) + 1) -- d3-d1
    * row_number() over (partition by grp order by test_date) -- d2-d1, indirectly
    + prev_value -- v1
  ) as interpolated
from (
  select emp, test_date, value,
    last_value(value ignore nulls)
      over (partition by emp order by test_date) as prev_value,
    first_value(value ignore nulls)
      over (partition by emp order by test_date range between current row and unbounded following) as next_value,
    row_number() over (partition by emp order by test_date) -
      row_number() over (partition by emp order by case when value is null then 1 else 0 end, test_date) as grp
  from your_table
)
order by test_date;

E TEST_DATE       VALUE INTERPOLATED
- ---------- ---------- ------------
A 2001-01-01                        
A 2001-01-02        100          100
A 2001-01-03                      90
A 2001-01-04         80           80
A 2001-01-05              78.3333333
A 2001-01-06              76.6666667
A 2001-01-07         75           75

I've used last_value and first_value instead of lead and lag, but either works. (Lead/lag might be faster on a large data set, I suppose.) The grp calculation uses the Tabibitosan method.
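
To see why the difference of two row numbers groups consecutive nulls, here is a small illustration on a hypothetical four-row slice of the sample data (Oracle syntax, single employee, so the partition by emp is left out). The column names rn_all and rn_split are made up for this sketch.

```sql
-- Illustration only: four rows with a two-row gap in the middle.
with your_table (test_date, value) as (
            select date '2001-01-04', 80 from dual
  union all select date '2001-01-05', null from dual
  union all select date '2001-01-06', null from dual
  union all select date '2001-01-07', 75 from dual
)
select test_date, value,
  -- position among all rows
  row_number() over (order by test_date) as rn_all,
  -- position when non-null rows are numbered first, then null rows
  row_number() over (order by case when value is null then 1 else 0 end, test_date) as rn_split,
  -- the difference stays constant within a run of consecutive nulls
  row_number() over (order by test_date)
    - row_number() over (order by case when value is null then 1 else 0 end, test_date) as grp
from your_table
order by test_date;
-- grp comes out as 0, -1, -1, 2: the two null rows share grp = -1.
```

Counting the rows in a grp then gives the gap size, and numbering them gives the position within the gap.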

How to get interpolation value in SQL Server?

If I assume that you mean linear interpolation between the previous price and the next price based on the number of days that passed, then you can use the following method:

Use window functions to get the next and previous days with prices for each row.
Use window functions or joins to get the prices on those days as well.
Use arithmetic to calculate the linear interpolation.

Your SQL Fiddle uses SQL Server, so I assume that is the database you are using. The code looks like this:

select t.*,
       coalesce(t.price, 
                (tprev.price +
                 (tnext.price - tprev.price) / datediff(day, prev_price_day, next_price_day) *
                 datediff(day, t.prev_price_day, t.dt_day)
                )
               ) as imputed_price
from (select t.*,
             max(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day asc) as prev_price_day,
             min(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day desc) as next_price_day
      from temp01 t
     ) t left join 
     temp01 tprev
     on tprev.cat01 = t.cat01 and
        tprev.cat02 = t.cat02 and
        tprev.dt_day = t.prev_price_day left join
     temp01 tnext
     on tnext.cat01 = t.cat01 and
        tnext.cat02 = t.cat02 and
        tnext.dt_day = t.next_price_day 
order by cat01, cat02, dt_day;


SQL query with linear interpolation and Group By

This might be a good place for lateral joins:

select d.dt, 
    case 
        when n.date = p.date then p.value
        else p.value + (n.value - p.value) * datediff('day', p.date, d.dt) / datediff('day', p.date, n.date)
    end as new_value
from (select date '2020-04-01') d(dt)
cross join lateral (
    select t.* from mytable t where t.date <= d.dt order by t.date desc limit 1
) p  -- "previous" value
cross join lateral (
    select t.* from mytable t where t.date >= d.dt order by t.date limit 1
) n  -- "next" value

We can write the query without lateral joins:

select date '2020-04-01' as dt, p.k,
    case 
        when n.date = p.date then p.value
        else p.value + (n.value - p.value) * datediff('day', p.date, date '2020-04-01') / datediff('day', p.date, n.date)
    end as new_value
from (
    select t.*, 
        row_number() over(partition by k order by date desc) as rn
    from mytable t
    where date <= '2020-04-01'
) p
inner join (
    select t.*, 
        row_number() over(partition by k order by date) as rn
    from mytable t
    where date >= '2020-04-01'
) n on n.k = p.k
where p.rn = 1 and n.rn = 1

This also generalizes the query so it can process multiple keys at once (key is a language keyword, so I used k instead).

SQL Interpolated Strings

Credit to @j.f.sebastian for pointing out these solutions. Sadly, xp_sprintf is limited to 254 characters, so it isn't ideal for long queries. FORMATMESSAGE, by contrast, is limited to 2047 characters, which is enough for most long queries.

I'll summarize the solutions in one post to keep things organized.

Answer 1:

When using FORMATMESSAGE, note that passing the format string directly as the first parameter is supported only in SQL Server 2012 and above, so I'll post two variants:

SQL Version >= 2012:

SET @query = FORMATMESSAGE('SELECT %s FROM SOME_TABLE', @somevariable);

SQL Version < 2012:

EXEC sp_addmessage 50001, 16, 'SELECT %s FROM SOME_TABLE', NULL, NULL, 'replace'
SET @query = FORMATMESSAGE(50001, @somevariable)

Answer 2:

When using the xp_sprintf stored procedure, note that it is limited to 254 characters, so it is not a good choice for long queries.

DECLARE  @query AS VARCHAR(100)
        ,@somevariable as VARCHAR(10) = '[id]'
EXEC xp_sprintf @query OUTPUT, 'SELECT %s FROM SOME_TABLE', @somevariable
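
When the substituted value is an identifier such as a column name, it may be worth quoting it before it goes into the format string. A minimal sketch, assuming the direct-string form of FORMATMESSAGE is available on your version; wrapping the value in QUOTENAME is an addition here, not part of the solutions above, and SOME_TABLE is a placeholder.

```sql
DECLARE @somevariable sysname = N'id';
DECLARE @query nvarchar(2047);

-- QUOTENAME wraps the identifier in brackets and escapes any closing bracket inside it.
SET @query = FORMATMESSAGE(N'SELECT %s FROM SOME_TABLE', QUOTENAME(@somevariable));

-- Inspect the generated text before executing it.
SELECT @query AS generated_query;   -- SELECT [id] FROM SOME_TABLE
```

Once the text looks right, it could be run with EXEC sys.sp_executesql @query.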

How to interpolate Time series data using Linear interpolation on big datasets in Presto?

Basically, you can use lag(ignore nulls)/lead(ignore nulls) and some arithmetic for interpolation:

select t.*,
       coalesce(t.pressure,
                prev_pressure +
                (time_ms - prev_time_ms) * (next_pressure - prev_pressure) / (next_time_ms - prev_time_ms)
               ) as imputed_pressure
from (select t.*,
             to_milliseconds(time) as time_ms,
             -- nearest non-null pressure before/after this row, and the time it was recorded
             lag(pressure) ignore nulls over (order by time) as prev_pressure,
             lag(case when pressure is not null then to_milliseconds(time) end) ignore nulls over (order by time) as prev_time_ms,
             lead(pressure) ignore nulls over (order by time) as next_pressure,
             lead(case when pressure is not null then to_milliseconds(time) end) ignore nulls over (order by time) as next_time_ms
     from t
    ) t

SQL Server Interpolate Missing rows

declare @MaxDate date
declare @MinDate date

select @MaxDate = MAX([Date]),
        @MinDate = MIN([Date])
from Dates

declare @MaxValue int
declare @MinValue int

select @MaxValue = [Value] from Dates where [Date] = @MaxDate
select @MinValue = [Value] from Dates where [Date] = @MinDate

declare @diff int
select @diff = DATEDIFF(d, @MinDate, @MaxDate)

declare @increment int
set @increment = (@MaxValue - @MinValue)  / @diff

select @increment

declare @jaggedDates as table
(
    PID INT IDENTITY(1,1) PRIMARY KEY,
    ThisDate date,
    ThisValue int
)

declare @finalDates as table
(
    PID INT IDENTITY(1,1) PRIMARY KEY,
    [Date] date,
    Value int
)

declare @thisDate date
declare @thisValue int
declare @nextDate date
declare @nextValue int

declare @count int
insert @jaggedDates select [Date], [Value] from Dates
select @count = @@ROWCOUNT

declare @thisId int 
set @thisId = 1
declare @entryDiff int
declare @missingDate date
declare @missingValue int

while @thisId <= @count
begin
    select @thisDate = ThisDate,
            @thisValue = ThisValue
    from @jaggedDates
    where PID = @thisId

    insert @finalDates values (@thisDate, @thisValue)

    if @thisId < @count
    begin
        select @nextDate = ThisDate,
            @nextValue = ThisValue
        from @jaggedDates
        where PID = @thisId + 1

        select @entryDiff = DATEDIFF(d, @thisDate, @nextDate)
        if  @entryDiff > 1
        begin
            set @missingDate = @thisDate
            set @missingValue = @thisValue
            while @entryDiff > 1
            begin
                set @missingDate = DATEADD(d, 1, @missingDate)
                set @missingValue = @missingValue + @increment
                insert @finalDates values (@missingDate, @missingValue)
                set @entryDiff = @entryDiff - 1
            end
        end
    end

    set @thisId = @thisId + 1
end

select * from @finalDates

Interpolate Multiseries Data In SQL

With the help of a query that gets you all the combinations of data_type_id and the 5-second moments you need, you can achieve the result you need using a subquery that gets you the closest data_value:

with recursive u as
(select '2022-01-19 17:20:42' as d
union all
select DATE_ADD(d, interval 5 second) from u
where d < '2022-01-19 17:20:52'),
v as
(select * from u cross join (select distinct data_type_id from table_name) t)
select v.data_type_id, 
(select data_value from table_name where inserted_at <= d and data_type_id = v.data_type_id
order by inserted_at desc limit 1) as data_value, 
d as inserted_at
from v

You can replace the recursive CTE with any query that gets you all the 5-second moments you need.
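
For instance, a small derived table of offsets could stand in for the recursive CTE. This is only a sketch in MySQL syntax: the nums derived table and its hard-coded 0 to 2 range are assumptions chosen to reproduce the same three timestamps.

```sql
-- Yields 17:20:42, 17:20:47 and 17:20:52, the same moments the recursive CTE produces.
select timestamp '2022-01-19 17:20:42' + interval (nums.n * 5) second as d
from (select 0 as n union all select 1 union all select 2) as nums;
```

For longer ranges, a permanent numbers or calendar table would play the role of nums.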
