Best Way to Interpolate Values in SQL

Something like this (corrected):

SELECT CASE WHEN next.Date IS NULL THEN prev.Rate
            WHEN prev.Date IS NULL THEN next.Rate
            WHEN next.Date = prev.Date THEN prev.Rate
            ELSE ( DATEDIFF(d, prev.Date, @InputDate) * next.Rate
                 + DATEDIFF(d, @InputDate, next.Date) * prev.Rate
                 ) / DATEDIFF(d, prev.Date, next.Date)
       END AS interpolationRate
FROM
  ( SELECT TOP 1
        Date, Rate
    FROM Rates
    WHERE Date <= @InputDate
    ORDER BY Date DESC
  ) AS prev
  -- FULL JOIN instead of CROSS JOIN: a CROSS JOIN returns no row at all when
  -- one side is empty, so the IS NULL branches above would never be reached
  FULL JOIN
  ( SELECT TOP 1
        Date, Rate
    FROM Rates
    WHERE Date >= @InputDate
    ORDER BY Date ASC
  ) AS next
  ON 1 = 1
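
To make the arithmetic in the CASE expression easier to check, here is a minimal Python sketch of the same weighting. The function name interpolate_rate and the sample data are hypothetical, and the fallback to the nearest known rate outside the covered range mirrors the IS NULL branches of the query rather than any fixed rule.

```python
from datetime import date


def interpolate_rate(rates, input_date):
    """rates: list of (date, rate) pairs. Returns the interpolated rate or None."""
    # Closest point on or before the input date (the "prev" subquery).
    prev = max((r for r in rates if r[0] <= input_date),
               key=lambda r: r[0], default=None)
    # Closest point on or after the input date (the "next" subquery).
    nxt = min((r for r in rates if r[0] >= input_date),
              key=lambda r: r[0], default=None)
    if prev is None and nxt is None:
        return None                      # empty table
    if nxt is None:
        return prev[1]                   # after the last known date
    if prev is None:
        return nxt[1]                    # before the first known date
    if nxt[0] == prev[0]:
        return prev[1]                   # exact match, avoids dividing by zero
    days_before = (input_date - prev[0]).days
    days_after = (nxt[0] - input_date).days
    span = (nxt[0] - prev[0]).days
    # Each endpoint is weighted by the distance to the *other* endpoint.
    return (days_before * nxt[1] + days_after * prev[1]) / span


sample = [(date(2020, 1, 1), 1.0), (date(2020, 1, 11), 2.0)]
print(interpolate_rate(sample, date(2020, 1, 6)))
```

The closer endpoint gets the larger weight because its rate is multiplied by the distance to the farther endpoint.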

SQL query to interpolate between values

This can probably be simplified a bit but gets the answer you wanted, I believe. The slightly tricky bit is getting both the number of days between not-null values (i.e. the size of the gap you're filling) and then the position within that gap:

-- CTE for sample data
with your_table (emp, test_date, value) as (
select 'A', date '2001-01-01', null from dual
union all select 'A', date '2001-01-02', 100 from dual
union all select 'A', date '2001-01-03', null from dual
union all select 'A', date '2001-01-04', 80 from dual
union all select 'A', date '2001-01-05', null from dual
union all select 'A', date '2001-01-06', null from dual
union all select 'A', date '2001-01-07', 75 from dual
)
-- actual query
select emp, test_date, value,
  coalesce(value,
    (next_value - prev_value) -- v3-v1
      / (count(*) over (partition by emp, grp) + 1) -- d3-d1
      * row_number() over (partition by emp, grp order by test_date) -- d2-d1, indirectly
      + prev_value -- v1
  ) as interpolated
from (
  select emp, test_date, value,
    last_value(value ignore nulls)
      over (partition by emp order by test_date) as prev_value,
    first_value(value ignore nulls)
      over (partition by emp order by test_date range between current row and unbounded following) as next_value,
    row_number() over (partition by emp order by test_date) -
      row_number() over (partition by emp order by case when value is null then 1 else 0 end, test_date) as grp
  from your_table
)
order by test_date;

E TEST_DATE       VALUE INTERPOLATED
- ---------- ---------- ------------
A 2001-01-01
A 2001-01-02        100          100
A 2001-01-03                      90
A 2001-01-04         80           80
A 2001-01-05              78.3333333
A 2001-01-06              76.6666667
A 2001-01-07         75           75

I've used last_value and first_value instead of lead and lag, but either works. (Lead/lag might be faster on a large data set I suppose). The grp calculation is Tabibitosan.
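
As a cross-check on the gap arithmetic (gap size, then position within the gap), here is a small Python sketch of the same idea. It assumes evenly spaced rows, as in the sample data with one row per day; fill_gaps is a hypothetical helper, not part of the query.

```python
def fill_gaps(values):
    """Linearly interpolate runs of None that sit between two known values."""
    result = list(values)
    prev_idx = None  # index of the most recent known value
    for i, v in enumerate(values):
        if v is None:
            continue
        if prev_idx is not None and i - prev_idx > 1:
            gap = i - prev_idx                    # number of nulls + 1
            step = (v - values[prev_idx]) / gap   # change per row inside the gap
            for pos in range(1, gap):             # position counted from prev
                result[prev_idx + pos] = values[prev_idx] + step * pos
        prev_idx = i
    return result


series = [None, 100, None, 80, None, None, 75]
print(fill_gaps(series))
```

Leading and trailing nulls stay null, since there is no value on one side to interpolate from.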

How to get interpolation value in SQL Server?

If I assume that you mean linear interpolation between the previous price and the next price based on the number of days that passed, then you can use the following method:

  • Use window functions to get the next and previous days with prices for each row.
  • Use window functions or joins to get the prices on those days as well.
  • Use arithmetic to calculate the linear interpolation.

Your SQL Fiddle uses SQL Server, so I assume that is the database you are using. The code looks like this:

select t.*,
coalesce(t.price,
(tprev.price +
(tnext.price - tprev.price) / datediff(day, prev_price_day, next_price_day) *
datediff(day, t.prev_price_day, t.dt_day)
)
) as imputed_price
from (select t.*,
max(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day asc) as prev_price_day,
min(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day desc) as next_price_day
from temp01 t
) t left join
temp01 tprev
on tprev.cat01 = t.cat01 and
tprev.cat02 = t.cat02 and
tprev.dt_day = t.prev_price_day left join
temp01 tnext
on tnext.cat01 = t.cat01 and
tnext.cat02 = t.cat02 and
tnext.dt_day = t.next_price_day
order by cat01, cat02, dt_day;
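
The running max/min window functions amount to one forward and one backward pass over the ordered rows. The following Python sketch shows that idea for a single (cat01, cat02) group; impute_prices and the sample rows are hypothetical, and the days may be unevenly spaced.

```python
from datetime import date


def impute_prices(rows):
    """rows: list of (day, price_or_None) sorted by day. Returns imputed prices."""
    n = len(rows)
    prev_idx = [None] * n
    next_idx = [None] * n
    last = None
    for i in range(n):                  # forward pass: latest day with a price so far
        if rows[i][1] is not None:
            last = i
        prev_idx[i] = last
    last = None
    for i in range(n - 1, -1, -1):      # backward pass: earliest later day with a price
        if rows[i][1] is not None:
            last = i
        next_idx[i] = last
    out = []
    for i, (day, price) in enumerate(rows):
        p, q = prev_idx[i], next_idx[i]
        if price is not None or p is None or q is None:
            out.append(price)           # known price, or nothing to interpolate from
            continue
        (d0, v0), (d1, v1) = rows[p], rows[q]
        # Weight by elapsed days, so uneven spacing is handled.
        out.append(v0 + (v1 - v0) * (day - d0).days / (d1 - d0).days)
    return out


rows = [(date(2021, 1, 1), 10.0), (date(2021, 1, 2), None), (date(2021, 1, 5), 18.0)]
print(impute_prices(rows))
```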

SQL query with linear interpolation and Group By

This might be a good place for lateral joins:

select d.date as dt,
    case
        when n.date = p.date then p.value
        else p.value
            + (n.value - p.value)
            * datediff('day', p.date, d.date)   -- days elapsed since the previous point
            / datediff('day', p.date, n.date)   -- days between previous and next point
    end as new_value
from (select date '2020-04-01') d(date)
cross join lateral (
    select t.* from mytable t where t.date <= d.date order by t.date desc limit 1
) p -- "previous" value
cross join lateral (
    select t.* from mytable t where t.date >= d.date order by t.date limit 1
) n -- "next" value

We can write the query without lateral joins:

select date '2020-04-01' as dt, p.k,
    case
        when n.date = p.date then p.value
        else p.value
            + (n.value - p.value)
            * datediff('day', p.date, date '2020-04-01')   -- days elapsed since the previous point
            / datediff('day', p.date, n.date)              -- days between previous and next point
    end as new_value
from (
    select t.*,
        row_number() over(partition by k order by date desc) as rn
    from mytable t
    where date <= date '2020-04-01'
) p
inner join (
    select t.*,
        row_number() over(partition by k order by date) as rn
    from mytable t
    where date >= date '2020-04-01'
) n on n.k = p.k
where p.rn = 1 and n.rn = 1

This also generalizes the query so it can process multiple keys at once (key is a reserved word in some databases, so I used k instead).
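
For a quick way to try the row_number() variant without a database server, here is a sketch that runs it against an in-memory SQLite database from Python. Several things are assumptions: the sample rows are invented, dates are stored as ISO text, julianday() differences stand in for datediff (which SQLite does not have), and window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (k TEXT, date TEXT, value REAL);
    INSERT INTO mytable VALUES
        ('a', '2020-03-30', 10.0),
        ('a', '2020-04-03', 18.0),
        ('b', '2020-04-01', 5.0);
""")

QUERY = """
SELECT p.k,
       CASE
           WHEN n.date = p.date THEN p.value
           ELSE p.value
                + (n.value - p.value)
                * (julianday(:d) - julianday(p.date))
                / (julianday(n.date) - julianday(p.date))
       END AS new_value
FROM (SELECT t.*, row_number() OVER (PARTITION BY k ORDER BY date DESC) AS rn
      FROM mytable t WHERE date <= :d) p
JOIN (SELECT t.*, row_number() OVER (PARTITION BY k ORDER BY date) AS rn
      FROM mytable t WHERE date >= :d) n ON n.k = p.k
WHERE p.rn = 1 AND n.rn = 1
ORDER BY p.k
"""

# One interpolated value per key for the requested date.
rows = conn.execute(QUERY, {"d": "2020-04-01"}).fetchall()
for k, new_value in rows:
    print(k, new_value)
```

Key 'a' is interpolated halfway between its two surrounding points, while key 'b' has an exact match on the requested date and keeps its stored value.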

SQL Interpolated Strings

Credit to @j.f.sebastian for pointing out these solutions. Unfortunately, xp_sprintf is limited to 254 characters, so it is not ideal for long queries. FORMATMESSAGE is limited to 2047 characters, which is enough for most long queries.

I'll summarize the solutions in one post to keep things organized.

Answer 1:

With FORMATMESSAGE, note that passing the format string directly as the first parameter is supported only in SQL Server 2012 and later, so here are two variants:

SQL Server version >= 2012:

SET @query = FORMATMESSAGE('SELECT %s FROM SOME_TABLE', @somevariable);


SQL Server version < 2012:

EXEC sp_addmessage 50001, 16, 'SELECT %s FROM SOME_TABLE', NULL, NULL, 'replace'
SET @query = FORMATMESSAGE(50001, @somevariable)


Answer 2:

When using the xp_sprintf stored procedure, note that it is limited to 254 characters, so it is not a good choice for long queries.

DECLARE  @query AS VARCHAR(100)
,@somevariable as VARCHAR(10) = '[id]'
EXEC xp_sprintf @query OUTPUT, 'SELECT %s FROM SOME_TABLE', @somevariable

How to interpolate Time series data using Linear interpolation on big datasets in Presto?

Basically, you can use lag(ignore nulls)/lead(ignore nulls) and some arithmetic for interpolation:

select t.*,
    coalesce(t.pressure,
        prev_pressure
        + (time_ms - prev_time_ms) * (next_pressure - prev_pressure) / (next_time_ms - prev_time_ms)
    ) as imputed_pressure
from (select t.*,
        to_milliseconds(time) as time_ms,
        lag(pressure) ignore nulls over (order by time) as prev_pressure,
        -- take timestamps only from rows that have a pressure reading,
        -- so each timestamp matches the pressure value it is paired with
        lag(case when pressure is not null then to_milliseconds(time) end) ignore nulls over (order by time) as prev_time_ms,
        lead(pressure) ignore nulls over (order by time) as next_pressure,
        lead(case when pressure is not null then to_milliseconds(time) end) ignore nulls over (order by time) as next_time_ms
    from t
) t

SQL Server Interpolate Missing rows

declare @MaxDate date
declare @MinDate date

select @MaxDate = MAX([Date]),
@MinDate = MIN([Date])
from Dates

declare @MaxValue int
declare @MinValue int

select @MaxValue = [Value] from Dates where [Date] = @MaxDate
select @MinValue = [Value] from Dates where [Date] = @MinDate

declare @diff int
select @diff = DATEDIFF(d, @MinDate, @MaxDate)

declare @increment int
set @increment = (@MaxValue - @MinValue) / @diff

select @increment

declare @jaggedDates as table
(
PID INT IDENTITY(1,1) PRIMARY KEY,
ThisDate date,
ThisValue int
)

declare @finalDates as table
(
PID INT IDENTITY(1,1) PRIMARY KEY,
[Date] date,
Value int
)

declare @thisDate date
declare @thisValue int
declare @nextDate date
declare @nextValue int

declare @count int
insert @jaggedDates select [Date], [Value] from Dates
select @count = @@ROWCOUNT

declare @thisId int
set @thisId = 1
declare @entryDiff int
declare @missingDate date
declare @missingValue int

while @thisId <= @count
begin
select @thisDate = ThisDate,
@thisValue = ThisValue
from @jaggedDates
where PID = @thisId

insert @finalDates values (@thisDate, @thisValue)

if @thisId < @count
begin
select @nextDate = ThisDate,
@nextValue = ThisValue
from @jaggedDates
where PID = @thisId + 1

select @entryDiff = DATEDIFF(d, @thisDate, @nextDate)
if @entryDiff > 1
begin
set @missingDate = @thisDate
set @missingValue = @thisValue
while @entryDiff > 1
begin
set @missingDate = DATEADD(d, 1, @missingDate)
set @missingValue = @missingValue + @increment
insert @finalDates values (@missingDate, @missingValue)
set @entryDiff = @entryDiff - 1
end
end
end

set @thisId = @thisId + 1
end

select * from @finalDates

Interpolate Multiseries Data In SQL

With the help of a query that gets you all the combinations of data_type_id and the 5-second moments you need, you can achieve the result you need using a subquery that gets you the closest data_value:

with recursive u as
(select '2022-01-19 17:20:42' as d
union all
select DATE_ADD(d, interval 5 second) from u
where d < '2022-01-19 17:20:52'),
v as
(select * from u cross join (select distinct data_type_id from table_name) t)
select v.data_type_id,
(select data_value from table_name where inserted_at <= d and data_type_id = v.data_type_id
order by inserted_at desc limit 1) as data_value,
d as inserted_at
from v

You can replace the recursive CTE with any query that gets you all the 5-second moments you need.
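
Note that the subquery carries the last known value forward rather than interpolating between neighbours. A minimal Python sketch of that lookup for a single data_type_id follows; sample_every and the readings are hypothetical, and the grid bounds match the ones in the recursive CTE.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def sample_every(readings, start, end, step=timedelta(seconds=5)):
    """readings: list of (inserted_at, value) sorted by inserted_at.

    Returns (moment, value) pairs on a regular grid, each value being the
    latest reading at or before that moment (None if there is none yet).
    """
    times = [t for t, _ in readings]
    out = []
    moment = start
    while moment <= end:
        i = bisect_right(times, moment)   # count of readings with inserted_at <= moment
        out.append((moment, readings[i - 1][1] if i else None))
        moment += step
    return out


readings = [
    (datetime(2022, 1, 19, 17, 20, 40), 1.0),
    (datetime(2022, 1, 19, 17, 20, 45), 2.0),
]
grid = sample_every(readings,
                    datetime(2022, 1, 19, 17, 20, 42),
                    datetime(2022, 1, 19, 17, 20, 52))
for moment, value in grid:
    print(moment, value)
```

Running this once per data_type_id gives the same shape of result as the cross join in the query.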


