Best way to interpolate values in SQL
Something like this (corrected):
SELECT CASE WHEN next.Date IS NULL THEN prev.Rate
WHEN prev.Date IS NULL THEN next.Rate
WHEN next.Date = prev.Date THEN prev.Rate
ELSE ( DATEDIFF(d, prev.Date, @InputDate) * next.Rate
+ DATEDIFF(d, @InputDate, next.Date) * prev.Rate
) / DATEDIFF(d, prev.Date, next.Date)
END AS interpolationRate
FROM
( SELECT TOP 1
Date, Rate
FROM Rates
WHERE Date <= @InputDate
ORDER BY Date DESC
) AS prev
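FULL OUTER JOIN -- keeps one row when either side is empty; a CROSS JOIN would return no row, so the NULL branches above could never fire
( SELECT TOP 1
Date, Rate
FROM Rates
WHERE Date >= @InputDate
ORDER BY Date ASC
) AS next
ON 1 = 1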
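The same prev/next lookup can be tried without SQL Server. This is a sketch using Python's built-in sqlite3 module, so it is SQLite dialect, not T-SQL: julianday() differences stand in for DATEDIFF(d, ...), dates are assumed to be ISO-8601 strings, and interpolate_rate is a hypothetical helper name. The one-row anchor with two LEFT JOINs plays the role of the outer join, so a row survives when the input date lies outside the known range.

```python
import sqlite3

# Nearest rate on or before / on or after :dt, then a day-weighted average
# (SQLite dialect: julianday() differences replace DATEDIFF(d, ...)).
QUERY = """
WITH prv AS (SELECT Date AS d, Rate AS r FROM Rates
             WHERE Date <= :dt ORDER BY Date DESC LIMIT 1),
     nxt AS (SELECT Date AS d, Rate AS r FROM Rates
             WHERE Date >= :dt ORDER BY Date ASC LIMIT 1)
SELECT CASE
         WHEN nxt.d IS NULL THEN prv.r   -- after the last known date
         WHEN prv.d IS NULL THEN nxt.r   -- before the first known date
         WHEN nxt.d = prv.d THEN prv.r   -- exact match
         ELSE ((julianday(:dt) - julianday(prv.d)) * nxt.r
             + (julianday(nxt.d) - julianday(:dt)) * prv.r)
             / (julianday(nxt.d) - julianday(prv.d))
       END
FROM (SELECT 1) AS anchor            -- one-row anchor so the LEFT JOINs keep a row
LEFT JOIN prv ON 1 = 1
LEFT JOIN nxt ON 1 = 1
"""


def interpolate_rate(conn: sqlite3.Connection, input_date: str):
    """Hypothetical helper: interpolated Rate, or None if Rates is empty."""
    return conn.execute(QUERY, {"dt": input_date}).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rates (Date TEXT PRIMARY KEY, Rate REAL)")
conn.executemany("INSERT INTO Rates VALUES (?, ?)",
                 [("2020-01-01", 1.0), ("2020-01-11", 2.0)])
print(interpolate_rate(conn, "2020-01-05"))  # 4 of the 10 days along the way
```

Dates before the first row or after the last one fall back to the nearest known rate, mirroring the first two CASE branches.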
SQL query to interpolate between values
This can probably be simplified a bit, but I believe it gets the answer you wanted. The slightly tricky bit is getting both the number of days between non-null values (i.e. the size of the gap you're filling) and the position within that gap:
-- CTE for sample data
with your_table (emp, test_date, value) as (
select 'A', date '2001-01-01', null from dual
union all select 'A', date '2001-01-02', 100 from dual
union all select 'A', date '2001-01-03', null from dual
union all select 'A', date '2001-01-04', 80 from dual
union all select 'A', date '2001-01-05', null from dual
union all select 'A', date '2001-01-06', null from dual
union all select 'A', date '2001-01-07', 75 from dual
)
-- actual query
select emp, test_date, value,
coalesce(value,
(next_value - prev_value) -- v3-v1
/ (count(*) over (partition by emp, grp) + 1) -- d3-d1
* row_number() over (partition by emp, grp order by test_date) -- d2-d1, indirectly (ascending, so the first null after v1 is one step along)
+ prev_value -- v1
) as interpolated
from (
select emp, test_date, value,
last_value(value ignore nulls)
over (partition by emp order by test_date) as prev_value,
first_value(value ignore nulls)
over (partition by emp order by test_date range between current row and unbounded following) as next_value,
row_number() over (partition by emp order by test_date) -
row_number() over (partition by emp order by case when value is null then 1 else 0 end, test_date) as grp
from your_table
)
order by emp, test_date;
E TEST_DATE VALUE INTERPOLATED
- ---------- ---------- ------------
A 2001-01-01
A 2001-01-02 100 100
A 2001-01-03 90
A 2001-01-04 80 80
A 2001-01-05 78.3333333
A 2001-01-06 76.6666667
A 2001-01-07 75 75
I've used last_value and first_value instead of lead and lag, but either works (lead/lag might be faster on a large data set, I suppose). The grp calculation is the Tabibitosan method.
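SQLite has no IGNORE NULLS, so this sketch (Python's sqlite3, assuming SQLite 3.25+ for window functions) swaps in a different but common emulation: a running COUNT(value) gives each non-null row and the nulls after it the same group number, which covers both last_value(... ignore nulls) and the grp calculation. Like the query above, it interpolates by row position, so it assumes one row per day with no missing dates.

```python
import sqlite3

QUERY = """
WITH g AS (
  SELECT emp, test_date, value,
         -- a non-null row and the nulls after it share pg
         COUNT(value) OVER (PARTITION BY emp ORDER BY test_date) AS pg,
         -- a non-null row and the nulls before it share ng
         COUNT(value) OVER (PARTITION BY emp ORDER BY test_date DESC) AS ng
  FROM your_table
), h AS (
  SELECT g.*,
         FIRST_VALUE(value) OVER (PARTITION BY emp, pg ORDER BY test_date) AS prev_value,       -- v1
         FIRST_VALUE(value) OVER (PARTITION BY emp, ng ORDER BY test_date DESC) AS next_value,  -- v3
         ROW_NUMBER() OVER (PARTITION BY emp, pg ORDER BY test_date) - 1 AS pos,                -- d2-d1
         COUNT(*) OVER (PARTITION BY emp, pg) AS gap                                            -- d3-d1
  FROM g
)
SELECT emp, test_date, value,
       -- the * 1.0 avoids SQLite integer division
       COALESCE(value, prev_value + (next_value - prev_value) * 1.0 * pos / gap) AS interpolated
FROM h
ORDER BY emp, test_date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (emp TEXT, test_date TEXT, value INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", [
    ("A", "2001-01-01", None), ("A", "2001-01-02", 100), ("A", "2001-01-03", None),
    ("A", "2001-01-04", 80), ("A", "2001-01-05", None), ("A", "2001-01-06", None),
    ("A", "2001-01-07", 75),
])
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Leading and trailing nulls stay NULL, because one of prev_value or next_value is missing there.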
How to get interpolation value in SQL Server?
If I assume that you mean linear interpolation between the previous price and the next price based on the number of days that passed, then you can use the following method:
- Use window functions to get the next and previous days with prices for each row.
- Use window functions or joins to get the prices on those days as well.
- Use arithmetic to calculate the linear interpolation.
Your SQL Fiddle uses SQL Server, so I assume that is the database you are using. The code looks like this:
select t.*,
coalesce(t.price,
(tprev.price +
(tnext.price - tprev.price) / datediff(day, prev_price_day, next_price_day) *
datediff(day, t.prev_price_day, t.dt_day)
)
) as imputed_price
from (select t.*,
max(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day asc) as prev_price_day,
min(case when price is not null then dt_day end) over (partition by cat01, cat02 order by dt_day desc) as next_price_day
from temp01 t
) t left join
temp01 tprev
on tprev.cat01 = t.cat01 and
tprev.cat02 = t.cat02 and
tprev.dt_day = t.prev_price_day left join
temp01 tnext
on tnext.cat01 = t.cat01 and
tnext.cat02 = t.cat02 and
tnext.dt_day = t.next_price_day
order by cat01, cat02, dt_day;
Here is a db<>fiddle.
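The three steps above can be sketched in SQLite through Python's sqlite3 as well. Assumptions: SQLite 3.25+ for window functions, ISO date strings in dt_day, and julianday() differences in place of datediff(day, ...). The temp01 columns mirror the answer; the sample rows are invented.

```python
import sqlite3

QUERY = """
WITH w AS (
  SELECT t.*,
         -- step 1: latest priced day on or before, earliest priced day on or after
         MAX(CASE WHEN price IS NOT NULL THEN dt_day END)
           OVER (PARTITION BY cat01, cat02 ORDER BY dt_day) AS prev_price_day,
         MIN(CASE WHEN price IS NOT NULL THEN dt_day END)
           OVER (PARTITION BY cat01, cat02 ORDER BY dt_day DESC) AS next_price_day
  FROM temp01 t
)
SELECT w.cat01, w.cat02, w.dt_day, w.price,
       COALESCE(w.price,
                -- step 3: linear interpolation weighted by elapsed days
                tprev.price + (tnext.price - tprev.price)
                  * (julianday(w.dt_day) - julianday(w.prev_price_day))
                  / (julianday(w.next_price_day) - julianday(w.prev_price_day))
       ) AS imputed_price
FROM w
-- step 2: join back to fetch the prices on those days
LEFT JOIN temp01 tprev ON tprev.cat01 = w.cat01 AND tprev.cat02 = w.cat02
                      AND tprev.dt_day = w.prev_price_day
LEFT JOIN temp01 tnext ON tnext.cat01 = w.cat01 AND tnext.cat02 = w.cat02
                      AND tnext.dt_day = w.next_price_day
ORDER BY w.cat01, w.cat02, w.dt_day
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp01 (cat01 TEXT, cat02 TEXT, dt_day TEXT, price REAL)")
conn.executemany("INSERT INTO temp01 VALUES (?, ?, ?, ?)", [
    ("a", "x", "2021-01-01", 10.0), ("a", "x", "2021-01-02", None),
    ("a", "x", "2021-01-03", None), ("a", "x", "2021-01-04", 40.0),
    ("b", "y", "2021-01-01", 5.0), ("b", "y", "2021-01-03", None),
])
for row in conn.execute(QUERY):
    print(row)
```

A trailing gap, such as ("b", "y") on 2021-01-03, has no next priced day and stays NULL.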
SQL query with linear interpolation and Group By
This might be a good place for lateral joins:
select d.date as dt,
case
when n.date = p.date then p.value
else p.value + (n.value - p.value) * datediff('day', p.date, d.date) / datediff('day', p.date, n.date)
end as new_value
from (select date '2020-04-01') d(date)
cross join lateral (
select t.* from mytable t where t.date <= d.date order by t.date desc limit 1
) p -- "previous" value
cross join lateral (
select t.* from mytable t where t.date >= d.date order by t.date limit 1
) n -- "next" value
We can write the query without lateral joins:
select date '2020-04-01' as dt, p.k,
case
when n.date = p.date then p.value
else p.value + (n.value - p.value) * datediff('day', p.date, date '2020-04-01') / datediff('day', p.date, n.date)
end as new_value
from (
select t.*,
row_number() over(partition by k order by date desc) as rn
from mytable t
where date <= '2020-04-01'
) p
inner join (
select t.*,
row_number() over(partition by k order by date) as rn
from mytable t
where date >= '2020-04-01'
) n on n.k = p.k
where p.rn = 1 and n.rn = 1
This also generalizes the query so it can process multiple keys at once (key is a language keyword, so I used k instead).
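The non-lateral version translates directly to SQLite, again via Python's sqlite3. Assumptions: julianday() differences replace datediff('day', ...), mytable(k, date, value) and its rows are invented, and each key's value is weighted by the fraction of days elapsed between its previous and next rows.

```python
import sqlite3

QUERY = """
SELECT :dt AS dt, p.k,
       CASE
         WHEN n.date = p.date THEN p.value
         ELSE p.value + (n.value - p.value)
                * (julianday(:dt) - julianday(p.date))      -- days since the previous row
                / (julianday(n.date) - julianday(p.date))   -- days between the two rows
       END AS new_value
FROM (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY k ORDER BY date DESC) AS rn
      FROM mytable t WHERE date <= :dt) p                     -- "previous" row per key
JOIN (SELECT t.*, ROW_NUMBER() OVER (PARTITION BY k ORDER BY date) AS rn
      FROM mytable t WHERE date >= :dt) n ON n.k = p.k        -- "next" row per key
WHERE p.rn = 1 AND n.rn = 1
ORDER BY p.k
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (k INTEGER, date TEXT, value REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, "2020-03-31", 10.0), (1, "2020-04-03", 40.0),
    (2, "2020-04-01", 7.0), (2, "2020-04-05", 9.0),
    (3, "2020-03-01", 1.0),  # no row on or after the date: dropped by the inner join
])
rows = conn.execute(QUERY, {"dt": "2020-04-01"}).fetchall()
print(rows)
```

Switching to LEFT JOIN would keep keys like k = 3, at the cost of a NULL new_value.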
SQL Interpolated Strings
Credit to @j.f.sebastian for pointing out these solutions. Unfortunately, xp_sprintf is limited to 254 characters, so it isn't ideal for long queries. FORMATMESSAGE, by contrast, is limited to 2047 characters, which is enough to run long queries.
I'll summarize everything in one post to keep things organized.
Answer 1:
When using FORMATMESSAGE, note that passing the format string itself as the first parameter is supported only in SQL Server 2012 and later, so I'll post two variants with FORMATMESSAGE:
SQL Version >= 2012:
SET @query = FORMATMESSAGE('SELECT %s FROM SOME_TABLE', @somevariable);
SQL Version < 2012:
EXEC sp_addmessage 50001, 16, 'SELECT %s FROM SOME_TABLE', NULL, NULL, 'replace'
SET @query = FORMATMESSAGE(50001, @somevariable)
Answer 2:
When using the xp_sprintf extended stored procedure, note that it is limited to 254 characters, so it isn't a good choice for long queries.
DECLARE @query AS VARCHAR(100)
,@somevariable as VARCHAR(10) = '[id]'
EXEC xp_sprintf @query OUTPUT, 'SELECT %s FROM SOME_TABLE', @somevariable
How to interpolate Time series data using Linear interpolation on big datasets in Presto?
Basically, you can use lag(...) ignore nulls / lead(...) ignore nulls and some arithmetic for interpolation:
select t.*,
coalesce(t.pressure,
prev_pressure +
(time_ms - prev_time_ms) * (next_pressure - prev_pressure) / (next_time_ms - prev_time_ms)
) as imputed_pressure
from (select t.*,
to_milliseconds(time) as time_ms,
lag(pressure) ignore nulls over (order by time) as prev_pressure,
-- time of the previous non-null pressure, not merely of the previous row
lag(case when pressure is not null then to_milliseconds(time) end) ignore nulls over (order by time) as prev_time_ms,
lead(pressure) ignore nulls over (order by time) as next_pressure,
lead(case when pressure is not null then to_milliseconds(time) end) ignore nulls over (order by time) as next_time_ms
from t
) t
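When validating the query's output on a sample, a plain-Python reference can help. This is not Presto code; interpolate_series and its list-based input are hypothetical. One forward pass carries the last non-null point (the job of lag(...) ignore nulls), one backward pass carries the next one (lead(...) ignore nulls), and the same formula fills the gaps.

```python
from typing import List, Optional, Tuple


def interpolate_series(times_ms: List[int],
                       values: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate None gaps; times_ms must be sorted ascending."""
    n = len(values)
    prev: List[Optional[Tuple[int, float]]] = [None] * n  # last non-null point before i
    nxt: List[Optional[Tuple[int, float]]] = [None] * n   # first non-null point after i
    last: Optional[Tuple[int, float]] = None
    for i in range(n):                  # forward pass, like lag(...) ignore nulls
        prev[i] = last
        if values[i] is not None:
            last = (times_ms[i], values[i])
    last = None
    for i in range(n - 1, -1, -1):      # backward pass, like lead(...) ignore nulls
        nxt[i] = last
        if values[i] is not None:
            last = (times_ms[i], values[i])
    out: List[Optional[float]] = []
    for i in range(n):
        if values[i] is not None:
            out.append(values[i])
        elif prev[i] is None or nxt[i] is None:
            out.append(None)            # leading/trailing gap: nothing to interpolate between
        else:
            (t0, v0), (t1, v1) = prev[i], nxt[i]
            out.append(v0 + (times_ms[i] - t0) * (v1 - v0) / (t1 - t0))
    return out


print(interpolate_series([0, 1000, 2000, 4000], [1.0, None, None, 5.0]))  # [1.0, 2.0, 3.0, 5.0]
```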
SQL Server Interpolate Missing rows
declare @MaxDate date
declare @MinDate date
select @MaxDate = MAX([Date]),
@MinDate = MIN([Date])
from Dates
declare @MaxValue int
declare @MinValue int
select @MaxValue = [Value] from Dates where [Date] = @MaxDate
select @MinValue = [Value] from Dates where [Date] = @MinDate
declare @diff int
select @diff = DATEDIFF(d, @MinDate, @MaxDate)
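declare @increment int
-- one per-day step for the whole range, from the first and last rows only (integer division);
-- this assumes all known values lie on a single straight line
set @increment = (@MaxValue - @MinValue) / @diff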
select @increment
declare @jaggedDates as table
(
PID INT IDENTITY(1,1) PRIMARY KEY,
ThisDate date,
ThisValue int
)
declare @finalDates as table
(
PID INT IDENTITY(1,1) PRIMARY KEY,
[Date] date,
Value int
)
declare @thisDate date
declare @thisValue int
declare @nextDate date
declare @nextValue int
declare @count int
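-- ORDER BY makes the IDENTITY values (PID) follow date order, which the loop below relies on
insert @jaggedDates select [Date], [Value] from Dates order by [Date]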
select @count = @@ROWCOUNT
declare @thisId int
set @thisId = 1
declare @entryDiff int
declare @missingDate date
declare @missingValue int
while @thisId <= @count
begin
select @thisDate = ThisDate,
@thisValue = ThisValue
from @jaggedDates
where PID = @thisId
insert @finalDates values (@thisDate, @thisValue)
if @thisId < @count
begin
select @nextDate = ThisDate,
@nextValue = ThisValue
from @jaggedDates
where PID = @thisId + 1
select @entryDiff = DATEDIFF(d, @thisDate, @nextDate)
if @entryDiff > 1
begin
set @missingDate = @thisDate
set @missingValue = @thisValue
while @entryDiff > 1
begin
set @missingDate = DATEADD(d, 1, @missingDate)
set @missingValue = @missingValue + @increment
insert @finalDates values (@missingDate, @missingValue)
set @entryDiff = @entryDiff - 1
end
end
end
set @thisId = @thisId + 1
end
select * from @finalDates
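A set-based alternative to the loop, sketched in SQLite via Python's sqlite3 (SQLite 3.25+ assumed for LEAD). It deliberately differs from the script above in one way: the step is computed per gap, from each row to the next, rather than once from the first and last rows, so it also works when the known values do not lie on one line. Dates(Date, Value) mirrors the answer; the sample rows are invented.

```python
import sqlite3

QUERY = """
WITH RECURSIVE pairs AS (
  -- each known row together with the next known date and value
  SELECT Date AS d, Value AS v,
         LEAD(Date)  OVER (ORDER BY Date) AS next_d,
         LEAD(Value) OVER (ORDER BY Date) AS next_v
  FROM Dates
), filled(d, v, start_d, start_v, next_d, next_v) AS (
  SELECT d, v, d, v, next_d, next_v FROM pairs
  UNION ALL
  -- step one day forward while still before the next known date
  SELECT date(d, '+1 day'),
         start_v + (next_v - start_v) * 1.0
           * (julianday(d) + 1 - julianday(start_d))
           / (julianday(next_d) - julianday(start_d)),
         start_d, start_v, next_d, next_v
  FROM filled
  WHERE date(d, '+1 day') < next_d
)
SELECT d AS Date, v AS Value FROM filled ORDER BY d
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dates (Date TEXT PRIMARY KEY, Value INTEGER)")
conn.executemany("INSERT INTO Dates VALUES (?, ?)",
                 [("2021-01-01", 10), ("2021-01-04", 40), ("2021-01-05", 0)])
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The last known row has no next_d, so the recursion stops there without extra rows.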
Interpolate Multiseries Data In SQL
With the help of a query that produces every combination of data_type_id and the 5-second moments you need, you can get the result with a subquery that returns the most recent data_value at or before each moment:
with recursive u as
(select '2022-01-19 17:20:42' as d
union all
select DATE_ADD(d, interval 5 second) from u
where d < '2022-01-19 17:20:52'),
v as
(select * from u cross join (select distinct data_type_id from table_name) t)
select v.data_type_id,
(select data_value from table_name where inserted_at <= d and data_type_id = v.data_type_id
order by inserted_at desc limit 1) as data_value,
d as inserted_at
from v
Fiddle
You can replace the recursive CTE with any query that gets you all the 5-second moments you need.
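The same idea in SQLite through Python's sqlite3, as a sketch: datetime(d, '+5 seconds') stands in for DATE_ADD(d, interval 5 second), the sample rows are invented, and, like the MySQL query, each moment takes the most recent value at or before it (carrying the last observation forward rather than interpolating linearly).

```python
import sqlite3

QUERY = """
WITH RECURSIVE u(d) AS (
  SELECT '2022-01-19 17:20:42'
  UNION ALL
  SELECT datetime(d, '+5 seconds') FROM u WHERE d < '2022-01-19 17:20:52'
), v AS (
  -- every (moment, series) combination
  SELECT u.d, t.data_type_id
  FROM u CROSS JOIN (SELECT DISTINCT data_type_id FROM table_name) t
)
SELECT v.data_type_id,
       (SELECT data_value FROM table_name
        WHERE inserted_at <= v.d AND data_type_id = v.data_type_id
        ORDER BY inserted_at DESC LIMIT 1) AS data_value,
       v.d AS inserted_at
FROM v
ORDER BY v.data_type_id, v.d
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (data_type_id INTEGER, data_value REAL, inserted_at TEXT)")
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", [
    (1, 10.0, "2022-01-19 17:20:40"), (1, 11.0, "2022-01-19 17:20:50"),
    (2, 5.0, "2022-01-19 17:20:45"),
])
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```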