SQL to Find Time Elapsed from Multiple Overlapping Intervals


This SQL statement seems to get what you want (t is the name of the sample table):

SELECT
    d.id,
    d.duration,
    d.duration -
    IFNULL(
      ( SELECT Sum( timestampdiff( SQL_TSI_DAY,
                                   no_hold.d2,
                                   ( SELECT min(d1) FROM t t4
                                     WHERE t4.id = no_hold.id and t4.d1 > no_hold.d2 ) ) )
        FROM ( SELECT DISTINCT id, d2 FROM t t1
               WHERE ( SELECT sum( IIF( t1.d2 between t2.d1 and t2.d2, 1, 0 ) )
                       FROM t t2 WHERE t2.id = t1.id and t2.d2 <> t1.d2 ) = 0
                 AND d2 <> ( SELECT max( d2 ) FROM t t3 WHERE t3.id = t1.id ) ) no_hold
        WHERE no_hold.id = d.id ),
      0 ) "parts hold"
FROM
    ( SELECT id, timestampdiff( SQL_TSI_DAY, min( d1 ), max( d2 ) ) duration
      FROM t GROUP BY id ) d

The outer query gets the duration of the repair work. The complex subquery calculates the total number of days not spent waiting for parts. It does this by locating the start dates on which the vehicle is not waiting for parts, and then counting the number of days until it begins waiting for parts again:

-- 1) Find the starting dates when the vehicle is not waiting for parts,
-- i.e. all d2 values that do not fall within any date range where the vehicle is waiting for parts.
-- The DISTINCT is needed to remove duplicate starting "no hold" periods.

SELECT DISTINCT id, d2
FROM t t1
WHERE ( SELECT sum( IIF( t1.d2 between t2.d1 and t2.d2, 1, 0 ) ) FROM t t2
        WHERE t2.id = t1.id and t2.d2 <> t1.d2 ) = 0
  AND d2 <> ( SELECT max( d2 ) FROM t t3 WHERE t3.id = t1.id )

-- 2) The days when the vehicle is not waiting for parts run from the date found by the
-- above query until the vehicle begins waiting for parts again:

timestampdiff( SQL_TSI_DAY, no_hold.d2, ( SELECT min(d1) FROM t t4 WHERE t4.id = no_hold.id and t4.d1 > no_hold.d2 ) )

Combining the two above and aggregating all such periods gives the number of days that the vehicle is not waiting for parts. The final query adds an extra condition so that the result is calculated for each id from the outer query.

This probably is not terribly efficient on a very large table with many ids. It should be fine if the query is limited to one or just a few ids.
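The same no-hold arithmetic can be sketched outside SQL. This is a minimal Python sketch with invented sample dates (the `rows` values are not from the question), assuming `(id, d1, d2)` rows where `[d1, d2]` is one waiting-for-parts interval:

```python
from datetime import date

# Hypothetical sample rows: (id, d1, d2) = (vehicle, hold start, hold end)
rows = [
    (1, date(2020, 1, 1), date(2020, 1, 5)),
    (1, date(2020, 1, 3), date(2020, 1, 6)),    # overlaps the previous hold
    (1, date(2020, 1, 10), date(2020, 1, 12)),  # new hold after a gap
]

def parts_hold(rows, vid):
    ivs = [(d1, d2) for i, d1, d2 in rows if i == vid]
    # duration of the whole repair: min(d1) to max(d2)
    duration = (max(d2 for _, d2 in ivs) - min(d1 for d1, _ in ivs)).days
    max_d2 = max(d2 for _, d2 in ivs)
    no_hold_days = 0
    for _, d2 in ivs:
        # d2 starts a "no hold" period if it lies inside no other interval
        # (the b != d2 test mirrors t2.d2 <> t1.d2) and is not the final end date
        if d2 != max_d2 and not any(a <= d2 <= b for a, b in ivs if b != d2):
            next_d1 = min(d1 for d1, _ in ivs if d1 > d2)
            no_hold_days += (next_d1 - d2).days
    return duration - no_hold_days
```

With this data `parts_hold(rows, 1)` returns 7: an 11-day overall duration minus the 4-day gap between Jan 6 and Jan 10.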

SQL counting days with gap / overlapping

This is mostly a duplicate of my answer here (including the explanation), but with the addition of grouping on an id column. It uses a single table scan and requires neither a recursive subquery factoring clause (CTE) nor self-joins.

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE your_table ( id, usr, start_date, end_date ) AS
SELECT 1, 'A', DATE '2017-06-01', DATE '2017-06-03' FROM DUAL UNION ALL
SELECT 1, 'B', DATE '2017-06-02', DATE '2017-06-04' FROM DUAL UNION ALL -- Overlaps previous
SELECT 1, 'C', DATE '2017-06-06', DATE '2017-06-06' FROM DUAL UNION ALL
SELECT 1, 'D', DATE '2017-06-07', DATE '2017-06-07' FROM DUAL UNION ALL -- Adjacent to previous
SELECT 1, 'E', DATE '2017-06-11', DATE '2017-06-20' FROM DUAL UNION ALL
SELECT 1, 'F', DATE '2017-06-14', DATE '2017-06-15' FROM DUAL UNION ALL -- Within previous
SELECT 1, 'G', DATE '2017-06-22', DATE '2017-06-25' FROM DUAL UNION ALL
SELECT 1, 'H', DATE '2017-06-24', DATE '2017-06-28' FROM DUAL UNION ALL -- Overlaps previous and next
SELECT 1, 'I', DATE '2017-06-27', DATE '2017-06-30' FROM DUAL UNION ALL
SELECT 1, 'J', DATE '2017-06-27', DATE '2017-06-28' FROM DUAL UNION ALL -- Within H and I
SELECT 2, 'K', DATE '2011-08-01', DATE '2011-08-08' FROM DUAL UNION ALL -- Your data below
SELECT 2, 'L', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 2, 'M', DATE '2011-08-03', DATE '2011-08-10' FROM DUAL UNION ALL
SELECT 2, 'N', DATE '2011-08-12', DATE '2011-08-14' FROM DUAL UNION ALL
SELECT 3, 'O', DATE '2011-08-01', DATE '2011-08-03' FROM DUAL UNION ALL
SELECT 3, 'P', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 3, 'Q', DATE '2011-08-05', DATE '2011-08-09' FROM DUAL;

Query 1:

SELECT id,
SUM( days ) AS total_days
FROM (
SELECT id,
dt - LAG( dt ) OVER ( PARTITION BY id
ORDER BY dt ) + 1 AS days,
start_end
FROM (
SELECT id,
dt,
CASE SUM( value ) OVER ( PARTITION BY id
ORDER BY dt ASC, value DESC, ROWNUM ) * value
WHEN 1 THEN 'start'
WHEN 0 THEN 'end'
END AS start_end
FROM your_table
UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
)
WHERE start_end IS NOT NULL
)
WHERE start_end = 'end'
GROUP BY id

Results:

| ID | TOTAL_DAYS |
|----|------------|
|  1 |         25 |
|  2 |         13 |
|  3 |          9 |
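The start/end pairing that the UNPIVOT query performs can be sketched procedurally. This is a hedged Python sketch of the same sweep, with the id 1 intervals above expressed as June day numbers:

```python
# The id 1 intervals from the setup, as (start_day, end_day) in June
intervals = [(1, 3), (2, 4), (6, 6), (7, 7), (11, 20), (14, 15),
             (22, 25), (24, 28), (27, 30), (27, 28)]

events = []
for s, e in intervals:
    events.append((s, 1))    # start_date unpivots to value +1
    events.append((e, -1))   # end_date unpivots to value -1
# starts sort before ends on the same day (mirrors "value DESC")
events.sort(key=lambda t: (t[0], -t[1]))

total, running, range_start = 0, 0, None
for dt, value in events:
    running += value
    if running * value == 1:           # running sum became 1 on a start
        range_start = dt
    elif running == 0:                 # running sum back to 0 on an end
        total += dt - range_start + 1  # inclusive day count, as in the query
```

This reproduces the id 1 result of 25 total days.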

Calculate Actual Downtime ignoring overlap in dates/times

UPDATED WITH NEW TEST CASES

Here is one technique that calculates the unique outages and then aligns them back to the initial downtime causing the outage so that the actual and expected values match.

DECLARE @Downtime TABLE (
ID INT PRIMARY KEY NOT NULL IDENTITY(1,1),
Application VARCHAR(25),
DowntimeStart DATETIME,
DowntimeEnd DATETIME,
Expected INT
)

INSERT @Downtime (Application, DowntimeStart, DowntimeEnd, Expected) VALUES -- Act/Exp
('Application Demo', '2014-11-20 17:31:01.467', '2014-11-20 18:01:01.243', 30) -- 30/30
,('Application Demo', '2014-11-28 17:59:00.987', '2014-11-28 18:09:02.167', 26) -- 10/26
,('Application Demo', '2014-11-28 18:00:01.403', '2014-11-28 18:25:01.443', 0) -- 25/0
,('Application Demo', '2014-11-29 19:13:08.580', '2014-11-30 05:30:01.763', 617) -- 617/617
,('Application Demo', '2014-11-30 01:55:01.953', '2014-11-30 03:54:01.730', 0)
,('Application Demo 2', '2014-12-19 23:09:01.303', '2014-12-22 09:43:01.397', 3514)
,('Application Demo 2', '2014-12-19 23:09:01.303', '2014-12-22 09:43:01.397', 0)
,('Application Demo 2', '2014-12-19 23:09:01.303', '2014-12-22 09:43:01.397', 0)
,('Application Demo 2', '2014-12-19 23:09:01.303', '2014-12-22 09:43:01.397', 0)
,('Application Demo 2', '2014-12-19 23:09:01.303', '2014-12-22 09:43:01.397', 0)

SELECT
Downtimes.Application,
Downtimes.DowntimeStart,
Downtimes.DowntimeEnd,
Downtimes.Expected,
COALESCE(Actual, 0) AS Actual
FROM @Downtime Downtimes
LEFT OUTER JOIN (
SELECT DISTINCT
D1.Application,
MIN(CASE WHEN D1.DowntimeStart < D2.DowntimeStart THEN D1.ID ELSE D2.ID END) AS [ID],
MIN(CASE WHEN D1.DowntimeStart < D2.DowntimeStart THEN D1.DowntimeStart ELSE D2.DowntimeStart END) AS [DowntimeStart],
MAX(CASE WHEN D1.DowntimeEnd > D2.DowntimeEnd THEN D1.DowntimeEnd ELSE D2.DowntimeEnd END) AS [DowntimeEnd],
DATEDIFF(MINUTE,
MIN(CASE WHEN D1.DowntimeStart < D2.DowntimeStart THEN D1.DowntimeStart ELSE D2.DowntimeStart END),
MAX(CASE WHEN D1.DowntimeEnd > D2.DowntimeEnd THEN D1.DowntimeEnd ELSE D2.DowntimeEnd END)) AS Actual
FROM @Downtime D1
INNER JOIN @Downtime D2
ON D1.Application = D2.Application
AND (D1.DowntimeStart BETWEEN D2.DowntimeStart AND D2.DowntimeEnd
OR D2.DowntimeStart BETWEEN D1.DowntimeStart AND D1.DowntimeEnd)
GROUP BY
D1.Application,
D1.DowntimeStart
) Outages
ON Outages.ID = Downtimes.ID

And this yields the desired output:

Application               DowntimeStart           DowntimeEnd             Expected    Actual
------------------------- ----------------------- ----------------------- ----------- -----------
Application Demo          2014-11-20 17:31:01.467 2014-11-20 18:01:01.243 30          30
Application Demo          2014-11-28 17:59:00.987 2014-11-28 18:09:02.167 26          26
Application Demo          2014-11-28 18:00:01.403 2014-11-28 18:25:01.443 0           0
Application Demo          2014-11-29 19:13:08.580 2014-11-30 05:30:01.763 617         617
Application Demo          2014-11-30 01:55:01.953 2014-11-30 03:54:01.730 0           0
Application Demo 2        2014-12-19 23:09:01.303 2014-12-22 09:43:01.397 3514        3514
Application Demo 2        2014-12-19 23:09:01.303 2014-12-22 09:43:01.397 0           0
Application Demo 2        2014-12-19 23:09:01.303 2014-12-22 09:43:01.397 0           0
Application Demo 2        2014-12-19 23:09:01.303 2014-12-22 09:43:01.397 0           0
Application Demo 2        2014-12-19 23:09:01.303 2014-12-22 09:43:01.397 0           0
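The merge-and-attribute step can also be sketched as a sweep over rows sorted by start time. This simplified Python sketch uses minute offsets in place of the DATETIME values (invented data), and a running merge rather than the pairwise self-join above:

```python
# Hypothetical rows for one application: (record_id, start_minute, end_minute)
rows = [(1, 0, 30), (2, 100, 110), (3, 101, 126)]

# Merge overlapping rows and attribute each merged outage's duration
# to the row that started it; every other row gets an Actual of 0.
rows.sort(key=lambda r: r[1])
actual = {rid: 0 for rid, _, _ in rows}
cur_id, cur_start, cur_end = rows[0]
for rid, s, e in rows[1:]:
    if s <= cur_end:                 # row overlaps the running outage
        cur_end = max(cur_end, e)
    else:                            # gap: close out the current outage
        actual[cur_id] = cur_end - cur_start
        cur_id, cur_start, cur_end = rid, s, e
actual[cur_id] = cur_end - cur_start
```

Rows 2 and 3 merge into one 26-minute outage credited to row 2, mirroring the 26/0 split in the output above.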

Find total time worked with multiple jobs / orders with overlap / overlapping times on each worker and job / order

This query does the job as well. Its performance is very good: while the execution plan does not look great, the actual CPU and IO beat those of many other queries.

See it working in a Sql Fiddle.

WITH Times AS (
SELECT DISTINCT
H.WorkerID,
T.Boundary
FROM
dbo.JobHistory H
CROSS APPLY (VALUES (H.JobStart), (H.JobEnd)) T (Boundary)
), Groups AS (
SELECT
WorkerID,
T.Boundary,
Grp = Row_Number() OVER (PARTITION BY T.WorkerID ORDER BY T.Boundary) / 2
FROM
Times T
CROSS JOIN (VALUES (1), (1)) X (Dup)
), Boundaries AS (
SELECT
G.WorkerID,
TimeStart = Min(Boundary),
TimeEnd = Max(Boundary)
FROM
Groups G
GROUP BY
G.WorkerID,
G.Grp
HAVING
Count(*) = 2
)
SELECT
B.WorkerID,
WorkedMinutes = Sum(DateDiff(minute, 0, B.TimeEnd - B.TimeStart))
FROM
Boundaries B
WHERE
EXISTS (
SELECT *
FROM dbo.JobHistory H
WHERE
B.WorkerID = H.WorkerID
AND B.TimeStart < H.JobEnd
AND B.TimeEnd > H.JobStart
)
GROUP BY
WorkerID
;

With a clustered index on WorkerID, JobStart, JobEnd, JobID, and with the sample 7 rows from the above fiddle used as a template for new worker/job data, repeated enough times to yield a table with 14,336 rows, here are the performance results. I've included the other working/correct answers on the page (so far):

Author  CPU  Elapsed  Reads   Scans
------  ---  -------  ------  -----
Erik    157  166      122     2
Gordon  375  378      106964  53251

I did a more exhaustive test from a different (slower) server (where each query was run 25 times, the best and worst values for each metric were thrown out, and the remaining 23 values were averaged) and got the following:

Query     CPU  Duration  Reads   Notes
--------  ---  --------  ------  ----------------------------------
Erik 1    215  231       122     query as above
Erik 2    326  379       116     alternate technique with no EXISTS
Gordon 1  578  682       106847  from j
Gordon 2  584  673       106847  from dbo.JobHistory

I thought the alternate technique would surely improve things. It saved 6 reads, but cost a lot more CPU, which in hindsight makes sense. Rather than carrying the start/end status of each timeslice through to the end, it is better to recalculate which slices to keep with the EXISTS against the original data. A different profile of few workers with many jobs might change the performance statistics for the various queries.

In case anyone wants to try it, use the CREATE TABLE and INSERT statements from my fiddle and then run this 11 times:

INSERT dbo.JobHistory
SELECT
H.JobID + A.MaxJobID,
H.WorkerID + A.WorkerCount,
DateAdd(minute, Elapsed + 45, JobStart),
DateAdd(minute, Elapsed + 45, JobEnd)
FROM
dbo.JobHistory H
CROSS JOIN (
SELECT
MaxJobID = Max(JobID),
WorkerCount = Max(WorkerID) - Min(WorkerID) + 1,
Elapsed = DateDiff(minute, Min(JobStart), Min(JobEnd))
FROM dbo.JobHistory
) A
;

I built two other solutions to this query, but the best one, with about double the performance, had a fatal flaw: it did not correctly handle fully enclosed time ranges. The other had very high/bad statistics (which I knew, but I had to try).

Explanation

Using all the endpoint times from each row, build up a distinct list of all possible time ranges of interest by duplicating each endpoint time and then grouping in such a way as to pair each time with the next possible time. Sum the elapsed minutes of these ranges wherever they coincide with any actual worker's working time.
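That explanation can be sketched directly. A minimal Python sketch with hypothetical job times, as minute offsets for one worker:

```python
# Hypothetical jobs for one worker: (start_minute, end_minute)
jobs = [(0, 60), (30, 90), (120, 150)]

# Distinct endpoint times, sorted, each paired with the next endpoint
bounds = sorted({t for job in jobs for t in job})
slices = list(zip(bounds, bounds[1:]))   # candidate timeslices

# Keep only slices that coincide with at least one actual job
# (same overlap test as the EXISTS clause) and sum their lengths
worked = sum(e - s for s, e in slices
             if any(s < je and e > js for js, je in jobs))
```

Here the slice (90, 120) falls inside no job and is discarded, leaving 120 worked minutes for the merged spans [0, 90] and [120, 150].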

Calculate time period overlap and use aggregation to find out remaining time

You could try splitting your conditions into CASEs:

SELECT sum(sec) FROM (
SELECT
CASE
WHEN type = 1 THEN
EXTRACT(SECOND FROM end_time-start_time)
WHEN type = 2 AND NOT EXISTS (SELECT 1 FROM timings
WHERE type = 1 AND
(start_time::date = t.start_time::date OR
end_time::date = t.end_time::date)) THEN 0
WHEN type = 2 AND EXISTS (SELECT 1 FROM timings
WHERE start_time < t.start_time AND
end_time > t.end_time AND
type = 1) THEN
EXTRACT(SECOND FROM end_time-start_time) * -1
END AS sec
FROM timings t) j;

Demo: db<>fiddle
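The CASE logic amounts to: count every type 1 span in full, then subtract back out any type 2 span enclosed by a type 1 span. A simplified Python sketch with invented second offsets (the date-equality check of the second branch is reduced here to plain enclosure):

```python
# Hypothetical rows: (type, start_sec, end_sec); type 1 = outer span, type 2 = pause
timings = [(1, 0, 100), (2, 20, 50), (2, 200, 210)]

total = 0
for typ, s, e in timings:
    if typ == 1:
        total += e - s                        # count the full type-1 span
    elif any(t == 1 and ts < s and te > e     # pause enclosed in a type-1 span
             for t, ts, te in timings):
        total -= e - s                        # subtract it back out
    # type-2 rows with no matching type-1 span contribute 0
```

The 100-second span minus the enclosed 30-second pause leaves 70 seconds; the unmatched pause contributes nothing.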

Extract time points meeting some time interval criteria

Here is one way to solve this, which will work at least as far back as Oracle 10.2. It uses analytic functions and a hierarchical query.

The WITH clause is there just to build the sample data on the fly. You don't need it - remove it, and use your actual table and column names in the query. (In the WITH clause I declared the columns after the CTE name, which works only in Oracle 11.2 and higher, but the WITH clause is not part of the solution, so I wouldn't worry about that.)

with
sample_data (country, date_id) as (
select 'USA', 199003 from dual union all
select 'USA', 200004 from dual union all
select 'USA', 200005 from dual union all
select 'USA', 200009 from dual union all
select 'USA', 200010 from dual union all
select 'UK' , 199307 from dual union all
select 'UK' , 199308 from dual union all
select 'UK' , 199408 from dual
)
select country, date_id
from (
select country, date_id,
row_number() over (partition by country order by dt) as rn,
count(*) over (partition by country order by dt
range between current row
and interval '4' month following) as ct
from (
select country, date_id,
to_date(to_char(date_id, 'fm999999'), 'yyyymm') as dt
from sample_data
)
)
start with rn = 1
connect by country = prior country and rn = prior rn + prior ct
;

COUNTRY DATE_ID
------- ----------
UK      199307
UK      199408
USA     199003
USA     200004
USA     200009

For comparison, here is a match_recognize solution, which requires Oracle 12.1 or higher:

select country, date_id
from (
select country, date_id,
to_date(to_char(date_id, 'fm999999'), 'yyyymm') dt
from sample_data
)
match_recognize(
partition by country
order by date_id
all rows per match
pattern (a {- b* -})
define b as dt < add_months(a.dt, 5)
);
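Both queries implement the same keep/skip rule: keep the first date, skip every date within the following 4 months, then keep the next date and repeat. A Python sketch of that rule against the same sample data:

```python
# Sample data from the answer, as (country, yyyymm)
data = [('USA', 199003), ('USA', 200004), ('USA', 200005),
        ('USA', 200009), ('USA', 200010),
        ('UK', 199307), ('UK', 199308), ('UK', 199408)]

def months(yyyymm):
    # month index; at month granularity, "more than 4 months later"
    # is the same as "at least 5 months later"
    return (yyyymm // 100) * 12 + yyyymm % 100

kept = []
last = {}                       # per country: month index of last kept date
for country, date_id in sorted(data):
    if country not in last or months(date_id) >= last[country] + 5:
        kept.append((country, date_id))
        last[country] = months(date_id)
```

This yields the same five rows as the query output above.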

