SQL to Find Time Elapsed from Multiple Overlapping Intervals


This SQL statement seems to get what you want (t is the name of the sample table):

SELECT
    d.id,
    d.duration,
    d.duration -
    IFNULL(
      ( SELECT Sum( timestampdiff( SQL_TSI_DAY,
                                   no_hold.d2,
                                   ( SELECT min(d1) FROM t t4
                                     WHERE t4.id = no_hold.id and t4.d1 > no_hold.d2 ) ) )
        FROM ( SELECT DISTINCT id, d2 FROM t t1
               WHERE ( SELECT sum( IIF( t1.d2 between t2.d1 and t2.d2, 1, 0 ) )
                       FROM t t2 WHERE t2.id = t1.id and t2.d2 <> t1.d2 ) = 0
                 AND d2 <> ( SELECT max( d2 ) FROM t t3 WHERE t3.id = t1.id ) ) no_hold
        WHERE no_hold.id = d.id ),
      0 ) "parts hold"
FROM
    ( SELECT id, timestampdiff( SQL_TSI_DAY, min( d1 ), max( d2 ) ) duration
      FROM t GROUP BY id ) d

The outer query gets the duration of the repair work. The complex subquery calculates the total number of days not spent waiting for parts. It does this by locating the start dates on which the vehicle is not waiting for parts, and then counting the number of days until it begins waiting for parts again:

-- 1) Find the starting dates when the vehicle is not waiting for parts,
-- i.e. all d2 values that do not fall within any date range where the vehicle is waiting for parts.
-- The DISTINCT is needed to remove duplicate starting "no hold" periods.

SELECT DISTINCT id, d2
FROM t t1
WHERE ( SELECT sum( IIF( t1.d2 between t2.d1 and t2.d2, 1, 0 ) ) FROM t t2
        WHERE t2.id = t1.id and t2.d2 <> t1.d2 ) = 0
  AND d2 <> ( SELECT max( d2 ) FROM t t3 WHERE t3.id = t1.id )

-- 2) The days when the vehicle is not waiting for parts run from the date found by the
-- above query until the vehicle begins waiting for parts again:

timestampdiff( SQL_TSI_DAY, no_hold.d2, ( SELECT min(d1) FROM t t4 WHERE t4.id = no_hold.id and t4.d1 > no_hold.d2 ) )

Combining the two above and aggregating all such periods gives the number of days that the vehicle is not waiting for parts. The final query adds an extra condition so that the result is calculated for each id from the outer query.

This probably is not terribly efficient on a very large table with many ids. It should be fine if the query is limited to one or just a few ids.
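The same no-hold arithmetic can be sketched outside SQL. This is a minimal Python sketch with invented sample dates (the `rows` values are not from the question), assuming `(id, d1, d2)` rows where `[d1, d2]` is one waiting-for-parts interval:

```python
from datetime import date

# Hypothetical sample rows: (id, d1, d2) = (vehicle, hold start, hold end)
rows = [
    (1, date(2020, 1, 1), date(2020, 1, 5)),
    (1, date(2020, 1, 3), date(2020, 1, 6)),    # overlaps the previous hold
    (1, date(2020, 1, 10), date(2020, 1, 12)),  # new hold after a gap
]

def parts_hold(rows, vid):
    ivs = [(d1, d2) for i, d1, d2 in rows if i == vid]
    # duration of the whole repair: min(d1) to max(d2)
    duration = (max(d2 for _, d2 in ivs) - min(d1 for d1, _ in ivs)).days
    max_d2 = max(d2 for _, d2 in ivs)
    no_hold_days = 0
    for _, d2 in ivs:
        # d2 starts a "no hold" period if it lies inside no other interval
        # (the b != d2 test mirrors t2.d2 <> t1.d2) and is not the final end date
        if d2 != max_d2 and not any(a <= d2 <= b for a, b in ivs if b != d2):
            next_d1 = min(d1 for d1, _ in ivs if d1 > d2)
            no_hold_days += (next_d1 - d2).days
    return duration - no_hold_days
```

With this data `parts_hold(rows, 1)` returns 7: an 11-day overall duration minus the 4-day gap between Jan 6 and Jan 10.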

SQL counting days with gap / overlapping

This is mostly a duplicate of my answer here (including the explanation), but with the addition of grouping on an id column. It uses a single table scan and requires neither a recursive subquery factoring clause (CTE) nor self-joins.

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE your_table ( id, usr, start_date, end_date ) AS
SELECT 1, 'A', DATE '2017-06-01', DATE '2017-06-03' FROM DUAL UNION ALL
SELECT 1, 'B', DATE '2017-06-02', DATE '2017-06-04' FROM DUAL UNION ALL -- Overlaps previous
SELECT 1, 'C', DATE '2017-06-06', DATE '2017-06-06' FROM DUAL UNION ALL
SELECT 1, 'D', DATE '2017-06-07', DATE '2017-06-07' FROM DUAL UNION ALL -- Adjacent to previous
SELECT 1, 'E', DATE '2017-06-11', DATE '2017-06-20' FROM DUAL UNION ALL
SELECT 1, 'F', DATE '2017-06-14', DATE '2017-06-15' FROM DUAL UNION ALL -- Within previous
SELECT 1, 'G', DATE '2017-06-22', DATE '2017-06-25' FROM DUAL UNION ALL
SELECT 1, 'H', DATE '2017-06-24', DATE '2017-06-28' FROM DUAL UNION ALL -- Overlaps previous and next
SELECT 1, 'I', DATE '2017-06-27', DATE '2017-06-30' FROM DUAL UNION ALL
SELECT 1, 'J', DATE '2017-06-27', DATE '2017-06-28' FROM DUAL UNION ALL -- Within H and I
SELECT 2, 'K', DATE '2011-08-01', DATE '2011-08-08' FROM DUAL UNION ALL -- Your data below
SELECT 2, 'L', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 2, 'M', DATE '2011-08-03', DATE '2011-08-10' FROM DUAL UNION ALL
SELECT 2, 'N', DATE '2011-08-12', DATE '2011-08-14' FROM DUAL UNION ALL
SELECT 3, 'O', DATE '2011-08-01', DATE '2011-08-03' FROM DUAL UNION ALL
SELECT 3, 'P', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 3, 'Q', DATE '2011-08-05', DATE '2011-08-09' FROM DUAL;

Query 1:

SELECT id,
SUM( days ) AS total_days
FROM (
SELECT id,
dt - LAG( dt ) OVER ( PARTITION BY id
ORDER BY dt ) + 1 AS days,
start_end
FROM (
SELECT id,
dt,
CASE SUM( value ) OVER ( PARTITION BY id
ORDER BY dt ASC, value DESC, ROWNUM ) * value
WHEN 1 THEN 'start'
WHEN 0 THEN 'end'
END AS start_end
FROM your_table
UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
)
WHERE start_end IS NOT NULL
)
WHERE start_end = 'end'
GROUP BY id

Results:

| ID | TOTAL_DAYS |
|----|------------|
|  1 |         25 |
|  2 |         13 |
|  3 |          9 |
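The start/end pairing that the UNPIVOT query performs can be sketched procedurally. This is a hedged Python sketch of the same sweep, with the id 1 intervals above expressed as June day numbers:

```python
# The id 1 intervals from the setup, as (start_day, end_day) in June
intervals = [(1, 3), (2, 4), (6, 6), (7, 7), (11, 20), (14, 15),
             (22, 25), (24, 28), (27, 30), (27, 28)]

events = []
for s, e in intervals:
    events.append((s, 1))    # start_date unpivots to value +1
    events.append((e, -1))   # end_date unpivots to value -1
# starts sort before ends on the same day (mirrors "value DESC")
events.sort(key=lambda t: (t[0], -t[1]))

total, running, range_start = 0, 0, None
for dt, value in events:
    running += value
    if running * value == 1:           # running sum became 1 on a start
        range_start = dt
    elif running == 0:                 # running sum back to 0 on an end
        total += dt - range_start + 1  # inclusive day count, as in the query
```

This reproduces the id 1 result of 25 total days.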

Calculate Actual Downtime ignoring overlap in dates/times

UPDATED WITH NEW TEST CASES

Here is one technique that calculates the unique outages and then aligns them back to the initial downtime causing the outage so that the actual and expected values match.

DECLARE @Downtime TABLE (
ID INT PRIMARY KEY NOT NULL IDENTITY(1,1),
Application VARCHAR(25),
DowntimeStart DATETIME,
DowntimeEnd DATETIME,
Expected INT
)

INSERT @Downtime (Application, DowntimeStart, DowntimeEnd, Expected) VALUES -- Act/Exp
('Application Demo', '2014-11-20 17:31:01.467', '2014-11-20 18:01:01.243', 30) -- 30/30
,('Application Demo', '2014-11-28 17:59:00.987', '2014-11-28 18:09:02.167', 26) -- 10/26
,('Application Demo', '2014-11-28 18:00:01.403', '2014-11-28 18:25:01.443', 0) -- 25/0
,('Application Demo', '2014-11-29 19:13:08.580', '2014-11-30 05:30:01.763', 617) -- 617/617
,('Application Demo', '2014-11-30 01:55:01.953', '2014-11-30 03:54:01.730', 0)
,('Application Demo 2', '2014-12-19 23:09:01.303', '2014-12-22 09:43:01.397', 3514)
,('Application Demo 2', '2014-12-19 23:09:01.303', '2014-12-22 09:43:01.397', 0)
,('Application Demo 2', '2014-12-19 23:09:01.303', '2014-12-22 09:43:01.397', 0)
,('Application Demo 2', '2014-12-19 23:09:01.303', '2014-12-22 09:43:01.397', 0)
,('Application Demo 2', '2014-12-19 23:09:01.303', '2014-12-22 09:43:01.397', 0)

SELECT
Downtimes.Application,
Downtimes.DowntimeStart,
Downtimes.DowntimeEnd,
Downtimes.Expected,
COALESCE(Actual, 0) AS Actual
FROM @Downtime Downtimes
LEFT OUTER JOIN (
SELECT DISTINCT
D1.Application,
MIN(CASE WHEN D1.DowntimeStart < D2.DowntimeStart THEN D1.ID ELSE D2.ID END) AS [ID],
MIN(CASE WHEN D1.DowntimeStart < D2.DowntimeStart THEN D1.DowntimeStart ELSE D2.DowntimeStart END) AS [DowntimeStart],
MAX(CASE WHEN D1.DowntimeEnd > D2.DowntimeEnd THEN D1.DowntimeEnd ELSE D2.DowntimeEnd END) AS [DowntimeEnd],
DATEDIFF(MINUTE,
MIN(CASE WHEN D1.DowntimeStart < D2.DowntimeStart THEN D1.DowntimeStart ELSE D2.DowntimeStart END),
MAX(CASE WHEN D1.DowntimeEnd > D2.DowntimeEnd THEN D1.DowntimeEnd ELSE D2.DowntimeEnd END)) AS Actual
FROM @Downtime D1
INNER JOIN @Downtime D2
ON D1.Application = D2.Application
AND (D1.DowntimeStart BETWEEN D2.DowntimeStart AND D2.DowntimeEnd
OR D2.DowntimeStart BETWEEN D1.DowntimeStart AND D1.DowntimeEnd)
GROUP BY
D1.Application,
D1.DowntimeStart
) Outages
ON Outages.ID = Downtimes.ID

And this yields the desired output:

Application               DowntimeStart           DowntimeEnd             Expected    Actual
------------------------- ----------------------- ----------------------- ----------- -----------
Application Demo          2014-11-20 17:31:01.467 2014-11-20 18:01:01.243 30          30
Application Demo          2014-11-28 17:59:00.987 2014-11-28 18:09:02.167 26          26
Application Demo          2014-11-28 18:00:01.403 2014-11-28 18:25:01.443 0           0
Application Demo          2014-11-29 19:13:08.580 2014-11-30 05:30:01.763 617         617
Application Demo          2014-11-30 01:55:01.953 2014-11-30 03:54:01.730 0           0
Application Demo 2        2014-12-19 23:09:01.303 2014-12-22 09:43:01.397 3514        3514
Application Demo 2        2014-12-19 23:09:01.303 2014-12-22 09:43:01.397 0           0
Application Demo 2        2014-12-19 23:09:01.303 2014-12-22 09:43:01.397 0           0
Application Demo 2        2014-12-19 23:09:01.303 2014-12-22 09:43:01.397 0           0
Application Demo 2        2014-12-19 23:09:01.303 2014-12-22 09:43:01.397 0           0
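The merge-and-attribute step can also be sketched as a sweep over rows sorted by start time. This simplified Python sketch uses minute offsets in place of the DATETIME values (invented data), and a running merge rather than the pairwise self-join above:

```python
# Hypothetical rows for one application: (record_id, start_minute, end_minute)
rows = [(1, 0, 30), (2, 100, 110), (3, 101, 126)]

# Merge overlapping rows and attribute each merged outage's duration
# to the row that started it; every other row gets an Actual of 0.
rows.sort(key=lambda r: r[1])
actual = {rid: 0 for rid, _, _ in rows}
cur_id, cur_start, cur_end = rows[0]
for rid, s, e in rows[1:]:
    if s <= cur_end:                 # row overlaps the running outage
        cur_end = max(cur_end, e)
    else:                            # gap: close out the current outage
        actual[cur_id] = cur_end - cur_start
        cur_id, cur_start, cur_end = rid, s, e
actual[cur_id] = cur_end - cur_start
```

Rows 2 and 3 merge into one 26-minute outage credited to row 2, mirroring the 26/0 split in the output above.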

Find total time worked with multiple jobs / orders with overlap / overlapping times on each worker and job / order

This query does the job as well. Its performance is very good: while the execution plan does not look great, the actual CPU and IO beat those of many other queries.

See it working in a Sql Fiddle.

WITH Times AS (
SELECT DISTINCT
H.WorkerID,
T.Boundary
FROM
dbo.JobHistory H
CROSS APPLY (VALUES (H.JobStart), (H.JobEnd)) T (Boundary)
), Groups AS (
SELECT
WorkerID,
T.Boundary,
Grp = Row_Number() OVER (PARTITION BY T.WorkerID ORDER BY T.Boundary) / 2
FROM
Times T
CROSS JOIN (VALUES (1), (1)) X (Dup)
), Boundaries AS (
SELECT
G.WorkerID,
TimeStart = Min(Boundary),
TimeEnd = Max(Boundary)
FROM
Groups G
GROUP BY
G.WorkerID,
G.Grp
HAVING
Count(*) = 2
)
SELECT
B.WorkerID,
WorkedMinutes = Sum(DateDiff(minute, 0, B.TimeEnd - B.TimeStart))
FROM
Boundaries B
WHERE
EXISTS (
SELECT *
FROM dbo.JobHistory H
WHERE
B.WorkerID = H.WorkerID
AND B.TimeStart < H.JobEnd
AND B.TimeEnd > H.JobStart
)
GROUP BY
WorkerID
;

With a clustered index on WorkerID, JobStart, JobEnd, JobID, and with the sample 7 rows from the above fiddle used as a template for new worker/job data, repeated enough times to yield a table with 14,336 rows, here are the performance results. I've included the other working/correct answers on the page (so far):

Author  CPU  Elapsed  Reads   Scans
------  ---  -------  ------  -----
Erik    157  166      122     2
Gordon  375  378      106964  53251

I did a more exhaustive test from a different (slower) server (where each query was run 25 times, the best and worst values for each metric were thrown out, and the remaining 23 values were averaged) and got the following:

Query     CPU  Duration  Reads   Notes
--------  ---  --------  ------  ----------------------------------
Erik 1    215  231       122     query as above
Erik 2    326  379       116     alternate technique with no EXISTS
Gordon 1  578  682       106847  from j
Gordon 2  584  673       106847  from dbo.JobHistory

I thought the alternate technique would surely improve things. It saved 6 reads, but cost a lot more CPU, which in hindsight makes sense. Rather than carrying the start/end status of each timeslice through to the end, it is better to recalculate which slices to keep with the EXISTS against the original data. A different profile of few workers with many jobs might change the performance statistics for the various queries.

In case anyone wants to try it, use the CREATE TABLE and INSERT statements from my fiddle and then run this 11 times:

INSERT dbo.JobHistory
SELECT
H.JobID + A.MaxJobID,
H.WorkerID + A.WorkerCount,
DateAdd(minute, Elapsed + 45, JobStart),
DateAdd(minute, Elapsed + 45, JobEnd)
FROM
dbo.JobHistory H
CROSS JOIN (
SELECT
MaxJobID = Max(JobID),
WorkerCount = Max(WorkerID) - Min(WorkerID) + 1,
Elapsed = DateDiff(minute, Min(JobStart), Min(JobEnd))
FROM dbo.JobHistory
) A
;

I built two other solutions to this query, but the best one, with about double the performance, had a fatal flaw: it did not correctly handle fully enclosed time ranges. The other had very high/bad statistics (which I knew, but I had to try).

Explanation

Using all the endpoint times from each row, build up a distinct list of all possible time ranges of interest by duplicating each endpoint time and then grouping in such a way as to pair each time with the next possible time. Sum the elapsed minutes of these ranges wherever they coincide with any actual worker's working time.
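That explanation can be sketched directly. A minimal Python sketch with hypothetical job times, as minute offsets for one worker:

```python
# Hypothetical jobs for one worker: (start_minute, end_minute)
jobs = [(0, 60), (30, 90), (120, 150)]

# Distinct endpoint times, sorted, each paired with the next endpoint
bounds = sorted({t for job in jobs for t in job})
slices = list(zip(bounds, bounds[1:]))   # candidate timeslices

# Keep only slices that coincide with at least one actual job
# (same overlap test as the EXISTS clause) and sum their lengths
worked = sum(e - s for s, e in slices
             if any(s < je and e > js for js, je in jobs))
```

Here the slice (90, 120) falls inside no job and is discarded, leaving 120 worked minutes for the merged spans [0, 90] and [120, 150].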

Calculate time period overlap and use aggregation to find out remaining time

You could try splitting your conditions into CASEs:

SELECT sum(sec) FROM (
SELECT
CASE
WHEN type = 1 THEN
EXTRACT(SECOND FROM end_time-start_time)
WHEN type = 2 AND NOT EXISTS (SELECT 1 FROM timings
WHERE type = 1 AND
(start_time::date = t.start_time::date OR
end_time::date = t.end_time::date)) THEN 0
WHEN type = 2 AND EXISTS (SELECT 1 FROM timings
WHERE start_time < t.start_time AND
end_time > t.end_time AND
type = 1) THEN
EXTRACT(SECOND FROM end_time-start_time) * -1
END AS sec
FROM timings t) j;

Demo: db<>fiddle
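The CASE logic amounts to: count every type 1 span in full, then subtract back out any type 2 span enclosed by a type 1 span. A simplified Python sketch with invented second offsets (the date-equality check of the second branch is reduced here to plain enclosure):

```python
# Hypothetical rows: (type, start_sec, end_sec); type 1 = outer span, type 2 = pause
timings = [(1, 0, 100), (2, 20, 50), (2, 200, 210)]

total = 0
for typ, s, e in timings:
    if typ == 1:
        total += e - s                        # count the full type-1 span
    elif any(t == 1 and ts < s and te > e     # pause enclosed in a type-1 span
             for t, ts, te in timings):
        total -= e - s                        # subtract it back out
    # type-2 rows with no matching type-1 span contribute 0
```

The 100-second span minus the enclosed 30-second pause leaves 70 seconds; the unmatched pause contributes nothing.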

Extract time points meeting some time interval criteria

Here is one way to solve this, which will work at least as far back as Oracle 10.2. It uses analytic functions and a hierarchical query.

The WITH clause is there just to build the sample data on the fly. You don't need it - remove it, and use your actual table and column names in the query. (In the WITH clause I declared the columns after the CTE name, which works only in Oracle 11.2 and higher, but the WITH clause is not part of the solution, so I wouldn't worry about that.)

with
sample_data (country, date_id) as (
select 'USA', 199003 from dual union all
select 'USA', 200004 from dual union all
select 'USA', 200005 from dual union all
select 'USA', 200009 from dual union all
select 'USA', 200010 from dual union all
select 'UK' , 199307 from dual union all
select 'UK' , 199308 from dual union all
select 'UK' , 199408 from dual
)
select country, date_id
from (
select country, date_id,
row_number() over (partition by country order by dt) as rn,
count(*) over (partition by country order by dt
range between current row
and interval '4' month following) as ct
from (
select country, date_id,
to_date(to_char(date_id, 'fm999999'), 'yyyymm') as dt
from sample_data
)
)
start with rn = 1
connect by country = prior country and rn = prior rn + prior ct
;

COUNTRY DATE_ID
------- ----------
UK      199307
UK      199408
USA     199003
USA     200004
USA     200009

For comparison, here is a match_recognize solution, which requires Oracle 12.1 or higher:

select country, date_id
from (
select country, date_id,
to_date(to_char(date_id, 'fm999999'), 'yyyymm') dt
from sample_data
)
match_recognize(
partition by country
order by date_id
all rows per match
pattern (a {- b* -})
define b as dt < add_months(a.dt, 5)
);
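Both queries implement the same keep/skip rule: keep the first date, skip every date within the following 4 months, then keep the next date and repeat. A Python sketch of that rule against the same sample data:

```python
# Sample data from the answer, as (country, yyyymm)
data = [('USA', 199003), ('USA', 200004), ('USA', 200005),
        ('USA', 200009), ('USA', 200010),
        ('UK', 199307), ('UK', 199308), ('UK', 199408)]

def months(yyyymm):
    # month index; at month granularity, "more than 4 months later"
    # is the same as "at least 5 months later"
    return (yyyymm // 100) * 12 + yyyymm % 100

kept = []
last = {}                       # per country: month index of last kept date
for country, date_id in sorted(data):
    if country not in last or months(date_id) >= last[country] + 5:
        kept.append((country, date_id))
        last[country] = months(date_id)
```

This yields the same five rows as the query output above.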

