Print Start and End Date in One Row for Continous or Overlaping Date Ranges in Oracle SQL

Merge Contiguous date ranges that are from same id and have same amount

Is this what you're after?

WITH Gaps AS(
SELECT DT.ID,
DT.StartDate,
DT.EndDate,
DT.Amount,
DATEDIFF(DAY,LAG(DATEADD(DAY,1,DT.EndDate),1,DT.StartDate) OVER (PARTITION BY DT.ID, DT.Amount ORDER BY DT.StartDate ASC), DT.StartDate) AS Gap
FROM @DataTable DT),
Grps AS(
SELECT G.ID,
G.StartDate,
G.EndDate,
G.Amount,
ROW_NUMBER() OVER (PARTITION BY G.ID ORDER BY G.StartDate) -
ROW_NUMBER() OVER (PARTITION BY G.ID,Amount ORDER BY G.StartDate) + Gap AS Grp
FROM Gaps G)
SELECT G.ID,
MIN(G.StartDate) AS StartDate,
MAX(EndDate) AS EndDate,
G.Amount
FROM Grps G
GROUP BY G.ID,
G.Amount,
G.Grp
ORDER BY ID,
StartDate;

Added a Unique ID, to get around a "feature":

DECLARE @DataTable TABLE (
UniqueID int IDENTITY(1,1),
ID [int] NULL,
StartDate [date] NULL,
EndDate [date] NULL,
Amount [decimal](12,2) NULL
)

INSERT INTO @DataTable
SELECT 1, '20180101','20180513', 10.00 UNION ALL
SELECT 1, '20180630','20190301', 15.00 UNION ALL
SELECT 1, '20190302','20190615', 15.00 UNION ALL
SELECT 1, '20190616','20991231', 5.00 UNION ALL
SELECT 2, '20190101','20190331', 35.00 UNION ALL
SELECT 2, '20190401','20191031', 30.00 UNION ALL
SELECT 3, '20180505','20180930', 19.00 UNION ALL
SELECT 3, '20181001','20190228', 1.00 UNION ALL
SELECT 3, '20190501','20190815', 1.00 UNION ALL
SELECT 3, '20190819','20190827', 5.00 UNION ALL
SELECT 3, '20190828','20991231', 1.00 UNION ALL
SELECT 4, '20171001', '20171231', 688.96 UNION ALL
SELECT 4, '20180101', '20180430', 707.96 UNION ALL
SELECT 4, '20180501', '20180531', 783.96 UNION ALL
SELECT 4, '20180601', '20181231', 707.96 UNION ALL
SELECT 4, '20190101', '20190331', 707.96 UNION ALL
SELECT 4, '20190401', '20190430', 571.46 UNION ALL
SELECT 4, '20190501', '20190630', 707.96 UNION ALL
SELECT 4, '20190701', '20991231', 707.96;

--SELECT *
--FROM @DataTable;

WITH Gaps AS(
SELECT DT.UniqueID,
DT.ID,
DT.StartDate,
DT.EndDate,
DT.Amount,
DATEDIFF(DAY,LAG(DATEADD(DAY,1,DT.EndDate),1,DT.StartDate) OVER (PARTITION BY DT.ID, DT.Amount ORDER BY DT.UniqueID ASC), DT.StartDate) AS Gap
FROM @DataTable DT),
Grps AS(
SELECT G.UniqueID,
G.ID,
G.StartDate,
G.EndDate,
G.Amount,
G.Gap,
ROW_NUMBER() OVER (PARTITION BY G.ID ORDER BY G.UniqueID) -
ROW_NUMBER() OVER (PARTITION BY G.ID,Amount ORDER BY G.UniqueID) + (Gap * UniqueID) AS Grp
FROM Gaps G)
SELECT G.ID,
MIN(G.StartDate) AS StartDate,
MAX(EndDate) AS EndDate,
G.Amount
FROM Grps G
GROUP BY G.ID,
G.Amount,
G.Grp
ORDER BY ID,
StartDate;

Determine Whether Two Date Ranges Overlap

(StartA <= EndB) and (EndA >= StartB)

Proof:

Let ConditionA Mean that DateRange A Completely After DateRange B

_                        |---- DateRange A ------|
|---Date Range B -----| _

(True if StartA > EndB)

Let ConditionB Mean that DateRange A is Completely Before DateRange B

|---- DateRange A -----|                        _ 
_ |---Date Range B ----|

(True if EndA < StartB)

Then Overlap exists if Neither A Nor B is true -

(If one range is neither completely after the other,

nor completely before the other,
then they must overlap.)

Now one of De Morgan's laws says that:

Not (A Or B) <=> Not A And Not B

Which translates to: (StartA <= EndB) and (EndA >= StartB)


NOTE: This includes conditions where the edges overlap exactly. If you wish to exclude that,

change the >= operators to >, and <= to <


NOTE2. Thanks to @Baodad, see this blog, the actual overlap is least of:

{ endA-startA, endA - startB, endB-startA, endB - startB }

(StartA <= EndB) and (EndA >= StartB)
(StartA <= EndB) and (StartB <= EndA)


NOTE3. Thanks to @tomosius, a shorter version reads:

DateRangesOverlap = max(start1, start2) < min(end1, end2)

This is actually a syntactical shortcut for what is a longer implementation, which includes extra checks to verify that the start dates are on or before the endDates. Deriving this from above:

If start and end dates can be out of order, i.e., if it is possible that startA > endA or startB > endB, then you also have to check that they are in order, so that means you have to add two additional validity rules:

(StartA <= EndB) and (StartB <= EndA) and (StartA <= EndA) and (StartB <= EndB)
or:

(StartA <= EndB) and (StartA <= EndA) and (StartB <= EndA) and (StartB <= EndB)
or,

(StartA <= Min(EndA, EndB) and (StartB <= Min(EndA, EndB))
or:

(Max(StartA, StartB) <= Min(EndA, EndB)

But to implement Min() and Max(), you have to code, (using C ternary for terseness),:

((StartA > StartB) ? StartA : StartB) <= ((EndA < EndB) ? EndA : EndB)

Print non-overlapping date blocks in oracle

Here's a pure SQL solution:

SQL Fiddle

Oracle 11g R2 Schema Setup:

create table sd_ed as with sd_ed(ID, START_DATE, END_DATE) as (
select 1, to_date('01.06.2015','dd.mm.yyyy'), to_date('20.06.2015','dd.mm.yyyy') from dual union all
select 2, to_date('05.06.2015','dd.mm.yyyy'), to_date('25.06.2015','dd.mm.yyyy') from dual union all
select 3, to_date('03.06.2015','dd.mm.yyyy'), to_date('10.06.2015','dd.mm.yyyy') from dual union all
select 4, to_date('07.06.2015','dd.mm.yyyy'), to_date('23.06.2015','dd.mm.yyyy') from dual union all
select 5, to_date('21.06.2015','dd.mm.yyyy'), to_date('30.06.2015','dd.mm.yyyy') from dual union all
select 6, to_date('02.06.2015','dd.mm.yyyy'), to_date('10.06.2015','dd.mm.yyyy') from dual union all
select 7, to_date('05.06.2015','dd.mm.yyyy'), to_date('15.06.2015','dd.mm.yyyy') from dual union all
select 8, to_date('05.06.2015','dd.mm.yyyy'), to_date('08.06.2015','dd.mm.yyyy') from dual union all
select 9, to_date('16.06.2015','dd.mm.yyyy'), to_date('20.06.2015','dd.mm.yyyy') from dual
) select * from sd_ed;

Query 1:

with super as(
-- Select the super ranges. All other ranges are
-- completely contained in at least one super range
select id, start_date, end_date
from sd_ed a
where not exists (select 1
from sd_ed b
where a.id <> b.id
and b.start_date <= a.start_date
and a.end_date <= b.end_date)
), hier(id, start_date, end_date) as (
-- Select all record with a start date not between
-- any other records start and end dates
select id, start_date, end_date from super
where not exists(select 1 from super d1
where d1.id <> super.id
and super.start_date between d1.start_date and d1.end_date
and super.start_date <> d1.start_date)
-- Recursively select records that overlap the current range
-- but with end dates after the end date of the current range
union all
select sd_ed.id
, prev.start_date
, greatest(sd_ed.end_date, prev.end_date)
from hier prev
join sd_ed
on sd_ed.id <> prev.id
and sd_ed.start_date <= prev.end_date
and prev.end_date < sd_ed.end_Date
)
-- Get the max end_date for each start date.
-- Start Dates are already minimum for any range.
select start_date, max(end_Date) end_date
from hier
group by start_date

Results:

|             START_DATE |               END_DATE |
|------------------------|------------------------|
| June, 01 2015 00:00:00 | June, 30 2015 00:00:00 |

For this data set identifying the super ranges isn't really needed since the hierarchical query will handle it just fine, but for larger data sets this initial pruning will reduced the amount of work that needs to be done by the hierarchical query.

How to fill date range gaps Oracle SQL

Try this query, tune to your needs:

WITH steps AS (
SELECT date_from as dt, 1 as step FROM ranges
UNION ALL
SELECT date_to as dt, -1 as step FROM ranges
)
SELECT dt as dt_from,
lead(dt) over (order by dt) as dt_to,
sum(step) over (order by dt) as cnt_ranges
FROM steps;

dt_from | dt_to | cnt_ranges
------------------------+-------------------------+-----------
2021-01-01 00:00:00.000 | 2021-03-27 00:00:00.000 | 1
2021-03-27 00:00:00.000 | 2021-03-31 00:00:00.000 | 2
2021-03-31 00:00:00.000 | 2021-04-27 00:00:00.000 | 1
2021-04-27 00:00:00.000 | 2021-05-01 00:00:00.000 | 0
2021-05-01 00:00:00.000 | 2021-12-31 00:00:00.000 | 1
2021-12-31 00:00:00.000 | | 0

how to group a date column based on date range in oracle

Here's a kick at the can...

Assumptions:

  • you want to be able to switch the groupings to weekly or monthly only
  • the start of the first period will be the first date in the feedback data; intervals will be calculated from this initial date
  • output will show feedback value, time period, count
  • time periods will not overlap so periods will be x -> x + interval - 1 day
  • time of day is not important (time for commented dates is always 00:00:00)

First, create some sample data (100 rows):

drop table product_feedback purge;

create table product_feedback
as
select rownum as slno
, chr(65 + MOD(rownum, 26)) as userid
, lpad(chr(65 + MOD(rownum, 26)), 5, chr(65 + MOD(rownum, 26))) as comments
, trunc(sysdate) + rownum + trunc(dbms_random.value * 10) as commented_date
, case mod(rownum * TRUNC(dbms_random.value * 10), 3)
when 0 then 'positive'
when 1 then 'negative'
when 2 then 'neutral' end as feedback
from dual
connect by level <= 100
;

Here's what my sample data looks like:

select *
from product_feedback
;

SLNO USERID COMMENTS COMMENTED_DATE FEEDBACK
1 B BBBBB 2016-08-06 neutral
2 C CCCCC 2016-08-06 negative
3 D DDDDD 2016-08-14 positive
4 E EEEEE 2016-08-16 negative
5 F FFFFF 2016-08-09 negative
6 G GGGGG 2016-08-14 positive
7 H HHHHH 2016-08-17 positive
8 I IIIII 2016-08-18 positive
9 J JJJJJ 2016-08-12 positive
10 K KKKKK 2016-08-15 neutral
11 L LLLLL 2016-08-23 neutral
12 M MMMMM 2016-08-19 positive
13 N NNNNN 2016-08-16 neutral
...

Now for the fun part. Here's the gist:

  • find out what the earliest and latest commented dates are in the data
  • include a query where you can set the time period (to "WEEKS" or "MONTHS")
  • generate all of the (weekly or monthly) time periods between the min/max dates
  • join the product feedback to the time periods (commented date between start and end) with an outer join in case you want to see all time periods whether or not there was any feedback
  • group the joined result by feedback, period start, and period end, and set up a column to count one of the 3 possible feedback values

x

with 
min_max_dates -- get earliest and latest feedback dates
as
(select min(commented_date) min_date, max(commented_date) max_date
from product_feedback
)
, time_period_interval
as
(select 'MONTHS' as tp_interval -- set the interval/time period here
from dual
)
, -- generate all time periods between the start date and end date
time_periods (start_of_period, end_of_period, max_date, time_period) -- recursive with clause - fun stuff!
as
(select mmd.min_date as start_of_period
, CASE WHEN tpi.tp_interval = 'WEEKS'
THEN mmd.min_date + 7
WHEN tpi.tp_interval = 'MONTHS'
THEN ADD_MONTHS(mmd.min_date, 1)
ELSE NULL
END - 1 as end_of_period
, mmd.max_date
, tpi.tp_interval as time_period
from time_period_interval tpi
cross join
min_max_dates mmd
UNION ALL
select CASE WHEN time_period = 'WEEKS'
THEN start_of_period + 7 * (ROWNUM )
WHEN time_period = 'MONTHS'
THEN ADD_MONTHS(start_of_period, ROWNUM)
ELSE NULL
END as start_of_period
, CASE WHEN time_period = 'WEEKS'
THEN start_of_period + 7 * (ROWNUM + 1)
WHEN time_period = 'MONTHS'
THEN ADD_MONTHS(start_of_period, ROWNUM + 1)
ELSE NULL
END - 1 as end_of_period
, max_date
, time_period
from time_periods
where end_of_period <= max_date
)
-- now put it all together
select pf.feedback
, tp.start_of_period
, tp.end_of_period
, count(*) as feedback_count
from time_periods tp
left outer join
product_feedback pf
on pf.commented_date between tp.start_of_period and tp.end_of_period
group by tp.start_of_period
, tp.end_of_period
, pf.feedback
order by pf.feedback
, tp.start_of_period
;

Output:

negative    2016-08-06  2016-09-05  6
negative 2016-09-06 2016-10-05 7
negative 2016-10-06 2016-11-05 8
negative 2016-11-06 2016-12-05 1
neutral 2016-08-06 2016-09-05 6
neutral 2016-09-06 2016-10-05 5
neutral 2016-10-06 2016-11-05 11
neutral 2016-11-06 2016-12-05 2
positive 2016-08-06 2016-09-05 17
positive 2016-09-06 2016-10-05 16
positive 2016-10-06 2016-11-05 15
positive 2016-11-06 2016-12-05 6

-- EDIT --

New and improved, all in one easy to use procedure. (I will assume you can configure the procedure to make use of the query in whatever way you need.) I made some changes to simplify the CASE statements in a few places and note that for whatever reason using a LEFT OUTER JOIN in the main SELECT results in an ORA-600 error for me so I switched it to INNER JOIN.

CREATE OR REPLACE PROCEDURE feedback_counts(p_days_chosen IN NUMBER, p_cursor OUT SYS_REFCURSOR)
AS
BEGIN

OPEN p_cursor FOR
with
min_max_dates -- get earliest and latest feedback dates
as
(select min(commented_date) min_date, max(commented_date) max_date
from product_feedback
)
, time_period_interval
as
(select CASE
WHEN p_days_chosen BETWEEN 1 AND 10 THEN 'DAYS'
WHEN p_days_chosen > 10 AND p_days_chosen <=31 THEN 'WEEKS'
WHEN p_days_chosen > 31 AND p_days_chosen <= 365 THEN 'MONTHS'
ELSE '3-MONTHS'
END as tp_interval -- set the interval/time period here
from dual --(SELECT p_days_chosen as days_chosen from dual)
)
, -- generate all time periods between the start date and end date
time_periods (start_of_period, end_of_period, max_date, tp_interval) -- recursive with clause - fun stuff!
as
(select mmd.min_date as start_of_period
, CASE tpi.tp_interval
WHEN 'DAYS'
THEN mmd.min_date + 1
WHEN 'WEEKS'
THEN mmd.min_date + 7
WHEN 'MONTHS'
THEN mmd.min_date + 30
WHEN '3-MONTHS'
THEN mmd.min_date + 90
ELSE NULL
END - 1 as end_of_period
, mmd.max_date
, tpi.tp_interval
from time_period_interval tpi
cross join
min_max_dates mmd
UNION ALL
select CASE tp_interval
WHEN 'DAYS'
THEN start_of_period + 1 * ROWNUM
WHEN 'WEEKS'
THEN start_of_period + 7 * ROWNUM
WHEN 'MONTHS'
THEN start_of_period + 30 * ROWNUM
WHEN '3-MONTHS'
THEN start_of_period + 90 * ROWNUM
ELSE NULL
END as start_of_period
, start_of_period
+ CASE tp_interval
WHEN 'DAYS'
THEN 1
WHEN 'WEEKS'
THEN 7
WHEN 'MONTHS'
THEN 30
WHEN '3-MONTHS'
THEN 90
ELSE NULL
END * (ROWNUM + 1)
- 1 as end_of_period
, max_date
, tp_interval
from time_periods
where end_of_period <= max_date
)
-- now put it all together
select pf.feedback
, tp.start_of_period
, tp.end_of_period
, count(*) as feedback_count
from time_periods tp
inner join -- currently a bug that prevents the procedure from compiling with a LEFT OUTER JOIN
product_feedback pf
on pf.commented_date between tp.start_of_period and tp.end_of_period
group by tp.start_of_period
, tp.end_of_period
, pf.feedback
order by tp.start_of_period
, pf.feedback
;
END;

Test the procedure (in something like SQLPlus or SQL Developer):

var x refcursor
exec feedback_counts(10, :x)
print :x

Detect consecutive dates ranges using SQL

No joins or recursive CTEs needed. The standard gaps-and-island solution is to group by (value minus row_number), since that is invariant within a consecutive sequence. The start and end dates are just the MIN() and MAX() of the group.

WITH t AS (
SELECT InfoDate d,ROW_NUMBER() OVER(ORDER BY InfoDate) i
FROM @d
GROUP BY InfoDate
)
SELECT MIN(d),MAX(d)
FROM t
GROUP BY DATEDIFF(day,i,d)

Detect overlapping date ranges from the same table

If you already have entries for each day that should work, but if you don't the overhead is significant, and if that query is used often, if will affect performance.

If the data is in this format, you can detect overlaps using simple date arithmetic, because an overlap is simply one interval starting after a given interval, but before the given is finished, something like

select dr1.* from date_ranges dr1
inner join date_ranges dr2
on dr2.start > dr1.start -- start after dr1 is started
and dr2.start < dr1.end -- start before dr1 is finished

If you need special handling for interval that are wholly within another interval, or you need to merge intervals, i.e.

PKey  Start       End         Type
==== ===== === ====
01 01/01/2010 20/01/2010 S
02 15/01/2010 31/01/2010 S

yielding

Start       End         Type
===== === ====
01/01/2010 31/01/2010 S

you will need more complex calculation.

In my experience with this kind of problems, once you get how to do the calculation by hand, it's easy to transfer it into SQL :)

Oracle Merging Two Tables With Date Gaps

Since the rockstars are busy leading their debauched (if well-earned) lifestyles and the ninjas look like they'll be busy for a while, I'll have a go...

The way you've laid it out, you want to collapse contiguous records in TableA first and use that result against a (possibly collapsed) TableB. I'm not sure doing that as a separate step is ideal for solving the overall problem, but I'll go with it for now. The general way I've found simplest to collapse the rows is something like:

select id, locn, max(start_date) as start_date, max(end_date) as end_date
from (
select id, locn,
case when start_date = lag_end_date + interval '1' day then null
else start_date end as start_date,
case when end_date = lead_start_date - interval '1' day then null
else end_date end as end_date,
row_number() over (partition by id order by start_date)
- row_number() over (partition by id, locn
order by start_date) as chain
from (
select id, locn, start_date, end_date,
lead(start_date) over (partition by id, locn
order by start_date) as lead_start_date,
lag(end_date) over (partition by id, locn
order by start_date) as lag_end_date
from TableA
)
)
group by id, locn, chain
order by 1, 3, 2;

ID LOCN START_DATE END_DATE
---------- ---------- ---------- ----------
1P1 01 02/04/1996 05/04/2005
1P2 02 02/07/1996 01/01/2099
1P2 30 01/31/1996 02/06/1996
1P3 03 02/07/1996 01/01/2099
1S4 42 11/06/2001 01/01/2099
3S4 42 11/06/2001 01/01/2099

The innermost select uses lead and lag to peek at adjacent rows (and you hinted at that in your previous question).

The next layer sets contiguous values (i.e. where one row's start date is the day after the previous row's end date) to null; if you run just that part you'll see the contiguous range start and ends emerge. It also adds a chain pseudocolumn which lets it cope with an id switching back to a previously-used locn; say having 1P2 go back to locn=30. (This is an approach I originally saw here, but see more about gaps and islands too). Without that all 'islands' of an id/locn would be treated as one block and you'd end up with overlapping date ranges.

The outer layer user min and max to remove the nulls and produce the final result.

Using that you could - if you're on 11gR2 - use a recursive CTE to join recursively to get all the combinations. This is only my second real stab at one of these so others may well point out flaws or improvements, if they can tear themselves away from their M&Ms... might give you some pointers though.

with a as (
select id, locn, max(start_date) as start_date, max(end_date) as end_date
from (
select id, locn,
case when start_date = lag_end_date + interval '1' day then null
else start_date end as start_date,
case when end_date = lead_start_date - interval '1' day then null
else end_date end as end_date,
row_number() over (partition by id order by start_date)
- row_number() over (partition by id, locn
order by start_date) as chain
from (
select id, locn, start_date, end_date,
lead(start_date) over (partition by id, locn
order by start_date) as lead_start_date,
lag(end_date) over (partition by id, locn
order by start_date) as lag_end_date
from TableA
)
)
group by id, locn, chain
),
b as (
select id, posting, description, other_id, start_date, end_date,
row_number() over (partition by id, posting, description,
other_id order by start_date, end_date) as rn
from TableB
),
r (id, posting, description, other_id, rn, start_date, end_date, locn) as (
select b.id, b.posting, b.description, b.other_id, b.rn,
b.start_date,
case
when not (a.start_date > b.end_date or a.end_date < b.start_date)
and a.start_date <= b.end_date and a.end_date < b.end_date
then a.end_date
when not (a.start_date > b.end_date or a.end_date < b.start_date)
and a.start_date <= b.end_date and a.start_date > b.start_date
then a.start_date - interval '1' day
else b.end_date
end as end_date,
case
when a.start_date <= b.start_date and a.end_date >= b.start_date
then a.locn
end
from b
left join (
select id, locn, start_date, end_date,
row_number() over (partition by id order by start_date) as rn
from a
) a on a.id = b.id
and a.rn = 1
union all
select b.id, b.posting, b.description, b.other_id, b.rn,
case
when a.start_date is null then r.end_date + interval '1' day
else a.start_date
end as start_date,
case
when a.start_date is null then b.end_date
when not (a.start_date > r.end_date or a.end_date < r.start_date)
then least(a.end_date, b.end_date)
when a.end_date < b.end_date then a.start_date - interval '1' day
else b.end_date
end as end_date,
a.locn
from b
join r on r.id = b.id
and r.posting = b.posting
and r.description = b.description
and r.other_id = b.other_id
and r.rn = b.rn
and r.start_date = b.start_date
and r.end_date < b.end_date
left join a on a.id = r.id
and a.start_date > r.end_date
)
select id, posting as unit_type, description, other_id,
start_date, end_date, locn
from r
order by id, start_date;

This gets the result you wanted I believe:

ID         UNIT_TYPE            DESCRIPTION                    OTHER_ID   START_DATE END_DATE   LOCN
---------- -------------------- ------------------------------ ---------- ---------- ---------- ----------
1P1 PROFESSOR Sch 1 Quad 1 Area P1 02/04/1996 05/04/2005 01
1P1 PROFESSOR Sch 1 Quad 1 Area P1 05/05/2005 01/01/2099
1P2 PROFESSOR Sch 1 Quad 2 Area P2 01/31/1996 02/06/1996 30
1P2 PROFESSOR Sch 1 Quad 2 Area P2 02/07/1996 01/01/2099 02
1P3 PROFESSOR Sch 1 Quad 3 Area P3 02/05/1996 02/06/1996
1P3 PROFESSOR Sch 1 Quad 3 Area P3 02/07/1996 01/01/2099 03
1S4 SUPERVISOR Sch 1 CO Supervisor 4 1S4 02/05/1996 11/05/2001
1S4 SUPERVISOR Sch 1 CO Supervisor 4 1S4 11/06/2001 03/18/2002 42
1S4 SUPERINTENDENT Sch 1 CD Superintendent 1S4 03/19/2002 06/09/2009 42
1S4 SUPERVISOR Sch 1 CO Supervisor 4 1S4 06/10/2009 01/01/2099 42
2S5 SUPERVISOR Sch 2 CAO Supervisor 5 2S5 10/26/2002 06/09/2009
2S5 SUPERINTENDENT Sch 2 CAO Superintendent 5 2S5 06/10/2009 07/14/2009
2S5 SUPERINTENDENT Sch 2 CAO Superintendent 5 S5 07/15/2009 01/01/2099
3S4 SUPERVISOR Sch 3 CO Supervisor 4 3S4 02/05/1996 11/05/2001
3S4 SUPERVISOR Sch 3 CO Supervisor 4 3S4 11/06/2001 03/18/2002 42
3S4 SUPERINTENDENT Sch 3 CD Superintendent 3S4 03/19/2002 06/09/2009 42
3S4 SUPERVISOR Sch 3 CO Supervisor 4 3S4 06/10/2009 01/01/2099 42

17 rows selected.

This is using three CTEs. a is as described above, a collapsed version of TableA. b is TableB but with the addition of a row number column I think I need later to keep records in step during the recursion. r is where the fun starts.

The first part of r generates the initial data for each TableB entry, with matching values from TableA if appropriate - but only from the first matching record if there might be more than one. The tricky bit here is figuring out what end_date should be. If there is no overlapping TableA record at all then it can just be the TableB end date; if there is but it starts after the TableB record then this needs to end immediately before the TableA kicks in. Otherwise it depends if the TableA record ends before or after the TableB one.

Running just that part:

with a as (...), b as (...)
select b.id, b.posting, b.description, b.other_id, b.rn,
b.start_date,
case
when not (a.start_date > b.end_date or a.end_date < b.start_date)
and a.start_date <= b.end_date and a.end_date < b.end_date
then a.end_date
when not (a.start_date > b.end_date or a.end_date < b.start_date)
and a.start_date <= b.end_date and a.start_date > b.start_date
then a.start_date - interval '1' day
else b.end_date
end as end_date,
case
when a.start_date <= b.start_date and a.end_date >= b.start_date
then a.locn
end
from b
left join (
select id, locn, start_date, end_date,
row_number() over (partition by id order by start_date) as rn
from a
) a on a.id = b.id
and a.rn = 1
order by id, start_date;

... gives this (description suppressed for readability):

ID         UNIT_TYPE            OTHER_ID   START_DATE END_DATE   LOCN
---------- -------------------- ---------- ---------- ---------- ----------
1P1 PROFESSOR P1 02/04/1996 05/04/2005 01
1P2 PROFESSOR P2 01/31/1996 02/06/1996 30
1P3 PROFESSOR P3 02/05/1996 02/06/1996
1S4 SUPERVISOR 1S4 02/05/1996 11/05/2001
1S4 SUPERINTENDENT 1S4 03/19/2002 06/09/2009 42
1S4 SUPERVISOR 1S4 06/10/2009 01/01/2099 42
2S5 SUPERVISOR 2S5 10/26/2002 06/09/2009
2S5 SUPERINTENDENT 2S5 06/10/2009 07/14/2009
2S5 SUPERINTENDENT S5 07/15/2009 01/01/2099
3S4 SUPERVISOR 3S4 02/05/1996 11/05/2001
3S4 SUPERINTENDENT 3S4 03/19/2002 06/09/2009 42
3S4 SUPERVISOR 3S4 06/10/2009 01/01/2099 42

12 rows selected.

For IP3, there is initially no matching TableA record, but notice that end_date is set to the day before the one that does match later starts.



Related Topics



Leave a reply



Submit