Merging date intervals in SQL Server
It takes longer for me to set up the sample data than to write the query - it would be better if you posted questions that include CREATE TABLE
and INSERT/SELECT
statements. I don't know what your table is called, I've called mine Periods:
create table Periods (
StartDate date not null,
EndDate date not null
)
go
insert into Periods(StartDate,EndDate)
select '19820302','19820930' union all
select '19821001','19850117' union all
select '19850626','19850726' union all
select '19850730','19911231' union all
select '19920101','19951231' union all
select '19960101','20040531' union all
select '20040605','20060131' union all
select '20060201','20110520'
go
; with MergedPeriods as (
Select p1.StartDate, p1.EndDate
from
Periods p1
left join
Periods p2
on
p1.StartDate = DATEADD(day,1,p2.EndDate)
where
p2.StartDate is null
union all
select p1.StartDate,p2.EndDate
from
MergedPeriods p1
inner join
Periods p2
on
p1.EndDate = DATEADD(day,-1,p2.StartDate)
)
select StartDate,MAX(EndDate) as EndDate
from MergedPeriods group by StartDate
Result:
StartDate EndDate
1982-03-02 1985-01-17
1985-06-26 1985-07-26
1985-07-30 2004-05-31
2004-06-05 2011-05-20
Merge overlapping dates in SQL Server
SQL DEMO
declare @t table (Name varchar(100), Datetime_Start datetime, Datetime_End datetime);
insert into @t values
('A' , '2017-01-02 00:00' , '2017-03-28 00:10'),
('A' , '2017-05-14 23:50' , '2017-05-29 23:50'),
('B' , '2017-05-18 00:00' , '2017-05-18 04:00'),
('B' , '2017-05-18 02:00' , '2017-05-18 03:00'),
('C' , '2017-01-02 00:00' , '2017-01-17 15:50'),
('C' , '2017-01-14 03:50' , '2017-01-28 15:50');
with Datetime_Starts as
(
select distinct name, Datetime_Start
from @t as t1
where not exists
(select * from @t as t2
where t2.name = t1.name
and t2.Datetime_Start < t1.Datetime_Start
and t2.Datetime_End >= t1.Datetime_Start)
),
Datetime_Ends as
(
select distinct name, Datetime_End
from @t as t1
where not exists
(select * from @t as t2
where t2.name = t1.name
and t2.Datetime_End > t1.Datetime_End
and t2.Datetime_Start <= t1.Datetime_End)
)
select name, Datetime_Start,
(select min(Datetime_End)
from Datetime_Ends as e
where e.name = s.name
and Datetime_End >= Datetime_Start) as Datetime_End
from Datetime_Starts as s;
merge consecutive dates ranges using SQL
This is a Gaps & Islands problem. You can use the typical solution using LAG()
. For example:
select
max(client_id) as client_id,
max(status) as status,
min(start_date) as start_date,
max(end_date) as end_date
from (
select *, sum(i) over(partition by client_id order by start_date) as g
from (
select *,
case when dateadd(day, -1, start_date) <>
lag(end_date) over(partition by client_id order by start_date)
then 1 else 0 end as i
from t
) x
) y
group by client_id, g
order by client_id, g
Result:
client_id status start_date end_date
---------- ------- ----------- ----------
1 1 2022-01-01 2022-01-04
1 1 2022-01-12 2022-01-15
2 1 2022-01-03 2022-01-03
2 1 2022-01-05 2022-01-06
See running example at db<>fiddle.
Combine consecutive date ranges
The strange bit you see with my use of the date '31211231' is just a very large date to handle your "no-end-date" scenario. I have assumed you won't really have many date ranges per employee, so I've used a simple Recursive Common Table Expression to combine the ranges.
To make it run faster, the starting anchor query keeps only those dates that will not link up to a prior range (per employee). The rest is just tree-walking the date ranges and growing the range. The final GROUP BY keeps only the largest date range built up per starting ANCHOR (employmentid, startdate) combination.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
create table Tbl (
employmentid int,
startdate datetime,
enddate datetime);
insert Tbl values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null);
/*
-- expected outcome
EmploymentId StartDate EndDate
5 2007-12-03 2011-08-26
5 2013-05-02 NULL
30 2006-10-02 NULL
66 2007-09-24 NULL
*/
Query 1:
;with cte as (
select a.employmentid, a.startdate, a.enddate
from Tbl a
left join Tbl b on a.employmentid=b.employmentid and a.startdate-1=b.enddate
where b.employmentid is null
union all
select a.employmentid, a.startdate, b.enddate
from cte a
join Tbl b on a.employmentid=b.employmentid and b.startdate-1=a.enddate
)
select employmentid,
startdate,
nullif(max(isnull(enddate,'32121231')),'32121231') enddate
from cte
group by employmentid, startdate
order by employmentid
Results:
| EMPLOYMENTID | STARTDATE | ENDDATE |
-----------------------------------------------------------------------------------
| 5 | December, 03 2007 00:00:00+0000 | August, 26 2011 00:00:00+0000 |
| 5 | May, 02 2013 00:00:00+0000 | (null) |
| 30 | October, 02 2006 00:00:00+0000 | (null) |
| 66 | September, 24 2007 00:00:00+0000 | (null) |
Combine continuous datetime intervals by type
Since your ranges are continuous, the problem essentially becomes a gaps-and-islands one. If only you had a criterion to help you to distinguish between different sequences with the same t
value, you could group all the rows using that criterion, then just take MIN(s), MAX(e)
for every group.
One method of obtaining such a criterion is to use two ROW_NUMBER
calls. Consider the following query:
SELECT
*,
rnk1 = ROW_NUMBER() OVER ( ORDER BY s),
rnk2 = ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
FROM @periods
;
For your example it would return the following set:
s e t rnk1 rnk2
---------- ---------- -- ---- ----
2013-01-01 2013-01-02 3 1 1
2013-01-02 2013-01-04 1 2 1
2013-01-04 2013-01-05 1 3 2
2013-01-05 2013-01-06 2 4 1
2013-01-06 2013-01-07 2 5 2
2013-01-07 2013-01-08 2 6 3
2013-01-08 2013-01-09 1 7 3
The interesting thing about the rnk1
and rnk2
rankings is that if you subtract one from the other, you will get values that, together with t
, uniquely identify every distinct sequence of rows with the same t
:
s e t rnk1 rnk2 rnk1 - rnk2
---------- ---------- -- ---- ---- -----------
2013-01-01 2013-01-02 3 1 1 0
2013-01-02 2013-01-04 1 2 1 1
2013-01-04 2013-01-05 1 3 2 1
2013-01-05 2013-01-06 2 4 1 3
2013-01-06 2013-01-07 2 5 2 3
2013-01-07 2013-01-08 2 6 3 3
2013-01-08 2013-01-09 1 7 3 4
Knowing that, you can easily apply grouping and aggregation. This is what the final query might look like:
WITH partitioned AS (
SELECT
*,
g = ROW_NUMBER() OVER ( ORDER BY s)
- ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
FROM @periods
)
SELECT
s = MIN(s),
e = MAX(e),
t
FROM partitioned
GROUP BY
t,
g
;
If you like, you can play with this solution at SQL Fiddle.
Related Topics
Remove the Last Character in a String in T-Sql
How to List Active Connections on Postgresql
Not Deferrable Versus Deferrable Initially Immediate
Select Only Some Columns from a Table on a Join
Count of Non-Null Columns in Each Row
Case Statement in SQL, How to Return Multiple Variables
Move SQL Server 2008 Database Files to a New Folder Location
Impact of Defining Varchar2 Column with Greater Length
Difference Between Query, Native Query, Named Query and Typed Query
Select Multiple Columns from a Table, But Group by One
How to Get the Active Users Connected to a Postgresql Database via SQL