Date Range for Set of Same Data

Date Range for set of same data

Non-relational Solution

I don't think any of other answers are correct.

  • GROUP BY won't work

  • Using ROW_NUMBER() forces the data into a Record Filing System structure, which is physical, and then processes it as physical records. At a massive performance cost. Of course, in order to write such code, it forces you to think in terms of RFS instead of thinking in Relational terms.

  • Using CTEs is the same. Iterating through the data, especially data that does not change. At a slightly different massive cost.

  • Cursors are definitely the wrong thing for a different set of reasons. (a) Cursors require code, and you have requested a View (b) Cursors abandon the set-processing engine, and revert to row-by-row processing. Again, not required. If a developer on any of my teams uses cursors or temp tables on a Relational Database (ie. not a Record Filing System), I shoot them.

Relational Solution

  1. Your data is Relational, logical, the two given data columns are all that is necessary.

  2. Sure, we have to form a View (derived Relation), to obtain the desired report, but that consists of pure SELECTs, which is quite different to processing (converting it to a file, which is physical, and then processing the file; or temp tables; or worktables; or CTEs; or ROW_Number(); etc).

  3. Contrary to the lamentations of "theoreticians", who have an agenda, SQL handles Relational data perfectly well. And you data is Relational.

Therefore, maintain a Relational mindset, a Relational view of the data, and a set-processing mentality. Every report requirement over a Relational Database can be fulfilled using a single SELECT. There is no need to regress to pre-1970 ISAM File handling methods.

I will assume the Primary Key (the set of columns that give a Relational row uniqueness) is Date, and based on the example data given, the Datatype is DATE.

Try this:

    CREATE VIEW MyTable_Base_V          -- Foundation View
AS
SELECT Date,
Date_Next,
Price
FROM (
-- Derived Table: project rows with what we need
SELECT Date,
[Date_Next] = DATEADD( DD, 1, O.Date ),
Price,
[Price_Next] = (

SELECT Price -- NULL if not exists
FROM MyTable
WHERE Date = DATEADD( DD, 1, O.Date )
)

FROM MyTable MT

) AS X
WHERE Price != Price_Next -- exclude unchanging rows
GO

CREATE VIEW MyTable_V -- Requested View
AS
SELECT [Date_From] = (
-- Date of the previous row
SELECT MAX( Date_Next ) -- previous row
FROM MyTable_V
WHERE Date_Next < MT.Date
),

[Date_To] = Date, -- this row
Price
FROM MyTable_Base_V MT
GO

SELECT *
FROM MyTable_V
GO

Method, Generic

Of course this is a method, therefore it is generic, it can be used to determine the From_ and To_ of any data range (here, a Date range), based on any data change (here, a change in Price).

Here, your Dates are consecutive, so the determination of Date_Next is simple: increment the Date by 1 day. If the PK is increasing but not consecutive (eg. DateTime or TimeStamp or some other Key), change the Derived Table X to:

    -- Derived Table: project rows with what we need
SELECT DateTime,
[DateTime_Next] = (
-- first row > this row
SELECT TOP 1
DateTime -- NULL if not exists
FROM MyTable
WHERE DateTime > MT.DateTime
),

Price,
[Price_Next] = (
-- first row > this row
SELECT TOP 1
Price -- NULL if not exists
FROM MyTable
WHERE DateTime > MT.DateTime
)

FROM MyTable MT

Enjoy.

Please feel free to comment, ask questions, etc.

Determine Whether Two Date Ranges Overlap

(StartA <= EndB) and (EndA >= StartB)

Proof:

Let ConditionA Mean that DateRange A Completely After DateRange B

_                        |---- DateRange A ------|
|---Date Range B -----| _

(True if StartA > EndB)

Let ConditionB Mean that DateRange A is Completely Before DateRange B

|---- DateRange A -----|                        _ 
_ |---Date Range B ----|

(True if EndA < StartB)

Then Overlap exists if Neither A Nor B is true -

(If one range is neither completely after the other,

nor completely before the other,
then they must overlap.)

Now one of De Morgan's laws says that:

Not (A Or B) <=> Not A And Not B

Which translates to: (StartA <= EndB) and (EndA >= StartB)


NOTE: This includes conditions where the edges overlap exactly. If you wish to exclude that,

change the >= operators to >, and <= to <


NOTE2. Thanks to @Baodad, see this blog, the actual overlap is least of:

{ endA-startA, endA - startB, endB-startA, endB - startB }

(StartA <= EndB) and (EndA >= StartB)
(StartA <= EndB) and (StartB <= EndA)


NOTE3. Thanks to @tomosius, a shorter version reads:

DateRangesOverlap = max(start1, start2) < min(end1, end2)

This is actually a syntactical shortcut for what is a longer implementation, which includes extra checks to verify that the start dates are on or before the endDates. Deriving this from above:

If start and end dates can be out of order, i.e., if it is possible that startA > endA or startB > endB, then you also have to check that they are in order, so that means you have to add two additional validity rules:

(StartA <= EndB) and (StartB <= EndA) and (StartA <= EndA) and (StartB <= EndB)
or:

(StartA <= EndB) and (StartA <= EndA) and (StartB <= EndA) and (StartB <= EndB)
or,

(StartA <= Min(EndA, EndB) and (StartB <= Min(EndA, EndB))
or:

(Max(StartA, StartB) <= Min(EndA, EndB)

But to implement Min() and Max(), you have to code, (using C ternary for terseness),:

((StartA > StartB) ? StartA : StartB) <= ((EndA < EndB) ? EndA : EndB)

Find the optimal pair of date ranges from a list determined by overlapping days

Pair off all date ranges with a common id and find the ones that have
The longest stretch of common days. Ties are possible so handle those as necessary.

with overlap as (
select p1.id, overlapdays, maxstart, minend,
dense_rank() over (partition by p1.id order by overlapdays desc) as dr
from portfoliodates p1 inner join portfoliodates p2
on p2.id = p1.id and (
p2.startdate > p1.startdate
or p2.startdate = p1.startdate and p2.enddate > p1.enddate
)
cross apply (values (
case when p1.startdate > p2.startdate then p1.startdate else p2.startdate end,
case when p1.enddate < p2.enddate then p1.enddate else p2.enddate end
)) v1(maxstart, minend)
cross apply (values (datediff(day, maxstart, minend))) v2(overlapdays)
where overlapdays >= 0
)
select * from overlap where dr = 1;

https://dbfiddle.uk/LhgVL7z9

How to group data based on continuous date range?

the date range for price is 23.9 is not right because price not same for all the days in that range.

Because there are two same price in different overlapping date ranges, so you might get only one row when you used aggregate function.

This is a gap-and-island problem, we can try to use ROW_NUMBER window function to get the gap of overlapping date and then group by that.

SELECT  Product_Code,
min(Pricing_Date) AS Min_Date ,
max(Pricing_Date) AS Max_Date,
price
FROM (
SELECT *,
ROW_NUMBER() OVER(ORDER BY PRICING_DATE) - ROW_NUMBER() OVER(PARTITION BY PRODUCT_CODE,PRICE ORDER BY PRICING_DATE) grp
FROM PRICE_DATA
) t1
GROUP BY grp,Product_Code,price
ORDER BY min(Pricing_Date)

sqlfiddle

Explain

The gap-and-island problem is a feature

continuous(overlapping) data is that a set (continuous range of sequence) - (values ​​based on a certain order of conditions sequence) yields the same grouping.

so that We can use

  • ROW_NUMBER() OVER(ORDER BY PRICING_DATE) making a continuous range of values.
  • ROW_NUMBER() OVER(PARTITION BY PRODUCT_CODE,PRICE ORDER BY PRICING_DATE) making values ​​based on a certain order of conditions.

Then we will get a grouping column with overlapping data as sqlfiddle

Find common date range from a set of overlapping date ranges

This answer counts the number of overlapping intervals. It assumes that date ranges having same EID do not overlap. Below is the query with explanation inline:

DECLARE @EventsTBL TABLE (PID INT, EID INT, StartDate DATETIME, EndDate DATETIME);
INSERT INTO @EventsTBL VALUES
(13579, 1, '01 Jan 2018', '31 Mar 2019'),
(13579, 2, '01 Feb 2018', '31 May 2018'),
(13579, 2, '01 Jul 2018', '31 Jan 2019'),
(13579, 7, '01 Mar 2018', '31 Mar 2019'),
(13579, 5, '01 Feb 2018', '30 Apr 2018'),
(13579, 5, '01 Oct 2018', '31 Mar 2019'),
(13579, 8, '01 Jan 2018', '30 Apr 2018'),
(13579, 8, '01 Jun 2018', '31 Dec 2018'),
(13579, 13, '01 Jan 2018', '31 Mar 2019'),
(13579, 6, '01 Apr 2018', '31 May 2018'),
(13579, 6, '01 Sep 2018', '30 Nov 2018'),
(13579, 4, '01 Feb 2018', '31 Jan 2019'),
(13579, 19, '01 Mar 2018', '31 Jul 2018'),
(13579, 19, '01 Oct 2018', '28 Feb 2019'),
(13570, 16, '01 Feb 2018', '30 Jun 2018'),
(13570, 16, '01 Aug 2018', '31 Aug 2018'),
(13570, 16, '01 Oct 2018', '28 Feb 2019'),
(13570, 23, '01 Mar 2018', '30 Jun 2018'),
(13570, 23, '01 Nov 2018', '31 Jan 2019');

WITH cte1 AS (
/*
* augment the data with the number of distinct EID per PID
* we will need this later
*/
SELECT e.PID, a.EIDCount, StartDate, EndDate
FROM @EventsTBL AS e
JOIN (
SELECT PID, COUNT(DISTINCT EID) AS EIDCount
FROM @EventsTBL
GROUP BY PID
) AS a ON e.PID = a.PID
), cte2 AS (
/*
* build a list of "points in time" at which an event started or ended
* and the number concurrent events changed
* the zero value rows are required!
*/
SELECT PID, EIDCount, StartDate AS pdate, 1 AS pval
FROM cte1
UNION ALL
SELECT PID, EIDCount, EndDate, 0
FROM cte1
UNION ALL
SELECT PID, EIDCount , DATEADD(DAY, 1, EndDate), -1
FROM cte1
), cte3 AS (
/*
* calculate running sum of pval over dates; minus ones first
*/
SELECT PID, EIDCount, pdate, SUM(pval) OVER (PARTITION BY PID ORDER BY pdate, pval) AS evtcount
FROM cte2
), cte4 AS (
/*
* consolidate data for same dates and we are done with the main part
*/
SELECT PID, EIDCount, pdate, MAX(evtcount) AS evtcount
FROM cte3
GROUP BY PID, EIDCount, pdate
), cte5 AS (
/*
* assign "change flag" to rows where number of concurrent events
* enters or exits the required count w.r.t. previous row
*/
SELECT PID, EIDCount, pdate, evtcount, CASE
WHEN evtcount < EIDCount AND LAG(evtcount) OVER (PARTITION BY PID ORDER BY pdate) < EIDCount THEN 0
WHEN evtcount = EIDCount AND LAG(evtcount) OVER (PARTITION BY PID ORDER BY pdate) = EIDCount THEN 0
ELSE 1
END AS chg
FROM cte4
), cte6 AS (
/*
* convert "change flag" to "group numbers" over consecutive rows using running sum
*/
SELECT PID, EIDCount, pdate, evtcount, SUM(chg) OVER (PARTITION BY PID ORDER BY pdate) AS grp
FROM cte5
)
/*
* group rows by pid and group numbers
*/
SELECT PID, MIN(pdate) AS StartDate, MAX(pdate) AS EndDate
FROM cte6
WHERE evtcount = EIDCount
GROUP BY PID, grp
ORDER BY PID, StartDate

Demo on db<>fiddle

Select data between a date/time range

You need to update the date format:

select * from hockey_stats 
where game_date between '2012-03-11 00:00:00' and '2012-05-11 23:59:00'
order by game_date desc;

Checking date range against set of date ranges in a data table

Using an extension method to check if a Date is between two others,

public static class DateTimeExt {
public static bool Between(this DateTime aDate, DateTime start, DateTime end) => start <= aDate && aDate <= end;
}

You can write an Overlaps method that determines if two ranges overlap:

public static bool Overlaps(DateTime aPeriodStart, DateTime aPeriodEnd, DateTime bPeriodStart, DateTime bPeriodEnd)
=> aPeriodStart.Between(bPeriodStart, bPeriodEnd) ||
aPeriodEnd.Between(bPeriodStart, bPeriodEnd) ||
bPeriodStart.Between(aPeriodStart, aPeriodEnd);

Now with another extension method that converts an IEnumerable<DataRow> to a DataTable containing the rows:

public static class IEnumerableExt {
public static DataTable ToDataTable(this IEnumerable<DataRow> src) {
var ans = src.First().Table.Clone();
foreach (var r in src)
ans.ImportRow(r);
return ans;
}
}

Your final method is straight forward:

DataTable FilterTable(DataTable timeTable, DateTime startDate, DateTime endDate) =>
timeTable.AsEnumerable().Where(period => Overlaps(period.Field<DateTime>("StartDate"), period.Field<DateTime>("EndDate"), startDate, endDate)).ToDataTable();

NOTE: If you don't need the answering DataTable for anything, it would be more efficient to replace .ToDataTable() with .Any() and just have the method return a bool indicating if any overlaps exist.

Determine if the day of a month is in a date range, independent from its year

Update 2

dplyr/data.table independent function

md_in_interval <- function(md, start, end) {
# does the interval cover more than a full year?
# Then any date will fall in this interval and hence the result is TRUE
helper <- (lubridate::year(end) - lubridate::year(start)) > 1

# lubridate time interval
interval <- lubridate::interval(dates_start, dates_end)

# helper dates with month/day combination and start year
my_date1 <- lubridate::mdy(paste0(md, lubridate::year(start)))
# helper dates with month/day combination and end year
my_date2 <- lubridate::mdy(paste0(md, lubridate::year(end)))

# check if month/day combination falls within the interval
out <- my_date1 %within% interval |
my_date2 %within% interval |
helper

return(out)

}

Usage with data.table

library(data.table)
dt <- data.table::as.data.table(df)
dt[, isin := md_in_interval("06-05", dates_start, dates_end)][]

Update

To overcome the issue with when there are more than one year span we could use a helper column:

df %>% 
mutate(across(, ymd),
helper = ifelse(year(dates_end) - year(dates_start) > 1, 1, 0),
interval = interval(dates_start, dates_end)) %>%
mutate(my_date1 = mdy(paste0("07-01-",year(dates_start))),
my_date2 = mdy(paste0("07-01-",year(dates_end)))) %>%
mutate(check = my_date1 %within% interval | my_date2 %within% interval | helper == 1) %>%
select(1,2,7)
  dates_start  dates_end check
1 2009-06-26 2010-01-19 TRUE
2 2010-01-20 2011-06-05 TRUE
3 2011-06-06 2011-06-20 FALSE
4 2011-06-21 2013-04-21 TRUE
5 2013-04-22 2016-02-17 TRUE
6 2016-02-18 2016-08-05 TRUE
7 2016-08-06 2018-05-11 TRUE
8 2018-05-12 2019-10-09 TRUE
9 2019-10-10 2021-10-25 TRUE

First answer:

We could use lubridate for this.

  1. We create an interval with interval then we

  2. we check with %within% wether the day is in interval or not.

  3. Before we have to create a month-day-year of 07-01 element. We do this with mdy(paste0("07-01-",year(dates_start)))

library(dplyr)
library(lubridate)

df %>%
mutate(across(, ymd),
interval = interval(dates_start, dates_end)) %>%
mutate(my_date = mdy(paste0("07-01-",year(dates_start)))) %>%
mutate(check = my_date %within% interval)
  dates_start  dates_end                       interval    my_date check
1 2009-06-26 2010-01-19 2009-06-26 UTC--2010-01-19 UTC 2009-07-01 TRUE
2 2010-01-20 2011-06-05 2010-01-20 UTC--2011-06-05 UTC 2010-07-01 TRUE
3 2011-06-06 2011-06-20 2011-06-06 UTC--2011-06-20 UTC 2011-07-01 FALSE
4 2011-06-21 2013-04-21 2011-06-21 UTC--2013-04-21 UTC 2011-07-01 TRUE
5 2013-04-22 2016-02-17 2013-04-22 UTC--2016-02-17 UTC 2013-07-01 TRUE
6 2016-02-18 2016-08-05 2016-02-18 UTC--2016-08-05 UTC 2016-07-01 TRUE
7 2016-08-06 2018-05-11 2016-08-06 UTC--2018-05-11 UTC 2016-07-01 FALSE
8 2018-05-12 2019-10-09 2018-05-12 UTC--2019-10-09 UTC 2018-07-01 TRUE
9 2019-10-10 2021-10-25 2019-10-10 UTC--2021-10-25 UTC 2019-07-01 FALSE

Add values between a date range

I have recreated what I believe to be your situation. I am using a record for the Holdings.

public class SumDateRange {
record Holdings(long getHoldings) {
}

public static void main(String[] args) {
List<Holdings> listHoldingsDBRecords =
List.of(new Holdings(10), new Holdings(20),
new Holdings(30), new Holdings(140),
new Holdings(150), new Holdings(160));

List<LocalDate> filteredStartDate =
List.of(LocalDate.parse("2022-01-15"),
LocalDate.parse("2022-01-20"),
LocalDate.parse("2022-01-30"),
LocalDate.parse("2022-02-15"),
LocalDate.parse("2022-02-15"),
LocalDate.parse("2022-02-15"));

LocalDate filteredEndDate = LocalDate.parse("2022-01-31");

long sum = 0;
for (int i = 0; i < filteredStartDate.size(); i++) {
if (filteredStartDate.get(i).isBefore(filteredEndDate)) {
sum = sum
+ listHoldingsDBRecords.get(i).getHoldings();
}
}
System.out.println(sum);
}
}

prints

60

Remember that any date before the end date is valid. So perhaps you need a minimum acceptable date and check for within a range:

if (filteredStartDate.get(i).isAfter(minimumAcceptableDate) && 
filteredStartDate.get(i).isBefore(filteredEndDate)) {

But the above depends entirely on your ultimate goal and the data involved.



Related Topics



Leave a reply



Submit