Date Range for set of same data
Non-relational Solution
I don't think any of other answers are correct.
GROUP BY
won't workUsing
ROW_NUMBER()
forces the data into a Record Filing System structure, which is physical, and then processes it as physical records. At a massive performance cost. Of course, in order to write such code, it forces you to think in terms of RFS instead of thinking in Relational terms.Using CTEs is the same. Iterating through the data, especially data that does not change. At a slightly different massive cost.
Cursors are definitely the wrong thing for a different set of reasons. (a) Cursors require code, and you have requested a View (b) Cursors abandon the set-processing engine, and revert to row-by-row processing. Again, not required. If a developer on any of my teams uses cursors or temp tables on a Relational Database (ie. not a Record Filing System), I shoot them.
Relational Solution
Your data is Relational, logical, the two given data columns are all that is necessary.
Sure, we have to form a View (derived Relation), to obtain the desired report, but that consists of pure SELECTs, which is quite different to processing (converting it to a file, which is physical, and then processing the file; or temp tables; or worktables; or CTEs; or ROW_Number(); etc).
Contrary to the lamentations of "theoreticians", who have an agenda, SQL handles Relational data perfectly well. And you data is Relational.
Therefore, maintain a Relational mindset, a Relational view of the data, and a set-processing mentality. Every report requirement over a Relational Database can be fulfilled using a single SELECT. There is no need to regress to pre-1970 ISAM File handling methods.
I will assume the Primary Key (the set of columns that give a Relational row uniqueness) is Date,
and based on the example data given, the Datatype is DATE.
Try this:
CREATE VIEW MyTable_Base_V -- Foundation View
AS
SELECT Date,
Date_Next,
Price
FROM (
-- Derived Table: project rows with what we need
SELECT Date,
[Date_Next] = DATEADD( DD, 1, O.Date ),
Price,
[Price_Next] = (
SELECT Price -- NULL if not exists
FROM MyTable
WHERE Date = DATEADD( DD, 1, O.Date )
)
FROM MyTable MT
) AS X
WHERE Price != Price_Next -- exclude unchanging rows
GO
CREATE VIEW MyTable_V -- Requested View
AS
SELECT [Date_From] = (
-- Date of the previous row
SELECT MAX( Date_Next ) -- previous row
FROM MyTable_V
WHERE Date_Next < MT.Date
),
[Date_To] = Date, -- this row
Price
FROM MyTable_Base_V MT
GO
SELECT *
FROM MyTable_V
GO
Method, Generic
Of course this is a method, therefore it is generic, it can be used to determine the From_
and To_
of any data range (here, a Date
range), based on any data change (here, a change in Price
).
Here, your Dates
are consecutive, so the determination of Date_Next
is simple: increment the Date
by 1 day. If the PK is increasing but not consecutive (eg. DateTime
or TimeStamp
or some other Key), change the Derived Table X
to:
-- Derived Table: project rows with what we need
SELECT DateTime,
[DateTime_Next] = (
-- first row > this row
SELECT TOP 1
DateTime -- NULL if not exists
FROM MyTable
WHERE DateTime > MT.DateTime
),
Price,
[Price_Next] = (
-- first row > this row
SELECT TOP 1
Price -- NULL if not exists
FROM MyTable
WHERE DateTime > MT.DateTime
)
FROM MyTable MT
Enjoy.
Please feel free to comment, ask questions, etc.
Determine Whether Two Date Ranges Overlap
(StartA <= EndB) and (EndA >= StartB)
Proof:
Let ConditionA Mean that DateRange A Completely After DateRange B
_ |---- DateRange A ------|
|---Date Range B -----| _
(True if StartA > EndB
)
Let ConditionB Mean that DateRange A is Completely Before DateRange B
|---- DateRange A -----| _
_ |---Date Range B ----|
(True if EndA < StartB
)
Then Overlap exists if Neither A Nor B is true -
(If one range is neither completely after the other,
nor completely before the other,
then they must overlap.)
Now one of De Morgan's laws says that:
Not (A Or B)
<=> Not A And Not B
Which translates to: (StartA <= EndB) and (EndA >= StartB)
NOTE: This includes conditions where the edges overlap exactly. If you wish to exclude that,
change the >=
operators to >
, and <=
to <
NOTE2. Thanks to @Baodad, see this blog, the actual overlap is least of:
{ endA-startA
, endA - startB
, endB-startA
, endB - startB
}
(StartA <= EndB) and (EndA >= StartB)
(StartA <= EndB) and (StartB <= EndA)
NOTE3. Thanks to @tomosius, a shorter version reads:DateRangesOverlap = max(start1, start2) < min(end1, end2)
This is actually a syntactical shortcut for what is a longer implementation, which includes extra checks to verify that the start dates are on or before the endDates. Deriving this from above:
If start and end dates can be out of order, i.e., if it is possible that startA > endA
or startB > endB
, then you also have to check that they are in order, so that means you have to add two additional validity rules:(StartA <= EndB) and (StartB <= EndA) and (StartA <= EndA) and (StartB <= EndB)
or:(StartA <= EndB) and (StartA <= EndA) and (StartB <= EndA) and (StartB <= EndB)
or,(StartA <= Min(EndA, EndB) and (StartB <= Min(EndA, EndB))
or:(Max(StartA, StartB) <= Min(EndA, EndB)
But to implement Min()
and Max()
, you have to code, (using C ternary for terseness),:((StartA > StartB) ? StartA : StartB) <= ((EndA < EndB) ? EndA : EndB)
Find the optimal pair of date ranges from a list determined by overlapping days
Pair off all date ranges with a common id and find the ones that have
The longest stretch of common days. Ties are possible so handle those as necessary.
with overlap as (
select p1.id, overlapdays, maxstart, minend,
dense_rank() over (partition by p1.id order by overlapdays desc) as dr
from portfoliodates p1 inner join portfoliodates p2
on p2.id = p1.id and (
p2.startdate > p1.startdate
or p2.startdate = p1.startdate and p2.enddate > p1.enddate
)
cross apply (values (
case when p1.startdate > p2.startdate then p1.startdate else p2.startdate end,
case when p1.enddate < p2.enddate then p1.enddate else p2.enddate end
)) v1(maxstart, minend)
cross apply (values (datediff(day, maxstart, minend))) v2(overlapdays)
where overlapdays >= 0
)
select * from overlap where dr = 1;
https://dbfiddle.uk/LhgVL7z9
How to group data based on continuous date range?
the date range for price is 23.9 is not right because price not same for all the days in that range.
Because there are two same price
in different overlapping date ranges, so you might get only one row when you used aggregate function.
This is a gap-and-island problem, we can try to use ROW_NUMBER
window function to get the gap of overlapping date and then group by
that.
SELECT Product_Code,
min(Pricing_Date) AS Min_Date ,
max(Pricing_Date) AS Max_Date,
price
FROM (
SELECT *,
ROW_NUMBER() OVER(ORDER BY PRICING_DATE) - ROW_NUMBER() OVER(PARTITION BY PRODUCT_CODE,PRICE ORDER BY PRICING_DATE) grp
FROM PRICE_DATA
) t1
GROUP BY grp,Product_Code,price
ORDER BY min(Pricing_Date)
sqlfiddle
Explain
The gap-and-island problem is a feature
continuous(overlapping) data is that a set
(continuous range of sequence) - (values based on a certain order of conditions sequence)
yields the same grouping.
so that We can use
ROW_NUMBER() OVER(ORDER BY PRICING_DATE)
making a continuous range of values.ROW_NUMBER() OVER(PARTITION BY PRODUCT_CODE,PRICE ORDER BY PRICING_DATE)
making values based on a certain order of conditions.
Then we will get a grouping column with overlapping data as sqlfiddle
Find common date range from a set of overlapping date ranges
This answer counts the number of overlapping intervals. It assumes that date ranges having same EID do not overlap. Below is the query with explanation inline:
DECLARE @EventsTBL TABLE (PID INT, EID INT, StartDate DATETIME, EndDate DATETIME);
INSERT INTO @EventsTBL VALUES
(13579, 1, '01 Jan 2018', '31 Mar 2019'),
(13579, 2, '01 Feb 2018', '31 May 2018'),
(13579, 2, '01 Jul 2018', '31 Jan 2019'),
(13579, 7, '01 Mar 2018', '31 Mar 2019'),
(13579, 5, '01 Feb 2018', '30 Apr 2018'),
(13579, 5, '01 Oct 2018', '31 Mar 2019'),
(13579, 8, '01 Jan 2018', '30 Apr 2018'),
(13579, 8, '01 Jun 2018', '31 Dec 2018'),
(13579, 13, '01 Jan 2018', '31 Mar 2019'),
(13579, 6, '01 Apr 2018', '31 May 2018'),
(13579, 6, '01 Sep 2018', '30 Nov 2018'),
(13579, 4, '01 Feb 2018', '31 Jan 2019'),
(13579, 19, '01 Mar 2018', '31 Jul 2018'),
(13579, 19, '01 Oct 2018', '28 Feb 2019'),
(13570, 16, '01 Feb 2018', '30 Jun 2018'),
(13570, 16, '01 Aug 2018', '31 Aug 2018'),
(13570, 16, '01 Oct 2018', '28 Feb 2019'),
(13570, 23, '01 Mar 2018', '30 Jun 2018'),
(13570, 23, '01 Nov 2018', '31 Jan 2019');
WITH cte1 AS (
/*
* augment the data with the number of distinct EID per PID
* we will need this later
*/
SELECT e.PID, a.EIDCount, StartDate, EndDate
FROM @EventsTBL AS e
JOIN (
SELECT PID, COUNT(DISTINCT EID) AS EIDCount
FROM @EventsTBL
GROUP BY PID
) AS a ON e.PID = a.PID
), cte2 AS (
/*
* build a list of "points in time" at which an event started or ended
* and the number concurrent events changed
* the zero value rows are required!
*/
SELECT PID, EIDCount, StartDate AS pdate, 1 AS pval
FROM cte1
UNION ALL
SELECT PID, EIDCount, EndDate, 0
FROM cte1
UNION ALL
SELECT PID, EIDCount , DATEADD(DAY, 1, EndDate), -1
FROM cte1
), cte3 AS (
/*
* calculate running sum of pval over dates; minus ones first
*/
SELECT PID, EIDCount, pdate, SUM(pval) OVER (PARTITION BY PID ORDER BY pdate, pval) AS evtcount
FROM cte2
), cte4 AS (
/*
* consolidate data for same dates and we are done with the main part
*/
SELECT PID, EIDCount, pdate, MAX(evtcount) AS evtcount
FROM cte3
GROUP BY PID, EIDCount, pdate
), cte5 AS (
/*
* assign "change flag" to rows where number of concurrent events
* enters or exits the required count w.r.t. previous row
*/
SELECT PID, EIDCount, pdate, evtcount, CASE
WHEN evtcount < EIDCount AND LAG(evtcount) OVER (PARTITION BY PID ORDER BY pdate) < EIDCount THEN 0
WHEN evtcount = EIDCount AND LAG(evtcount) OVER (PARTITION BY PID ORDER BY pdate) = EIDCount THEN 0
ELSE 1
END AS chg
FROM cte4
), cte6 AS (
/*
* convert "change flag" to "group numbers" over consecutive rows using running sum
*/
SELECT PID, EIDCount, pdate, evtcount, SUM(chg) OVER (PARTITION BY PID ORDER BY pdate) AS grp
FROM cte5
)
/*
* group rows by pid and group numbers
*/
SELECT PID, MIN(pdate) AS StartDate, MAX(pdate) AS EndDate
FROM cte6
WHERE evtcount = EIDCount
GROUP BY PID, grp
ORDER BY PID, StartDate
Demo on db<>fiddle
Select data between a date/time range
You need to update the date format:
select * from hockey_stats
where game_date between '2012-03-11 00:00:00' and '2012-05-11 23:59:00'
order by game_date desc;
Checking date range against set of date ranges in a data table
Using an extension method to check if a Date is between two others,
public static class DateTimeExt {
public static bool Between(this DateTime aDate, DateTime start, DateTime end) => start <= aDate && aDate <= end;
}
You can write an Overlaps method that determines if two ranges overlap:
public static bool Overlaps(DateTime aPeriodStart, DateTime aPeriodEnd, DateTime bPeriodStart, DateTime bPeriodEnd)
=> aPeriodStart.Between(bPeriodStart, bPeriodEnd) ||
aPeriodEnd.Between(bPeriodStart, bPeriodEnd) ||
bPeriodStart.Between(aPeriodStart, aPeriodEnd);
Now with another extension method that converts an IEnumerable<DataRow>
to a DataTable
containing the rows:
public static class IEnumerableExt {
public static DataTable ToDataTable(this IEnumerable<DataRow> src) {
var ans = src.First().Table.Clone();
foreach (var r in src)
ans.ImportRow(r);
return ans;
}
}
Your final method is straight forward:
DataTable FilterTable(DataTable timeTable, DateTime startDate, DateTime endDate) =>
timeTable.AsEnumerable().Where(period => Overlaps(period.Field<DateTime>("StartDate"), period.Field<DateTime>("EndDate"), startDate, endDate)).ToDataTable();
NOTE: If you don't need the answering DataTable
for anything, it would be more efficient to replace .ToDataTable()
with .Any()
and just have the method return a bool
indicating if any overlaps exist.
Determine if the day of a month is in a date range, independent from its year
Update 2
dplyr/data.table independent function
md_in_interval <- function(md, start, end) {
# does the interval cover more than a full year?
# Then any date will fall in this interval and hence the result is TRUE
helper <- (lubridate::year(end) - lubridate::year(start)) > 1
# lubridate time interval
interval <- lubridate::interval(dates_start, dates_end)
# helper dates with month/day combination and start year
my_date1 <- lubridate::mdy(paste0(md, lubridate::year(start)))
# helper dates with month/day combination and end year
my_date2 <- lubridate::mdy(paste0(md, lubridate::year(end)))
# check if month/day combination falls within the interval
out <- my_date1 %within% interval |
my_date2 %within% interval |
helper
return(out)
}
Usage with data.table
library(data.table)
dt <- data.table::as.data.table(df)
dt[, isin := md_in_interval("06-05", dates_start, dates_end)][]
Update
To overcome the issue with when there are more than one year span we could use a helper column:
df %>%
mutate(across(, ymd),
helper = ifelse(year(dates_end) - year(dates_start) > 1, 1, 0),
interval = interval(dates_start, dates_end)) %>%
mutate(my_date1 = mdy(paste0("07-01-",year(dates_start))),
my_date2 = mdy(paste0("07-01-",year(dates_end)))) %>%
mutate(check = my_date1 %within% interval | my_date2 %within% interval | helper == 1) %>%
select(1,2,7)
dates_start dates_end check
1 2009-06-26 2010-01-19 TRUE
2 2010-01-20 2011-06-05 TRUE
3 2011-06-06 2011-06-20 FALSE
4 2011-06-21 2013-04-21 TRUE
5 2013-04-22 2016-02-17 TRUE
6 2016-02-18 2016-08-05 TRUE
7 2016-08-06 2018-05-11 TRUE
8 2018-05-12 2019-10-09 TRUE
9 2019-10-10 2021-10-25 TRUE
First answer:
We could use lubridate
for this.
We create an interval with
interval
then wewe check with %within% wether the day is in interval or not.
Before we have to create a month-day-year of 07-01 element. We do this with
mdy(paste0("07-01-",year(dates_start)))
library(dplyr)
library(lubridate)
df %>%
mutate(across(, ymd),
interval = interval(dates_start, dates_end)) %>%
mutate(my_date = mdy(paste0("07-01-",year(dates_start)))) %>%
mutate(check = my_date %within% interval)
dates_start dates_end interval my_date check
1 2009-06-26 2010-01-19 2009-06-26 UTC--2010-01-19 UTC 2009-07-01 TRUE
2 2010-01-20 2011-06-05 2010-01-20 UTC--2011-06-05 UTC 2010-07-01 TRUE
3 2011-06-06 2011-06-20 2011-06-06 UTC--2011-06-20 UTC 2011-07-01 FALSE
4 2011-06-21 2013-04-21 2011-06-21 UTC--2013-04-21 UTC 2011-07-01 TRUE
5 2013-04-22 2016-02-17 2013-04-22 UTC--2016-02-17 UTC 2013-07-01 TRUE
6 2016-02-18 2016-08-05 2016-02-18 UTC--2016-08-05 UTC 2016-07-01 TRUE
7 2016-08-06 2018-05-11 2016-08-06 UTC--2018-05-11 UTC 2016-07-01 FALSE
8 2018-05-12 2019-10-09 2018-05-12 UTC--2019-10-09 UTC 2018-07-01 TRUE
9 2019-10-10 2021-10-25 2019-10-10 UTC--2021-10-25 UTC 2019-07-01 FALSE
Add values between a date range
I have recreated what I believe to be your situation. I am using a record
for the Holdings.
public class SumDateRange {
record Holdings(long getHoldings) {
}
public static void main(String[] args) {
List<Holdings> listHoldingsDBRecords =
List.of(new Holdings(10), new Holdings(20),
new Holdings(30), new Holdings(140),
new Holdings(150), new Holdings(160));
List<LocalDate> filteredStartDate =
List.of(LocalDate.parse("2022-01-15"),
LocalDate.parse("2022-01-20"),
LocalDate.parse("2022-01-30"),
LocalDate.parse("2022-02-15"),
LocalDate.parse("2022-02-15"),
LocalDate.parse("2022-02-15"));
LocalDate filteredEndDate = LocalDate.parse("2022-01-31");
long sum = 0;
for (int i = 0; i < filteredStartDate.size(); i++) {
if (filteredStartDate.get(i).isBefore(filteredEndDate)) {
sum = sum
+ listHoldingsDBRecords.get(i).getHoldings();
}
}
System.out.println(sum);
}
}
prints
60
Remember that any date before the end date is valid. So perhaps you need a minimum acceptable date and check for within a range:
if (filteredStartDate.get(i).isAfter(minimumAcceptableDate) &&
filteredStartDate.get(i).isBefore(filteredEndDate)) {
But the above depends entirely on your ultimate goal and the data involved.
Related Topics
Use Soundex() Word by Word on SQL Server
How to Perform a Cross Join or Cartesian Product in Excel
Difference Between <> and != in Sql
Display All Data of All Tables
Oracle SQL Merge to Insert and Delete But Not Update
Composing Database.Esqueleto Queries, Conditional Joins and Counting
Adodataset Deleting from Joined Table
Update Multiple Records in Multiple Nested Tables in Oracle
Create Unique Index If Not Exists in Postgresql
Datareader.Getfieldtype Returned Null
Datetime Query on Only Year in SQL Server
How to Multiply a Single Row with a Number from Column in Sql
How to Convert Cyrillic Stored as Latin1 ( SQL ) to True Utf8 Cyrillic with Iconv
Database Restore Failing with Move