SQL Join on Nearest Less Than Date

SQL Join on Nearest less than date

I believe this subquery will do it (not tested).

select *, 
(select top 1 Discount
from table2
where table2.Date <= t.Date
order by table2.Date desc) as Discount
from Table1 t

Perhaps not the most performant however.

Edit:

Test code:

create table #table1 ([date] datetime, val int)
create table #table2 ([date] datetime, discount int)

insert into #table1 ([date], val) values ('1/26/2010', 10)
insert into #table1 ([date], val) values ('1/25/2010', 9)
insert into #table1 ([date], val) values ('1/24/2010', 8)
insert into #table1 ([date], val) values ('1/24/2010', 9)
insert into #table1 ([date], val) values ('1/23/2010', 7)
insert into #table1 ([date], val) values ('1/22/2010', 10)
insert into #table1 ([date], val) values ('1/21/2010', 11)

insert into #table2 ([date], discount) values ('1/26/2010', 2)
insert into #table2 ([date], discount) values ('1/23/2010', 1)
insert into #table2 ([date], discount) values ('1/20/2010', 0)

select *,
(select top 1 discount
from #table2
where #table2.[date] <= t.[date]
order by #table2.[date] desc) as discount
from #table1 t

drop table #table1
drop table #table2

Results:


2010-01-26 00:00:00.000 10 2
2010-01-25 00:00:00.000 9 1
2010-01-24 00:00:00.000 8 1
2010-01-24 00:00:00.000 9 1
2010-01-23 00:00:00.000 7 1
2010-01-22 00:00:00.000 10 0
2010-01-21 00:00:00.000 11 0

MySQL LEFT JOIN ON Closest Date

Seems like you want to join on closest date that's smaller. Have you tried:

Select Range.date, A, B
from Range inner join Metric
on Metric.date = (select max(date) from Metric where date < Range.date);

Join table by id and nearest date for every date

In standard SQL, you can get the date using a correlated subquery:

select t1.*,
(select t2.date_measured
from table2 t2
where t2.id = t1.id
order by abs(t2.date_measured - t1.date_measured) asc
fetch first 1 row only
) as t2_date_measured
from table1 t1;

You can then join back to table2 to get additional information from that row.

The above is generic SQL (not necessarily standard SQL). Date/time functions tend to be peculiar to each database; so - may not work for the difference. Not all databases support fetch first 1 row only, but almost all support some mechanism for doing the same thing.

Join two tables on the same date or closest date (before or after)

Now that we're considering both past and future [Lead_Date] values I've tweaked the test data cover a special case

Table: Test_PI

CustomerID  Product_Interest_Date  Product_Interest
---------- --------------------- ----------------
1 2014-09-07 Interest1
1 2014-09-08 Interest2
1 2014-09-15 Interest3
1 2014-09-28 Interest4

Table: Test_Leads

CustomerID  Lead_Date   Lead_Source
---------- ---------- -----------
1 2014-09-07 Source1
1 2014-09-14 Source2
2 2014-09-15 Source3
1 2014-09-16 Source4

We'll start by creating a saved Access query named [Date_Gaps]

SELECT
pi.CustomerID,
pi.Product_Interest_Date,
l.Lead_Date,
Abs(DateDiff("d", pi.Product_Interest_Date, l.Lead_Date)) AS Date_Gap
FROM
Test_PI pi
INNER JOIN
Test_Leads l
ON pi.CustomerID = l.CustomerID

returning

CustomerID  Product_Interest_Date  Lead_Date   Date_Gap
---------- --------------------- ---------- --------
1 2014-09-07 2014-09-07 0
1 2014-09-08 2014-09-07 1
1 2014-09-15 2014-09-07 8
1 2014-09-28 2014-09-07 21
1 2014-09-07 2014-09-14 7
1 2014-09-08 2014-09-14 6
1 2014-09-15 2014-09-14 1
1 2014-09-28 2014-09-14 14
1 2014-09-07 2014-09-16 9
1 2014-09-08 2014-09-16 8
1 2014-09-15 2014-09-16 1
1 2014-09-28 2014-09-16 12

Now the query

SELECT 
CustomerID,
Product_Interest_Date,
Min(Date_Gap) AS MinOfDate_Gap
FROM Date_Gaps
GROUP BY
CustomerID,
Product_Interest_Date

returns

CustomerID  Product_Interest_Date  MinOfDate_Gap
---------- --------------------- -------------
1 2014-09-07 0
1 2014-09-08 1
1 2014-09-15 1
1 2014-09-28 12

so if we simply join back into the [Date_Gaps] query to get the [Lead_Date]

SELECT
mingap.CustomerID,
mingap.Product_Interest_Date,
Date_Gaps.Lead_Date
FROM
Date_Gaps
INNER JOIN
(
SELECT
CustomerID,
Product_Interest_Date,
Min(Date_Gap) AS MinOfDate_Gap
FROM Date_Gaps
GROUP BY
CustomerID,
Product_Interest_Date
) mingap
ON Date_Gaps.CustomerID = mingap.CustomerID
AND Date_Gaps.Product_Interest_Date = mingap.Product_Interest_Date
AND Date_Gaps.Date_Gap = mingap.MinOfDate_Gap

we get

CustomerID  Product_Interest_Date  Lead_Date 
---------- --------------------- ----------
1 2014-09-07 2014-09-07
1 2014-09-08 2014-09-07
1 2014-09-15 2014-09-14
1 2014-09-15 2014-09-16
1 2014-09-28 2014-09-16

Notice that we get two hits for 09-15 because they both have a gap of 1 day (before and after). So, we need to break that tie by wrapping the above query in an aggregation query using Min(Lead_Date) (or Max(Lead_Date), your choice)

SELECT
CustomerID,
Product_Interest_Date,
Min(Lead_Date) AS MinOfLead_Date
FROM
(
SELECT
mingap.CustomerID,
mingap.Product_Interest_Date,
Date_Gaps.Lead_Date
FROM
Date_Gaps
INNER JOIN
(
SELECT
CustomerID,
Product_Interest_Date,
Min(Date_Gap) AS MinOfDate_Gap
FROM Date_Gaps
GROUP BY
CustomerID,
Product_Interest_Date
) mingap
ON Date_Gaps.CustomerID = mingap.CustomerID
AND Date_Gaps.Product_Interest_Date = mingap.Product_Interest_Date
AND Date_Gaps.Date_Gap = mingap.MinOfDate_Gap
)
GROUP BY
CustomerID,
Product_Interest_Date

to give us

CustomerID  Product_Interest_Date  MinOfLead_Date
---------- --------------------- --------------
1 2014-09-07 2014-09-07
1 2014-09-08 2014-09-07
1 2014-09-15 2014-09-14
1 2014-09-28 2014-09-16

So now we're ready to JOIN up the original tables

SELECT 
Test_PI.CustomerID,
Test_PI.Product_Interest_Date,
Test_PI.Product_Interest,
Test_Leads.Lead_Date,
Test_Leads.Lead_Source
FROM
(
Test_PI
INNER JOIN
(
SELECT
CustomerID,
Product_Interest_Date,
Min(Lead_Date) AS MinOfLead_Date
FROM
(
SELECT
mingap.CustomerID,
mingap.Product_Interest_Date,
Date_Gaps.Lead_Date
FROM
Date_Gaps
INNER JOIN
(
SELECT
CustomerID,
Product_Interest_Date,
Min(Date_Gap) AS MinOfDate_Gap
FROM Date_Gaps
GROUP BY
CustomerID,
Product_Interest_Date
) mingap
ON Date_Gaps.CustomerID = mingap.CustomerID
AND Date_Gaps.Product_Interest_Date = mingap.Product_Interest_Date
AND Date_Gaps.Date_Gap = mingap.MinOfDate_Gap
)
GROUP BY
CustomerID,
Product_Interest_Date
) closest
ON Test_PI.CustomerID = closest.CustomerID
AND Test_PI.Product_Interest_Date = closest.Product_Interest_Date
)
INNER JOIN
Test_Leads
ON Test_Leads.CustomerID = closest.CustomerID
AND Test_Leads.Lead_Date = closest.MinOfLead_Date

returning

CustomerID  Product_Interest_Date  Product_Interest  Lead_Date   Lead_Source
---------- --------------------- ---------------- ---------- -----------
1 2014-09-07 Interest1 2014-09-07 Source1
1 2014-09-08 Interest2 2014-09-07 Source1
1 2014-09-15 Interest3 2014-09-14 Source2
1 2014-09-28 Interest4 2014-09-16 Source4

Join based on ID and closest date

I think you can do what you want by using lead() to get the next phone date and then just joining:

select a.*, b.orddate, b.billdate
from (select a.*,
lead(phonedate) over (partition by customerid order by phonedate) as next_pd
from a
) a left join
b
on b.customerid = a.customerid and
b.orddate >= a.phonedate and
(b.orddate < a.next_pd or a.next_pd is null);

Select nearest date from another table SQL

Below is for BigQuery Standard SQL

#standardSQL
SELECT ANY_VALUE(e).*, MAX(SentDateTime) SentDateTime
FROM `project.dataset.table2` e
JOIN `project.dataset.table1` s
ON e.EmailName = s.EmailName
AND EventDateTime > SentDateTime
GROUP BY FORMAT('%t', e)

Spark SQL join on closest date by ID

I would replace the subquery with a limit with a window function :

df = df_a.join(df_b, on="id")

df = (
df.withColumn(
"rnk",
F.row_number().over(
W.partitionBy("id", "ingest_date").orderBy(
F.abs(F.datediff("ingest_date", "weekly_dt"))
)
),
)
.where("rnk=1")
.drop("rnk")
)

df.show()
+---+-----------+---+----------+-----+                                          
| id|ingest_date|amt| weekly_dt|score|
+---+-----------+---+----------+-----+
| A2| 2021-01-01| 31|2021-01-01| 0.6|
| A2| 2021-01-02| 48|2021-01-01| 0.6|
| A2| 2021-01-03| 47|2021-01-01| 0.6|
| A2| 2021-01-04| 9|2021-01-01| 0.6|
| A2| 2021-01-05| 16|2021-01-08| 0.1|
| A2| 2021-01-06| 44|2021-01-08| 0.1|
| A2| 2021-01-07| 45|2021-01-08| 0.1|
| A2| 2021-01-08| 21|2021-01-08| 0.1|
| A2| 2021-01-09| 36|2021-01-08| 0.1|
| A2| 2021-01-10| 9|2021-01-08| 0.1|
| A2| 2021-01-11| 32|2021-01-08| 0.1|
| A2| 2021-01-12| 10|2021-01-15| 0.9|
| A2| 2021-01-13| 47|2021-01-15| 0.9|
| A2| 2021-01-14| 42|2021-01-15| 0.9|
| A2| 2021-01-15| 1|2021-01-15| 0.9|
| A2| 2021-01-16| 22|2021-01-15| 0.9|
| A2| 2021-01-17| 27|2021-01-15| 0.9|
| A2| 2021-01-18| 49|2021-01-15| 0.9|
| A2| 2021-01-19| 18|2021-01-22| 0.3|
| A2| 2021-01-20| 28|2021-01-22| 0.3|
+---+-----------+---+----------+-----+
only showing top 20 rows


Related Topics



Leave a reply



Submit