Sql Join: Selecting the Last Records in a One-To-Many Relationship

SQL join: selecting the last records in a one-to-many relationship

This is an example of the greatest-n-per-group problem that has appeared regularly on StackOverflow.

Here's how I usually recommend solving it:

SELECT c.*, p1.*
FROM customer c
JOIN purchase p1 ON (c.id = p1.customer_id)
LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND
(p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))
WHERE p2.id IS NULL;

Explanation: given a row p1, there should be no row p2 with the same customer and a later date (or in the case of ties, a later id). When we find that to be true, then p1 is the most recent purchase for that customer.

Regarding indexes, I'd create a compound index in purchase over the columns (customer_id, date, id). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan. E.g. EXPLAIN on MySQL.


Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.

LINQ join: selecting the last records in a one-to-many relationship

You can do in this way:

var maxDateSignsByEmployee = from sign in AppDbContext.Signings.Where(s => s.Time.Date == DateTime.Now.Date)
group sign by sign.EmployeeId into res
select new{
EmpId = res.Key,
MaxDate = resg.Max(x => x.Time)
};

var list = from e in AppDbContext.Employees
join s in AppDbContext.Signings.Where(s => maxDateSignsByEmployee.Where(m => m.EmpId == s.EmployeeId && m.MaxDate == s.Time).Count() > 0)
on e.WorkerId equals s.EmployeeId
where e.DepartmentId == DepartmentId
select new Aux2()
{
Id = e.EmployeeId,
Type = s.SignType,
};

It should be what you need. It's a basic response, you can optimize it.

SQL JOIN - Selecting only records that meet ALL criteria in a 1-to-many relationship

One method is aggregation:

select i.case_num
from insurance i
group by i.case_num
having max(date_resolved) is null;

Or, you can use not exists:

select c.*
from cases c
where not exists (select 1
from insurance i
where i.case_num = c.case_num and i.date_resolved is not null
);

Note that these are not equivalent. The second returns cases that have no rows in insurance.

Select the first record in a one to many relation using left join

It seems you don't want the "first" row, but just two different ones based on the language. So you need to joins to the translation table: one for the "title" and one for the "description"

select p.*, t.description as title, d.description
from product p
left join product_translations t
on t.product_id = p.id
and t.language = 'en'
and t.column = 'title'
left join product_translations d
on d.product_id = p.id
and d.language = 'en'
and d.column = 'description'

If you want to retrieve this with a "primary" and "fallback" language, you can do something like this:

with product_texts as (
select t.product_id, t.value as title, d.value as description, t.lang
from product_translations t
join product_translations d
on d.product_id = t.product_id
and d.lang = t.lang
and d."column" = 'description'
where t."column" = 'title'
and t.product_id = 3
and t.lang in ('de', 'en')
)
select t.*
from product_texts t
order by case
when t.lang = 'de' then 1
else 2
end
limit 1

As it is a bit complicated to always join against that, you can create a function that does this:

create function get_translations(p_product_id int, p_lang text, p_fallback_lang text default 'en')
returns table (product_id int, title text, description text, lang text)
as
$$
with product_texts as (
select t.product_id, t.value as title, d.value as description, t.lang
from product_translations t
join product_translations d
on d.product_id = t.product_id
and d.lang = t.lang
and d."column" = 'description'
where t."column" = 'title'
and t.product_id = 3
and t.lang in (p_lang, p_fallback_lang)
)
select t.*
from product_texts t
order by case
when t.lang = p_lang then 1
else 2
end
limit 1
$$
language sql
rows 1
stable;

Then you can do:

select *
from products p
left join lateral get_translations(p.id, 'de', 'en') on true

There is next to none overhead in calling the function as this will be inlined - just as if you had written the function's query into the main query.

For good performance you want an index on (product_id, lang, "column") or maybe even two filtered indexes:

create index product_lang_title 
on product_translations (product_id, lang)
where "column" = 'title';

create index product_lang_descr
on product_translations (product_id, lang)
where "column" = 'description';

Online example

SQL Left join: selecting the last records in a one-to-many relationship

You got it almost right.

Your first query removes all customers that don't have details with the specified product, because you didn't specifiy the product filter in the ON condition of the first OUTER JOIN.

SELECT
cust.Customer
, cust.Company
, inv.Date
, inv.Product
, inv.Units
, inv.Extended
FROM
customerlist cust
LEFT OUTER JOIN
detail inv
ON
cust.customer = inv.customer
AND inv.Product IN ('CC', 'CG', 'CH')
LEFT OUTER JOIN
detail inv2
ON
inv.customer = inv2.customer
AND (
inv.date < inv2.date
OR inv.date = inv2.date AND inv.customer < inv2.customer
)
WHERE
inv2.customer IS NULL

That should do it.

There is one other thing I think is not quite correct. The AND inv.customer < inv2.customer part should probably be AND inv.id < inv2.id (if there is an id field in the detail table).

That's because the OR condition is filtering the detail records that have the same date by their primary key.

UPDATE

Since the table in question has no primary key field you can use the ROWID ADS feature to solve that:

SELECT
cust.Customer
, cust.Company
, inv.Date
, inv.Product
, inv.Units
, inv.Extended
FROM
customerlist cust
LEFT OUTER JOIN
detail inv
ON
cust.customer = inv.customer
AND inv.Product IN ('CC', 'CG', 'CH')
LEFT OUTER JOIN
detail inv2
ON
inv.customer = inv2.customer
AND (
inv.date < inv2.date
OR inv.date = inv2.date AND inv.ROWID < inv2.ROWID
)
WHERE
inv2.customer IS NULL

mysql join one to many last record

Use an additional group by to get the last record:

SELECT one.id1, one.id2, two.content
FROM table1 one JOIN
table2 two
ON one.id1 = two.id1 AND one.id2 = two.id2 JOIN
(SELECT t.id2, MAX(time) as maxt
FROM table2 t
GROUP BY t.id2
) t
ON two.id2 = t.id2 and two.time = maxt

get last record of one to many relation

Try this:

SELECT users.*, health.* FROM users  
INNER JOIN health
ON health.id = (SELECT id FROM health WHERE health.id = users.id ORDER BY id DESC LIMIT 1)

Select Last Record In One-Too-Many Relationships

The first step is to get the lasted added date from diagnistic per staff member:

SELECT  d.StaffID, MAX(Added) AS MaxAdded
FROM ct_pm_Diagnostic
GROUP BY d.StaffID;

You would then need to join the results of this query back to ct_pm_Diagnostic to filter it just for the latest record for each staff member:

SELECT  d.*
FROM ct_pm_Diagnostic AS d
INNER JOIN
( SELECT d.StaffID, MAX(Added) AS Added
FROM ct_pm_Diagnostic
GROUP BY d.StaffID
) AS MaxD
ON MaxD.StaffID = d.StaffID
AND MaxD.Added = d.Added;

I think you can then add this back to your original query as follows (Untested):

SELECT s.id, s.first_name, s.last_name, o.Outcome 
FROM ((SELECT s.id, s.first_name, s.last_name
FROM ((dbo_ds_staff AS s
INNER JOIN dbo_ds_team_staff_member AS tsm ON s.id=tsm.staff_id)
INNER JOIN dbo_ds_team_leader AS tl ON tsm.team_id=tl.team_id)
WHERE tl.staff_id = 26928 And tsm.start_date < Now() And (tsm.end_date >= Now() Or tsm.end_date Is Null) And tl.start_date < Now() And (tl.end_date >= Now() Or tl.end_date Is Null)
GROUP BY s.id, s.first_name, s.last_name) as s
LEFT JOIN (ct_pm_Diagnostic AS d
INNER JOIN (SELECT d.StaffID, MAX(Added) AS Added
FROM ct_pm_Diagnostic
GROUP BY d.StaffID) AS MaxD
ON MaxD.StaffID = d.StaffID AND MaxD.Added = d.Added)
ON s.id=d.StaffId)
LEFT JOIN ct_pm_Outcome AS o ON d.OutcomeId = o.OutcomeId
ORDER BY s.first_name, s.last_name

EDIT

I tried to give the least verbose join method possible, but it looks like Access doesn't support it, I can't see any reason why the below wouldn't work:

SELECT s.id, s.first_name, s.last_name, o.Outcome 
FROM ((SELECT s.id, s.first_name, s.last_name
FROM ((dbo_ds_staff AS s
INNER JOIN dbo_ds_team_staff_member AS tsm ON s.id=tsm.staff_id)
INNER JOIN dbo_ds_team_leader AS tl ON tsm.team_id=tl.team_id)
WHERE tl.staff_id = 26928 And tsm.start_date < Now() And (tsm.end_date >= Now() Or tsm.end_date Is Null) And tl.start_date < Now() And (tl.end_date >= Now() Or tl.end_date Is Null)
GROUP BY s.id, s.first_name, s.last_name) as s
LEFT JOIN (SELECT d.*
FROM ct_pm_Diagnostic AS d
INNER JOIN
( SELECT d.StaffID, MAX(Added) AS Added
FROM ct_pm_Diagnostic
GROUP BY d.StaffID
) AS MaxD
ON MaxD.StaffID = d.StaffID
AND MaxD.Added = d.Added) AS d
ON s.id=d.StaffId)
LEFT JOIN ct_pm_Outcome AS o ON d.OutcomeId = o.OutcomeId
ORDER BY s.first_name, s.last_name

If it still doesn't work you may be best off saving this query:

SELECT  d.*
FROM ct_pm_Diagnostic AS d
INNER JOIN
( SELECT d.StaffID, MAX(Added) AS Added
FROM ct_pm_Diagnostic
GROUP BY d.StaffID
) AS MaxD
ON MaxD.StaffID = d.StaffID
AND MaxD.Added = d.Added;

As something like LatestDiagnostic, then you can just replace ct_pm_Diagnostic with LatestDiagnostic in your original query.



Related Topics



Leave a reply



Submit