Prevent Duplicate Values in Left Join

Prevent duplicate values in LEFT JOIN

I like to call this problem "cross join by proxy". Since nothing (no WHERE or JOIN condition) tells the database how the tables department and contact are supposed to match up, they are cross-joined via the proxy table person, which gives you the Cartesian product. A very similar case:

  • Two SQL LEFT JOINS produce incorrect result

More explanation there.
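
To see the effect concretely, here is a minimal sketch with made-up sample data. The table and column names follow the solution query; the column types, the rows, and the shape of the problematic query are assumptions for illustration.

```sql
-- Hypothetical sample data: one person with 2 departments and 3 phone numbers.
CREATE TABLE person     (id int PRIMARY KEY, person_name varchar(50));
CREATE TABLE department (person_id int, department_name varchar(50));
CREATE TABLE contact    (person_id int, phone_number varchar(20));

INSERT INTO person     VALUES (1, 'Alice');
INSERT INTO department VALUES (1, 'Sales'), (1, 'Support');
INSERT INTO contact    VALUES (1, '555-0100'), (1, '555-0101'), (1, '555-0102');

-- Both joins key only on person, so every department row is paired with
-- every phone row: 2 x 3 = 6 result rows for a single person.
SELECT p.id, p.person_name, d.department_name, c.phone_number
FROM person p
LEFT JOIN department d ON d.person_id = p.id
LEFT JOIN contact    c ON c.person_id = p.id;
```

Each department name shows up three times and each phone number twice, even though no table contains a duplicate row.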

Solution for your query:

SELECT p.id, p.person_name, d.department_name, c.phone_number
FROM person p
LEFT JOIN (
    SELECT person_id, min(department_name) AS department_name
    FROM department
    GROUP BY person_id
) d ON d.person_id = p.id
LEFT JOIN (
    SELECT person_id, min(phone_number) AS phone_number
    FROM contact
    GROUP BY person_id
) c ON c.person_id = p.id;

You did not define which department or phone number to pick, so I arbitrarily chose the minimum. You can have it any other way ...
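
One such "other way", as a rough sketch: keep every value, but fold them into a single string per person. This assumes PostgreSQL, where string_agg() is available (MySQL has GROUP_CONCAT for the same purpose). The sample rows and column types are made up.

```sql
-- Hypothetical sample data; column types are assumed.
CREATE TABLE person     (id int PRIMARY KEY, person_name varchar(50));
CREATE TABLE department (person_id int, department_name varchar(50));
CREATE TABLE contact    (person_id int, phone_number varchar(20));

INSERT INTO person     VALUES (1, 'Alice');
INSERT INTO department VALUES (1, 'Sales'), (1, 'Support');
INSERT INTO contact    VALUES (1, '555-0100'), (1, '555-0101');

-- Assumes PostgreSQL: each subquery still returns one row per person_id,
-- so the outer joins cannot multiply rows.
SELECT p.id, p.person_name, d.department_names, c.phone_numbers
FROM person p
LEFT JOIN (
    SELECT person_id,
           string_agg(department_name, ', ' ORDER BY department_name) AS department_names
    FROM department
    GROUP BY person_id
) d ON d.person_id = p.id
LEFT JOIN (
    SELECT person_id,
           string_agg(phone_number, ', ' ORDER BY phone_number) AS phone_numbers
    FROM contact
    GROUP BY person_id
) c ON c.person_id = p.id;
```

The important part is unchanged: aggregate each child table down to one row per person before joining.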

How does one use join in mysql and avoid duplicate entries in response

First, using DISTINCT * is counterintuitive: you are essentially selecting every row in the table and then eliminating duplicate rows. Try to avoid it.

Since you have already tried DISTINCT, that rules out the possibility that you started off with duplicate data in your tables.
Looking at your screenshot, I think the rows are not duplicates. They may be identical in certain columns, but they cannot be completely identical. For example:

media:
id          name
----------- ---------------
1           mediaA
2           mediaB
3           mediaC

media_creditsDATA:
media_id    credit_id   name
----------- ----------- ---------------
1           1           good credit
1           2           ok credit
2           3           bad credit
3           4           no credit

If you execute the following SQL, with or without DISTINCT, the result is the same:

SELECT *
FROM media
INNER JOIN media_creditsDATA ON media.id = media_creditsDATA.media_id

result:

id          name            media_id    credit_id   name
----------- --------------- ----------- ----------- ---------------
1           mediaA          1           1           good credit
1           mediaA          1           2           ok credit
2           mediaB          2           3           bad credit
3           mediaC          3           4           no credit

If you only look at the first three columns in the result table then sure, there are duplicate records, but not if you look at all the columns. As you can see, the media table has a one-to-many relationship to the media_creditsDATA table. The result table has records that share the same subset of columns, but there are no duplicate records.

So I think the problem in this case is not how you join but how you filter your result. For instance, is there a subset of credit records you are looking for in the media_creditsDATA table? Or maybe you don't care and just want the record with the highest credit_id for each media record.

SELECT *
FROM media
INNER JOIN (
    SELECT media_id, MAX(credit_id) AS highest_credit_id
    FROM media_creditsDATA
    GROUP BY media_id
) media_creditsDATA ON media.id = media_creditsDATA.media_id

You get:

id          name            media_id    highest_credit_id
----------- --------------- ----------- -----------------
1           mediaA          1           2
2           mediaB          2           3
3           mediaC          3           4
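
If you also need the other columns of that credit record (its name, for instance), one possible extension is to join the aggregated ids back to media_creditsDATA. The following is only a sketch using the sample data from this answer; the column types and the aliases m, latest and c are my own assumptions.

```sql
-- Sample data from the example above; column types are assumed.
CREATE TABLE media (id int PRIMARY KEY, name varchar(50));
CREATE TABLE media_creditsDATA (media_id int, credit_id int, name varchar(50));

INSERT INTO media VALUES (1, 'mediaA'), (2, 'mediaB'), (3, 'mediaC');
INSERT INTO media_creditsDATA VALUES
    (1, 1, 'good credit'),
    (1, 2, 'ok credit'),
    (2, 3, 'bad credit'),
    (3, 4, 'no credit');

-- Step 1 (subquery): find the highest credit_id per media.
-- Step 2 (second join): fetch the full credit row for exactly that id.
SELECT m.id, m.name, c.credit_id, c.name AS credit_name
FROM media m
INNER JOIN (
    SELECT media_id, MAX(credit_id) AS highest_credit_id
    FROM media_creditsDATA
    GROUP BY media_id
) latest ON latest.media_id = m.id
INNER JOIN media_creditsDATA c
        ON c.media_id  = latest.media_id
       AND c.credit_id = latest.highest_credit_id
ORDER BY m.id;
```

With the sample rows this gives one row per media: mediaA with 'ok credit', mediaB with 'bad credit', and mediaC with 'no credit'.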

INSERT INTO SELECT with a LEFT JOIN to prevent duplicates, only prevents duplicates already in the table

You are correct on the "snapshot" point: any insertions into table1 in this query will not affect the LEFT JOIN table1.

But you would still need a DISTINCT to guarantee uniqueness from the queried data.

INSERT INTO table1
SELECT DISTINCT
    t2.col1,
    t2.col2
FROM table2 t2
LEFT JOIN table1 t1
    ON t2.col1 = t1.col1
    AND t2.col2 = t1.col2
WHERE t1.col1 IS NULL
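
To illustrate why the DISTINCT matters, here is a small sketch with made-up rows (the int column types and the values are assumptions). It only runs the SELECT part, so you can compare what would be inserted with and without DISTINCT.

```sql
-- Hypothetical sample data; column types are assumed.
CREATE TABLE table1 (col1 int, col2 int);
CREATE TABLE table2 (col1 int, col2 int);

INSERT INTO table1 VALUES (1, 10);
INSERT INTO table2 VALUES (1, 10), (2, 20), (2, 20);   -- (2, 20) appears twice

-- Without DISTINCT: the anti-join filters out (1, 10), which table1 already
-- has, but both copies of (2, 20) pass because neither matches a table1 row.
SELECT t2.col1, t2.col2
FROM table2 t2
LEFT JOIN table1 t1
    ON t2.col1 = t1.col1
    AND t2.col2 = t1.col2
WHERE t1.col1 IS NULL;

-- With DISTINCT: the duplicate inside table2 is collapsed to a single row.
SELECT DISTINCT t2.col1, t2.col2
FROM table2 t2
LEFT JOIN table1 t1
    ON t2.col1 = t1.col1
    AND t2.col2 = t1.col2
WHERE t1.col1 IS NULL;
```

The first query returns (2, 20) twice, the second returns it once. The join only guards against rows already in table1, not against duplicates within table2 itself.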

However:

  • LEFT JOIN is a poor man's replacement for NOT EXISTS and EXCEPT which the optimizer understands much better
  • You should always specify column names in an INSERT

So your code should look like one of these options:

INSERT INTO table1 (col1, col2)
SELECT DISTINCT
    t2.col1,
    t2.col2
FROM table2 t2
WHERE NOT EXISTS (
    SELECT 1
    FROM table1 t1
    WHERE t2.col1 = t1.col1
      AND t2.col2 = t1.col2);

INSERT INTO table1 (col1, col2)
SELECT DISTINCT
    t2.col1,
    t2.col2
FROM table2 t2
WHERE NOT EXISTS (  -- or you can use EXISTS/EXCEPT
    SELECT t2.col1, t2.col2
    INTERSECT
    SELECT t1.col1, t1.col2
    FROM table1 t1);

INSERT INTO table1 (col1, col2)
SELECT  -- EXCEPT implies DISTINCT
    t2.col1,
    t2.col2
FROM table2 t2
EXCEPT
SELECT t1.col1, t1.col2
FROM table1 t1;

How to avoid duplicate records in left join

You need to aggregate t2 before joining:

SELECT t1.*, t2.City
FROM t1 LEFT JOIN
     (SELECT t2.ID, ANY_VALUE(t2.City) AS City
      FROM t2
      GROUP BY t2.ID
     ) t2
     ON t1.ID = t2.ID;
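
For a quick before/after comparison, here is a sketch with invented sample data. The Name column, the rows, and the alias c are assumptions. It uses MIN() instead of ANY_VALUE() so that it also runs on databases without ANY_VALUE() and always picks the same city.

```sql
-- Hypothetical sample data; column types and the Name column are assumed.
CREATE TABLE t1 (ID int PRIMARY KEY, Name varchar(50));
CREATE TABLE t2 (ID int, City varchar(50));

INSERT INTO t1 VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO t2 VALUES (1, 'Paris'), (1, 'Rome'), (2, 'Oslo');

-- Plain LEFT JOIN: ID 1 comes back twice, once per matching city.
SELECT t1.*, t2.City
FROM t1
LEFT JOIN t2 ON t1.ID = t2.ID;

-- Aggregating first: the subquery has one row per ID, so each t1 row
-- appears exactly once. MIN() stands in for ANY_VALUE() here.
SELECT t1.*, c.City
FROM t1
LEFT JOIN (
    SELECT ID, MIN(City) AS City
    FROM t2
    GROUP BY ID
) c ON t1.ID = c.ID;
```

The first query returns three rows, the second returns two (Ann with 'Paris', Bob with 'Oslo').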

