How to Handle Duplicates Created by Left Join

Left Join query returns duplicates

You are missing the condition in your left join:

SELECT a.id,
       a.name,
       sa.dose
FROM   additives a
LEFT JOIN stage_additives sa
ON     a.id = sa.additive_id and sa.stage_id = 195;

Remember, the join is conceptually doing a cross join between the two tables and taking only the rows that match the on condition (the left join is also keeping all the rows in the first table). By not having an on condition, the join is keeping all pairs of rows from the two tables where sa.stage_id = 195 -- and that is a lot of pairs.

EDIT:

(In response to moving the condition sa.stage_id = 195 into a where clause.)

The condition sa.stage_id = 195 is in the on clause on purpose. This ensures that the left join actually behaves as written. If the condition were moved to a where clause, then the left join would turn into an inner join. Rows from additive with no match in stage_additive would have a NULL value for sa.stage_id and be filtered out. I have to assume that the OP intended for the left join to keep all rows in additive because of the explicit use of left join.

Duplicate data when using Snowflake SQL left join

so just reformed that SQL so is was less wide, and just to make sense of it.

select * from (
    select 1 as AccountIdA
) s
LEFT OUTER JOIN (
    SELECT * FROM VALUES (1,'val1'), (1,  'Val2') v(AccountIdB, col1)
) t1 on t1.AccountIdB = s.AccountIdA
LEFT OUTER JOIN (
    SELECT * FROM VALUES (1,'val3') v(AccountIdC, col2)
) t2 on t2.AccountIdC = s.AccountIdA;

But you get two rows, because of standard SQL logic. The joins happen is serial, thus you first have s with a single AccountIdA of 1.

AccountIdA
1

How to Handle Duplicates Created by Left Join

Left Join query returns duplicates

Duplicate data when using Snowflake SQL left join

Related Topics

Leave a reply