Join Four Tables Involving Left Join Without Duplicates

You have two LEFT JOINS:

The 1st left join can join to multiple rows from solved. Say, 'jane' and 'luke' solved the task.
The 2nd left join can only join to users named 'luke' ('luke' in the join condition!).

You still get both rows. 'jane' is just not shown: the join condition filters her out, but the LEFT JOIN preserves the row in the result anyway and fills the users columns with NULL values.

You can achieve what you are after by using parentheses and an [INNER] JOIN instead of the LEFT JOIN between solved and users. The manual:

Use parentheses if necessary to determine the order of nesting. In the
absence of parentheses, JOINs nest left-to-right.

SELECT c.name AS cat_name, t.name AS task_name, u.name AS user_name
FROM   task t
JOIN   category c ON c.id = t.category_id
LEFT   JOIN
      (solved s JOIN users u ON u.id = s.user_id AND u.name = 'luke') ON s.task_id = t.id
ORDER  BY 1, 2, 3;
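
To see the difference on concrete rows, here is a minimal sketch using Python's built-in sqlite3 module. The table definitions, the sample data and the "two LEFT JOINs" query are assumptions reconstructed from the description above, not the asker's original schema; SQLite stands in for the actual database.

```python
import sqlite3

# Hypothetical schema and data: jane solved both tasks, luke solved only task1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE task     (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE solved   (task_id INTEGER, user_id INTEGER,
                           PRIMARY KEY (task_id, user_id));

    INSERT INTO category VALUES (1, 'cat1');
    INSERT INTO task     VALUES (1, 'task1', 1), (2, 'task2', 1);
    INSERT INTO users    VALUES (1, 'jane'), (2, 'luke');
    INSERT INTO solved   VALUES (1, 1), (1, 2), (2, 1);
""")

# Two chained LEFT JOINs: every row of solved survives, even when the
# users join finds no 'luke' for it.
two_left_joins = conn.execute("""
    SELECT c.name, t.name, u.name
    FROM   task t
    JOIN   category c ON c.id = t.category_id
    LEFT   JOIN solved s ON s.task_id = t.id
    LEFT   JOIN users  u ON u.id = s.user_id AND u.name = 'luke'
    ORDER  BY 1, 2, 3
""").fetchall()

# Parenthesized variant: solved and users are inner-joined first,
# then the combined result is left-joined to task.
nested_join = conn.execute("""
    SELECT c.name, t.name, u.name
    FROM   task t
    JOIN   category c ON c.id = t.category_id
    LEFT   JOIN
          (solved s JOIN users u ON u.id = s.user_id AND u.name = 'luke')
          ON s.task_id = t.id
    ORDER  BY 1, 2, 3
""").fetchall()

print(len(two_left_joins), two_left_joins)  # 3 rows, task1 appears twice
print(len(nested_join), nested_join)        # 2 rows, one per task
```

With this data, task1 shows up twice in the first result (once with NULL for jane's solved row, once with 'luke'), while the nested join returns exactly one row per task.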

Using table name users instead of the reserved word user.
Assuming that users.name is defined unique; otherwise you can have multiple users named 'luke'.
If (task_id, user_id) in solved is defined UNIQUE or PRIMARY KEY, you don't need DISTINCT at all.

The resulting query is not only correct, but also faster.

SQLAlchemy version of the above query (contributed by @van):

This assumes that Category, Task and User are mapped classes, while solved is an instance of Table (just an association table, as shown in the "Many to Many" code example):

from sqlalchemy.orm import join  # ORM-aware join(); accepts tables and mapped classes

user_name = 'luke'
q = (session.query(Category.name, Task.name, User.name)
     .select_from(Task)
     .join(Category)
     .outerjoin(
         join(solved, User,
              (solved.c.user_id == User.id) & (User.name == user_name),
         ))
     .order_by(Category.name, Task.name, User.name)
     )

Prevent duplicate values in LEFT JOIN

I like to call this problem "cross join by proxy". Since there is no information (WHERE or JOIN condition) how the tables department and contact are supposed to match up, they are cross-joined via the proxy table person - giving you the Cartesian product. Very similar to this one:

Two SQL LEFT JOINS produce incorrect result

More explanation there.

Solution for your query:

SELECT p.id, p.person_name, d.department_name, c.phone_number
FROM   person p
LEFT   JOIN (
   SELECT person_id, min(department_name) AS department_name
   FROM   department
   GROUP  BY person_id
   ) d ON d.person_id = p.id
LEFT   JOIN (
   SELECT person_id, min(phone_number) AS phone_number
   FROM   contact
   GROUP  BY person_id
   ) c ON c.person_id = p.id;

You did not define which department or phone number to pick, so I arbitrarily chose the minimum. You can have it any other way ...
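
The effect can be reproduced with a small sketch in Python's sqlite3 module. The sample rows (one person with two departments and three phone numbers) are made up for illustration, and the plain two-LEFT-JOIN query is an assumed reconstruction of the problematic version.

```python
import sqlite3

# Hypothetical data: one person, 2 departments, 3 phone numbers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person     (id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE TABLE department (person_id INTEGER, department_name TEXT);
    CREATE TABLE contact    (person_id INTEGER, phone_number TEXT);

    INSERT INTO person     VALUES (1, 'alice');
    INSERT INTO department VALUES (1, 'hr'), (1, 'it');
    INSERT INTO contact    VALUES (1, '111'), (1, '222'), (1, '333');
""")

# department and contact are only related through person, so every
# department row is paired with every contact row: 2 * 3 = 6 rows.
cross_by_proxy = conn.execute("""
    SELECT p.id, p.person_name, d.department_name, c.phone_number
    FROM   person p
    LEFT   JOIN department d ON d.person_id = p.id
    LEFT   JOIN contact    c ON c.person_id = p.id
""").fetchall()

# Aggregate each child table to one row per person before joining.
aggregated = conn.execute("""
    SELECT p.id, p.person_name, d.department_name, c.phone_number
    FROM   person p
    LEFT   JOIN (
       SELECT person_id, min(department_name) AS department_name
       FROM   department
       GROUP  BY person_id
       ) d ON d.person_id = p.id
    LEFT   JOIN (
       SELECT person_id, min(phone_number) AS phone_number
       FROM   contact
       GROUP  BY person_id
       ) c ON c.person_id = p.id
""").fetchall()

print(len(cross_by_proxy))  # 6
print(aggregated)           # [(1, 'alice', 'hr', '111')]
```

Because each subquery returns at most one row per person_id, neither join can multiply the rows coming from person.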

Left Join without duplicate rows from left table

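Try an OUTER APPLY (SQL Server syntax). Without an ORDER BY inside the subquery, TOP 1 picks an arbitrary matching row: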

SELECT 
    C.Content_ID,
    C.Content_Title,
    C.Content_DatePublished,
    M.Media_Id
FROM 
    tbl_Contents C
    OUTER APPLY
    (
        SELECT TOP 1 *
        FROM tbl_Media M 
        WHERE M.Content_Id = C.Content_Id 
    ) m
ORDER BY 
    C.Content_DatePublished ASC

Alternatively, you could GROUP BY the results:

SELECT 
    C.Content_ID,
    C.Content_Title,
    C.Content_DatePublished,
    M.Media_Id
FROM 
    tbl_Contents C
    LEFT OUTER JOIN tbl_Media M ON M.Content_Id = C.Content_Id 
GROUP BY
    C.Content_ID,
    C.Content_Title,
    C.Content_DatePublished,
    M.Media_Id
ORDER BY
    C.Content_DatePublished ASC

The OUTER APPLY selects a single row (or none) that matches each row from the left table.

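The GROUP BY performs the entire join, but then collapses the final result rows on the provided columns. Because M.Media_Id is one of those columns, only exact duplicate rows are removed: a content row with several distinct media rows still appears once per Media_Id. To get exactly one row per content row, aggregate the media column (for example MIN(M.Media_Id)) instead of grouping by it.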
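
A rough way to compare the two behaviours is the sqlite3 sketch below. SQLite has neither OUTER APPLY nor TOP, so a correlated scalar subquery with LIMIT 1 is swapped in as a stand-in for the OUTER APPLY (it can only return a single column, unlike the real thing). The sample rows and the reduced column list are assumptions for illustration.

```python
import sqlite3

# Hypothetical data: content 1 has two media rows, content 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Contents (Content_Id INTEGER PRIMARY KEY, Content_Title TEXT);
    CREATE TABLE tbl_Media    (Media_Id INTEGER PRIMARY KEY, Content_Id INTEGER);

    INSERT INTO tbl_Contents VALUES (1, 'first'), (2, 'second');
    INSERT INTO tbl_Media    VALUES (10, 1), (11, 1);
""")

# GROUP BY over every selected column, including M.Media_Id:
# only exact duplicates collapse, so content 1 still appears twice.
grouped = conn.execute("""
    SELECT C.Content_Id, C.Content_Title, M.Media_Id
    FROM   tbl_Contents C
    LEFT   OUTER JOIN tbl_Media M ON M.Content_Id = C.Content_Id
    GROUP  BY C.Content_Id, C.Content_Title, M.Media_Id
    ORDER  BY C.Content_Id, M.Media_Id
""").fetchall()

# Stand-in for OUTER APPLY (SELECT TOP 1 ...): pick at most one media
# row per content row; the ORDER BY makes the pick deterministic.
one_per_content = conn.execute("""
    SELECT C.Content_Id,
           C.Content_Title,
           (SELECT M.Media_Id
            FROM   tbl_Media M
            WHERE  M.Content_Id = C.Content_Id
            ORDER  BY M.Media_Id
            LIMIT  1) AS Media_Id
    FROM   tbl_Contents C
    ORDER  BY C.Content_Id
""").fetchall()

print(grouped)          # content 1 listed once per media row
print(one_per_content)  # exactly one row per content row
```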

How to do a LEFT JOIN in MS Access without duplicates?

You can do it with GROUP BY:

SELECT X.id, X.value, MIN(Y.value)
FROM X
LEFT JOIN Y ON X.id = Y.id
GROUP BY X.id, X.value

Note that it is not necessary to bring Y.id into the mix, because it is either null or equal to X.id.
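
As a quick check of the GROUP BY approach, here is a small sketch with Python's sqlite3 module standing in for MS Access. The contents of X and Y are invented for the example.

```python
import sqlite3

# Hypothetical data: X.id = 1 matches two rows in Y, X.id = 2 matches none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (id INTEGER, value TEXT);
    CREATE TABLE Y (id INTEGER, value TEXT);

    INSERT INTO X VALUES (1, 'a'), (2, 'b');
    INSERT INTO Y VALUES (1, 'p'), (1, 'q');
""")

# Plain LEFT JOIN: the X row with id 1 is repeated for each match in Y.
plain = conn.execute("""
    SELECT X.id, X.value, Y.value
    FROM   X
    LEFT   JOIN Y ON X.id = Y.id
""").fetchall()

# GROUP BY the X columns and aggregate Y.value: one row per X row.
grouped = conn.execute("""
    SELECT X.id, X.value, MIN(Y.value)
    FROM   X
    LEFT   JOIN Y ON X.id = Y.id
    GROUP  BY X.id, X.value
    ORDER  BY X.id
""").fetchall()

print(len(plain))  # 3
print(grouped)     # [(1, 'a', 'p'), (2, 'b', None)]
```

The unmatched X row is kept, with NULL (None in Python) as the aggregate.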

many to many join without duplicate rows

The reason you're getting "duplicate" rows is that one film has many actors and one film has many writers. There is no relationship at all between a film actor and a film writer.

A simple query for educational purposes:

SELECT * 
FROM
  films 
  INNER JOIN actors on films.id = actors.filmid
  INNER JOIN writers on films.id = writers.filmid

This is simplified to make my point; it hides the complexity of your tables that break the many:many relationships down to many:one.

Now, if a film has 10 actors and 11 writers, this query will produce 110 rows.

Every actor will be associated with every writer.

Why? Because there is absolutely no relationship between a film actor and a film writer. The two things have nothing at all to do with each other. We haven't written anything in a JOIN condition to assert that some actor attribute is equal to some writer attribute because we can't - there just isn't anything to relate the two of them together

The only thing the database can possibly do as a result, is produce a row set that has a row for every actor/writer combination. The actor data is repeated 11 times and the writer data is repeated 10 times

Actor1/Writer1
Actor1/Writer2
...
Actor2/Writer1
Actor2/Writer2
...
Actor10/Writer11
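
The row arithmetic above can be checked with a short sqlite3 sketch. The simplified films, actors and writers tables follow the educational query; the column lists and generated names are assumptions for illustration.

```python
import sqlite3

# Hypothetical simplified schema: one film with 10 actors and 11 writers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE films   (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE actors  (id INTEGER PRIMARY KEY, filmid INTEGER, name TEXT);
    CREATE TABLE writers (id INTEGER PRIMARY KEY, filmid INTEGER, name TEXT);

    INSERT INTO films VALUES (1, 'some film');
""")
conn.executemany("INSERT INTO actors (filmid, name) VALUES (1, ?)",
                 [(f"Actor{i}",) for i in range(1, 11)])
conn.executemany("INSERT INTO writers (filmid, name) VALUES (1, ?)",
                 [(f"Writer{i}",) for i in range(1, 12)])

# Both child tables in one query: every actor paired with every writer.
combined = conn.execute("""
    SELECT actors.name, writers.name
    FROM   films
    INNER  JOIN actors  ON films.id = actors.filmid
    INNER  JOIN writers ON films.id = writers.filmid
""").fetchall()

# Each child table joined on its own: no multiplication.
actor_rows = conn.execute("""
    SELECT films.title, actors.name
    FROM   films
    INNER  JOIN actors ON films.id = actors.filmid
""").fetchall()
writer_rows = conn.execute("""
    SELECT films.title, writers.name
    FROM   films
    INNER  JOIN writers ON films.id = writers.filmid
""").fetchall()

print(len(combined), len(actor_rows), len(writer_rows))  # 110 10 11
```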

There's no way round this "problem": it is caused by trying to put the two unrelated things together in a single query. The only "solution" is to not put them in the same query. If you're writing a website like IMDB and you have a webpage for a film with two tabs, one for actors, and one for writers, run two separate queries (film join actors) and (film join writers) to fill the data for each tab - you flat out can't* do it in one query.

*when I say "can't", I mean "really really shouldn't". The "desired results" grid you've posted associates actor 1796 with writer 319 for no good reason at all - everything on a row is supposed to be related and these two entities share no relation other than some arbitrary decision that they both are ranked first when their ids are ordered ascending. While there are ways that that could be used to associate them and eliminate the Cartesian product from your query, it's a terrible code smell and indicative of the fact that you're approaching some other problem (that we can't see) in entirely the wrong way (sorry)
