Join four tables involving LEFT JOIN without duplicates
You have two LEFT JOINs:
- The 1st LEFT JOIN can join to multiple rows from solved. Say, 'jane' and 'luke' solved the task.
- The 2nd LEFT JOIN can only join to users named 'luke' ('luke' is in the join condition!).
You still get both rows; 'jane' is just not shown. The join condition filters her out, but the LEFT JOIN preserves the row in the result anyway and appends NULL values.
You can achieve what you are after by using parentheses and an [INNER] JOIN instead of the LEFT JOIN between solved and users. The manual:
Use parentheses if necessary to determine the order of nesting. In the absence of parentheses, JOINs nest left-to-right.
SELECT c.name AS cat_name, t.name AS task_name, u.name AS user_name
FROM task t
JOIN category c ON c.id = t.category_id
LEFT JOIN
(solved s JOIN users u ON u.id = s.user_id AND u.name = 'luke') ON s.task_id = t.id
ORDER BY 1, 2, 3;
Using the table name users instead of the reserved word user.
Assuming that users.name is defined UNIQUE, or that you can have multiple users named 'luke'. If (task_id, user_id) in solved is defined UNIQUE or PRIMARY KEY, you don't need DISTINCT at all.
The resulting query is not only correct, but also faster.
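To see the difference, here is a minimal sketch using Python's built-in sqlite3 module with a tiny hypothetical dataset (the column names follow the query above; the rows are made up). It runs both the chained two-LEFT-JOIN form and the fixed form. SQLite is only a stand-in here, and to avoid depending on how a given SQLite version handles a parenthesized join, the inner join between solved and users is written as a derived table, which is logically equivalent.

```python
import sqlite3


def make_db():
    # Hypothetical sample data: 'jane' and 'luke' both solved task t1.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE solved (task_id INTEGER, user_id INTEGER,
                             PRIMARY KEY (task_id, user_id));
        INSERT INTO category VALUES (1, 'math');
        INSERT INTO task VALUES (1, 't1', 1), (2, 't2', 1);
        INSERT INTO users VALUES (1, 'jane'), (2, 'luke');
        INSERT INTO solved VALUES (1, 1), (1, 2), (2, 1);
    """)
    return con


# Two chained LEFT JOINs: the 'luke' filter only affects the second join,
# so jane's row from solved survives with NULL for the user name.
NAIVE = """
SELECT c.name, t.name, u.name
FROM task t
JOIN category c ON c.id = t.category_id
LEFT JOIN solved s ON s.task_id = t.id
LEFT JOIN users u ON u.id = s.user_id AND u.name = 'luke'
ORDER BY 1, 2, 3
"""

# solved and users are inner-joined first (here as a derived table),
# and only that combined result is LEFT JOINed to task.
FIXED = """
SELECT c.name, t.name, x.user_name
FROM task t
JOIN category c ON c.id = t.category_id
LEFT JOIN (
    SELECT s.task_id, u.name AS user_name
    FROM solved s
    JOIN users u ON u.id = s.user_id AND u.name = 'luke'
) x ON x.task_id = t.id
ORDER BY 1, 2, 3
"""

con = make_db()
naive_rows = con.execute(NAIVE).fetchall()
fixed_rows = con.execute(FIXED).fetchall()
print(naive_rows)
print(fixed_rows)
```

The naive version returns t1 twice (once with NULL for jane's solved row), while the fixed version returns exactly one row per task.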
SQLAlchemy version of the above query (contributed by @van).
This assumes that Category, Task and User are mapped classes, while solved is an instance of Table (just an association table, as shown in the Many to Many code example):
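from sqlalchemy import join

user_name = 'luke'
q = (session.query(Category.name, Task.name, User.name)
     .select_from(Task)
     .join(Category)
     # inner-join solved to User (filtered on the name) first,
     # then outer-join that combined result to Task
     .outerjoin(
         join(solved, User,
              (solved.c.user_id == User.id) & (User.name == user_name),
              ))
     .order_by(Category.name, Task.name, User.name)
     )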
Prevent duplicate values in LEFT JOIN
I like to call this problem "cross join by proxy". Since there is no information (WHERE or JOIN condition) about how the tables department and contact are supposed to match up, they are cross-joined via the proxy table person, giving you the Cartesian product. Very similar to this one:
- Two SQL LEFT JOINS produce incorrect result
More explanation there.
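A minimal sketch of the effect, using Python's sqlite3 module and hypothetical rows (the table and column names are taken from the solution query below; the data is made up): one person with 2 departments and 3 phone numbers yields 2 × 3 = 6 rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE TABLE department (person_id INTEGER, department_name TEXT);
    CREATE TABLE contact (person_id INTEGER, phone_number TEXT);
    INSERT INTO person VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO department VALUES (1, 'sales'), (1, 'it');
    INSERT INTO contact VALUES (1, '555-1'), (1, '555-2'), (1, '555-3');
""")

# Nothing relates department to contact, so for each person every
# department row is paired with every contact row.
rows = con.execute("""
    SELECT p.id, p.person_name, d.department_name, c.phone_number
    FROM person p
    LEFT JOIN department d ON d.person_id = p.id
    LEFT JOIN contact c ON c.person_id = p.id
""").fetchall()

ann_rows = [r for r in rows if r[0] == 1]
print(len(ann_rows))  # 2 departments x 3 phone numbers -> 6
```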
Solution for your query:
SELECT p.id, p.person_name, d.department_name, c.phone_number
FROM person p
LEFT JOIN (
SELECT person_id, min(department_name) AS department_name
FROM department
GROUP BY person_id
) d ON d.person_id = p.id
LEFT JOIN (
SELECT person_id, min(phone_number) AS phone_number
FROM contact
GROUP BY person_id
) c ON c.person_id = p.id;
You did not define which department or phone number to pick, so I arbitrarily chose the minimum. You can have it any other way ...
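A quick check of the solution with Python's sqlite3 module, on hypothetical data (one person with two departments and three phone numbers, one person with neither). The ORDER BY is added here only to make the output deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE TABLE department (person_id INTEGER, department_name TEXT);
    CREATE TABLE contact (person_id INTEGER, phone_number TEXT);
    INSERT INTO person VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO department VALUES (1, 'sales'), (1, 'it');
    INSERT INTO contact VALUES (1, '555-1'), (1, '555-2'), (1, '555-3');
""")

# Each child table is reduced to at most one row per person_id *before*
# joining, so the two LEFT JOINs can no longer multiply each other.
rows = con.execute("""
    SELECT p.id, p.person_name, d.department_name, c.phone_number
    FROM person p
    LEFT JOIN (
        SELECT person_id, min(department_name) AS department_name
        FROM department
        GROUP BY person_id
    ) d ON d.person_id = p.id
    LEFT JOIN (
        SELECT person_id, min(phone_number) AS phone_number
        FROM contact
        GROUP BY person_id
    ) c ON c.person_id = p.id
    ORDER BY p.id
""").fetchall()
print(rows)  # one row per person
```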
Left Join without duplicate rows from left table
Try an OUTER APPLY
SELECT
C.Content_ID,
C.Content_Title,
C.Content_DatePublished,
M.Media_Id
FROM
tbl_Contents C
OUTER APPLY
(
SELECT TOP 1 *
FROM tbl_Media M
WHERE M.Content_Id = C.Content_Id
ORDER BY M.Media_Id -- without ORDER BY, TOP 1 returns an arbitrary matching row
) M
ORDER BY
C.Content_DatePublished ASC
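OUTER APPLY is SQL Server syntax. As a rough sketch of the same idea in SQLite (via Python's sqlite3 module), which has neither APPLY nor LATERAL, a correlated scalar subquery with LIMIT 1 can pick at most one Media_Id per content row. This is a swapped-in technique that only works because a single column is needed; the data is hypothetical, and the inner ORDER BY makes the chosen row deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tbl_Contents (Content_ID INTEGER PRIMARY KEY, Content_Title TEXT,
                               Content_DatePublished TEXT);
    CREATE TABLE tbl_Media (Media_Id INTEGER PRIMARY KEY, Content_Id INTEGER);
    INSERT INTO tbl_Contents VALUES (1, 'first', '2020-01-01'), (2, 'second', '2020-02-01');
    INSERT INTO tbl_Media VALUES (10, 1), (11, 1), (12, 1);  -- content 2 has no media
""")

# For each content row, fetch at most one matching Media_Id (NULL if none),
# similar to OUTER APPLY (SELECT TOP 1 ...) in SQL Server.
rows = con.execute("""
    SELECT C.Content_ID, C.Content_Title, C.Content_DatePublished,
           (SELECT M.Media_Id
            FROM tbl_Media M
            WHERE M.Content_Id = C.Content_ID
            ORDER BY M.Media_Id
            LIMIT 1) AS Media_Id
    FROM tbl_Contents C
    ORDER BY C.Content_DatePublished ASC
""").fetchall()
print(rows)
```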
Alternatively, you could GROUP BY the results:
SELECT
C.Content_ID,
C.Content_Title,
C.Content_DatePublished,
MIN(M.Media_Id) AS Media_Id
FROM
tbl_Contents C
LEFT OUTER JOIN tbl_Media M ON M.Content_Id = C.Content_Id
GROUP BY
C.Content_ID,
C.Content_Title,
C.Content_DatePublished
ORDER BY
C.Content_DatePublished ASC
The OUTER APPLY selects a single row (or none) that matches each row from the left table.
The GROUP BY performs the entire join, but then collapses the result rows on the grouped columns. Because M.Media_Id is aggregated rather than grouped, this leaves one row per content.
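Whether GROUP BY removes the duplicates depends on which columns are grouped. A small sketch with Python's sqlite3 module and hypothetical data: with M.Media_Id in the GROUP BY list, every distinct media row survives (GROUP BY then behaves like DISTINCT); grouping only on the content columns and aggregating Media_Id gives one row per content.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tbl_Contents (Content_ID INTEGER PRIMARY KEY, Content_Title TEXT,
                               Content_DatePublished TEXT);
    CREATE TABLE tbl_Media (Media_Id INTEGER PRIMARY KEY, Content_Id INTEGER);
    INSERT INTO tbl_Contents VALUES (1, 'first', '2020-01-01'), (2, 'second', '2020-02-01');
    INSERT INTO tbl_Media VALUES (10, 1), (11, 1), (12, 1);
""")

JOIN = """
    FROM tbl_Contents C
    LEFT OUTER JOIN tbl_Media M ON M.Content_Id = C.Content_ID
"""

# Media_Id is a grouping column: one output row per (content, media) pair.
with_media_id = con.execute(
    "SELECT C.Content_ID, M.Media_Id" + JOIN
    + "GROUP BY C.Content_ID, M.Media_Id ORDER BY C.Content_ID, M.Media_Id"
).fetchall()

# Media_Id is aggregated: one output row per content.
aggregated = con.execute(
    "SELECT C.Content_ID, MIN(M.Media_Id)" + JOIN
    + "GROUP BY C.Content_ID ORDER BY C.Content_ID"
).fetchall()

print(with_media_id)
print(aggregated)
```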
How to do a LEFT JOIN in MS Access without duplicates?
You can do it with GROUP BY:
SELECT X.id, X.value, MIN(Y.value)
FROM X
LEFT JOIN Y ON X.id = Y.id
GROUP BY X.id, X.value
Note that it is not necessary to bring Y.id into the mix, because it is either NULL or equal to X.id.
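A sketch of the X/Y query in SQLite via Python's sqlite3 module (a stand-in for MS Access; the data is hypothetical). It also checks the note about Y.id: after a LEFT JOIN on X.id = Y.id, every Y.id is either NULL or equal to X.id, so grouping by it would add nothing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE X (id INTEGER, value TEXT);
    CREATE TABLE Y (id INTEGER, value TEXT);
    INSERT INTO X VALUES (1, 'a'), (2, 'b');
    INSERT INTO Y VALUES (1, 'y2'), (1, 'y1');  -- X.id = 2 has no match
""")

# One row per X row; MIN picks a single Y.value out of the duplicates.
rows = con.execute("""
    SELECT X.id, X.value, MIN(Y.value)
    FROM X
    LEFT JOIN Y ON X.id = Y.id
    GROUP BY X.id, X.value
    ORDER BY X.id
""").fetchall()
print(rows)

# Every joined Y.id is either NULL (no match) or identical to X.id.
pairs = con.execute("SELECT X.id, Y.id FROM X LEFT JOIN Y ON X.id = Y.id").fetchall()
print(all(y_id is None or y_id == x_id for x_id, y_id in pairs))  # True
```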
many to many join without duplicate rows
The reason you're getting "duplicate" rows is that one film has many actors and one film has many writers. There is no relationship at all between a film actor and a film writer.
A simple query for education purposes:
SELECT *
FROM
films
INNER JOIN actors on films.id = actors.filmid
INNER JOIN writers on films.id = writers.filmid
This is simplified to make my point; it hides the complexity of your tables that break the many:many relationships down to many:one.
Now, if a film has 10 actors and 11 writers, this query will produce 110 rows. Every actor will be associated with every writer.
Why? Because there is absolutely no relationship between a film actor and a film writer. The two things have nothing at all to do with each other. We haven't written anything in a JOIN condition to assert that some actor attribute is equal to some writer attribute, because we can't: there just isn't anything to relate the two of them together.
The only thing the database can possibly do as a result is produce a row set that has a row for every actor/writer combination. The actor data is repeated 11 times and the writer data is repeated 10 times:
Actor1/Writer1
Actor1/Writer2
...
Actor2/Writer1
Actor2/Writer2
...
Actor10/Writer11
There's no way round this "problem"; it is caused by trying to put two unrelated things together in a single query. The only "solution" is to not put them in the same query. If you're writing a website like IMDB and you have a webpage for a film with two tabs, one for actors and one for writers, run two separate queries (film join actors) and (film join writers) to fill the data for each tab; you flat out can't* do it in one query.
*When I say "can't", I mean "really, really shouldn't". The "desired results" grid you've posted associates actor 1796 with writer 319 for no good reason at all. Everything on a row is supposed to be related, and these two entities share no relation other than the arbitrary fact that both are ranked first when their ids are ordered ascending. While there are ways that could be used to associate them and eliminate the Cartesian product from your query, it's a terrible code smell and indicates that you're approaching some other problem (that we can't see) in entirely the wrong way (sorry).
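A minimal sketch of the row explosion and of the two-query alternative, using Python's sqlite3 module with one hypothetical film, 10 actors and 11 writers (table and column names follow the simplified query above).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT)")
con.execute("CREATE TABLE actors (filmid INTEGER, name TEXT)")
con.execute("CREATE TABLE writers (filmid INTEGER, name TEXT)")
con.execute("INSERT INTO films VALUES (1, 'some film')")
con.executemany("INSERT INTO actors VALUES (1, ?)",
                [(f"Actor{i}",) for i in range(1, 11)])
con.executemany("INSERT INTO writers VALUES (1, ?)",
                [(f"Writer{i}",) for i in range(1, 12)])

# One query joining both: every actor is paired with every writer.
combined = con.execute("""
    SELECT actors.name, writers.name
    FROM films
    INNER JOIN actors ON films.id = actors.filmid
    INNER JOIN writers ON films.id = writers.filmid
""").fetchall()

# Two separate queries: each returns only its own related rows.
film_actors = con.execute(
    "SELECT actors.name FROM films INNER JOIN actors ON films.id = actors.filmid"
).fetchall()
film_writers = con.execute(
    "SELECT writers.name FROM films INNER JOIN writers ON films.id = writers.filmid"
).fetchall()

print(len(combined), len(film_actors), len(film_writers))  # 110 10 11
```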