How to Join Two Tables But Only Return Rows That Don't Match

How can I join two tables but only return rows that match

You are selecting on Buyer and Property Ids. You'll be lucky if these line up, but your results still won't be correct. To join the tables where the house has as many rooms as the buyer would like, you need to join on the number of rooms:

SELECT DISTINCT Buyer.Buyer_Surname, Buyer.Rooms_Needed, 
Property.Property_Address as Property_for_sale, Property.Num_Rooms as
Property_No_of_Rooms
FROM Buyer
INNER JOIN Property
ON Buyer.Rooms_Needed = Property.Num_Rooms;

If you need to filter by number of rooms, you can also append a WHERE at the end:

... WHERE Buyer.Rooms_Needed = 5;

How can I join two tables but only return rows that don't match?

SELECT T1.*
FROM T1
WHERE NOT EXISTS(SELECT NULL
FROM T2
WHERE T1.ID = T2.ID
AND T1.Date = T2.Date
AND T1.Hour = T2.Hour)

It could also be done with a LEFT JOIN:

SELECT T1.*
FROM T1
LEFT JOIN T2
ON T1.ID = T2.ID
AND T1.Date = T2.Date
AND T1.Hour = T2.Hour
WHERE T2.ID IS NULL

How to select rows with no matching entry in another table?

Here's a simple query:

SELECT t1.ID
FROM Table1 t1
LEFT JOIN Table2 t2 ON t1.ID = t2.ID
WHERE t2.ID IS NULL

The key points are:

  1. LEFT JOIN is used; this will return ALL rows from Table1, regardless of whether or not there is a matching row in Table2.

  2. The WHERE t2.ID IS NULL clause; this will restrict the results returned to only those rows where the ID returned from Table2 is null - in other words there is NO record in Table2 for that particular ID from Table1. Table2.ID will be returned as NULL for all records from Table1 where the ID is not matched in Table2.

How to return rows from left table not found in right table?

If you are asking for T-SQL then lets look at fundamentals first. There are three types of joins here each with its own set of logical processing phases as:

  1. A cross join is simplest of all. It implements only one logical query processing phase, a Cartesian Product. This phase operates on the two tables provided as inputs to the join and produces a Cartesian product of the two. That is, each row from one input is matched with all rows from the other. So if you have m rows in one table and n rows in the other, you get m×n rows in the result.
  2. Then are Inner joins : They apply two logical query processing phases: A Cartesian product between the two input tables as in a cross join, and then it filters rows based on a predicate that you specify in ON clause (also known as Join condition).
  3. Next comes the third type of joins, Outer Joins:

    In an outer join, you mark a table as a preserved table by using the keywords LEFT OUTER JOIN, RIGHT OUTER JOIN, or FULL OUTER JOIN between the table names. The OUTER keyword is optional. The LEFT keyword means that the rows of the left table are preserved; the RIGHT keyword means that the rows in the right table are preserved; and the FULL keyword means that the rows in both the left and right tables are preserved.

    The third logical query processing phase of an outer join identifies the rows from the preserved table that did not find matches in the other table based on the ON predicate. This phase adds those rows to the result table produced by the first two phases of the join, and uses NULL marks as placeholders for the attributes from the nonpreserved side of the join in those outer rows.

Now if we look at the question: To return records from the left table which are not found in the right table use Left outer join and filter out the rows with NULL values for the attributes from the right side of the join.

SQL JOIN Query to return rows where we did NOT find a match in joined table

Your table design sounds fine.

If this query returns the id values of the "options" linked to a particular "product"...

SELECT k.option_id
FROM links k
WHERE k.product_id = 'foo'

Then this query would get the details of all the options related to the "product"

SELECT o.id
, o.name
FROM options o
JOIN links k
ON k.option_id = o.id
WHERE k.product_id = 'foo'

Note that we can actually move the "product_id='foo'" predicate from the WHERE clause to the ON clause of the JOIN, for an equivalent result, e.g.

SELECT o.id
, o.name
FROM options o
JOIN links k
ON k.option_id = o.id
AND k.product_id = 'foo'

(Not that it makes any difference here, but it would make a difference if we were using an OUTER JOIN (in the WHERE clause, it would negate the "outer-ness" of the join, and make it equivalent to an INNER JOIN.)

But, none of that answers your question, it only sets the stage for answering your question:

How do we get the rows from "options" that are NOT linked to particular product?

The most efficient approach is (usually) an anti-join pattern.

What that is, we will get all the rows from "options", along with any matching rows from "links" (for a particular product_id, in your case). That result set will include the rows from "options" that don't have a matching row in "links".

The "trick" is to filter out all the rows that had matching row(s) found in "links". That will leave us with only the rows that didn't have a match.

And way we filter those rows, we use a predicate in the WHERE clause that checks whether a match was found. We do that by checking a column that we know for certain will be NOT NULL if a matching row was found. And we know* for certain that column will be NULL if a matching row was NOT found.

Something like this:

SELECT o.id
, o.name
FROM options o
LEFT
JOIN links k
ON k.option_id = o.id
AND k.product_id = 'foo'
WHERE k.option_id IS NULL

The "LEFT" keyword specifies an "outer" join operation, we get all the rows from "options" (the table on the "left" side of the JOIN) even if a matching row is not found. (A normal inner join would filter out rows that didn't have a match.)

The "trick" is in the WHERE clause... if we found a matching row from links, we know that the "option_id" column returned from "links" would not be NULL. It can't be NULL if it "equals" something, and we know it had to "equals" something because of the predicate in the ON clause.

So, we know that the rows from options that didn't have a match will have a NULL value for that column.

It takes a bit to get your brain wrapped around it, but the anti-join quickly becomes a familiar pattern.


The "anti-join" pattern isn't the only way to get the result set. There are a couple of other approaches.

One option is to use a query with a "NOT EXISTS" predicate with a correlated subquery. This is somewhat easier to understand, but doesn't usually perform as well:

SELECT o.id
, o.name
FROM options o
WHERE NOT EXISTS ( SELECT 1
FROM links k
WHERE k.option_id = o.id
AND k.product_id = 'foo'
)

That says get me all rows from the options table. But for each row, run a query, and see if a matching row "exists" in the links table. (It doesn't matter what is returned in the select list, we're only testing whether it returns at least one row... I use a "1" in the select list to remind me I'm looking for "1 row".

This usually doesn't perform as well as the anti-join, but sometimes it does run faster, especially if other predicates in the WHERE clause of the outer query filter out nearly every row, and the subquery only has to run for a couple of rows. (That is, when we only have to check a few needles in a haystack. When we need to process the whole stack of hay, the anti-join pattern is usually faster.)

And the beginner query you're most likely to see is a NOT IN (subquery). I'm not even going to give an example of that. If you've got a list of literals, then by all means, use a NOT IN. But with a subquery, it's rarely the best performer, though it does seem to be the easiest to understand.

Oh, what the hay, I'll give a demo of that as well (not that I'm encouraging you to do it this way):

SELECT o.id
, o.name
FROM options o
WHERE o.id NOT IN ( SELECT k.option_id
FROM links k
WHERE k.product_id = 'foo'
AND k.option_id IS NOT NULL
GROUP BY k.option_id
)

That subquery (inside the parens) gets a list of all the option_id values associated with a product.

Now, for each row in options (in the outer query), we can check the id value to see if it's in that list returned by the subquery.

If we have a guarantee that option_id will never be NULL, we can omit the predicate that tests for "option_id IS NOT NULL". (In the more general case, when a NULL creeps into the resultset, then the outer query can't tell if o.id is in the list or not, and the query doesn't return any rows; so I usually include that, even when it's not required. The GROUP BY isn't strictly necessary either; especially if there's a unique constraint (guaranteed uniqueness) on the (product_id,option_id) tuple.

But, again, don't use that NOT IN (subquery), except for testing, unless there's some compelling reason to (for example, it manages to perform better than the anti-join.)

You're unlikely to notice any performance differences with small sets, the overhead of transmitting the statement, parsing it, generating an access plan, and returning results dwarfs the actual "execution" time of the plan. It's with larger sets that the differences in "execution" time become apparent.

EXPLAIN SELECT ... is a really good way to get a handle on the execution plans, to see what MySQL is really doing with your statement.

Appropriate indexes, especially covering indexes, can noticeably improve performance of some statements.

How can I join two tables but only return rows that don't match specific column which include NULL as well?

Try this answer, Hope this helps

CREATE TABLE #T1 (ID INT, oldID INT)
INSERT INTO #T1 VALUES(1,100)
INSERT INTO #T1 VALUES(2,NULL)
INSERT INTO #T1 VALUES(3,200)
INSERT INTO #T1 VALUES(4,500)

CREATE TABLE #T2 (ID INT, [NewId] INT)
INSERT INTO #T2 VALUES(1,NULL)
INSERT INTO #T2 VALUES(2,300)
INSERT INTO #T2 VALUES(3,200)
INSERT INTO #T2 VALUES(4,400)

select T1.ID,T1.oldID,T2.[NewId]
FROM #T1 T1, #T2 T2
WHERE T1.id=T2.id and ISNULL(T1.oldID,0) != ISNULL(T2.[NewId],0)

DROP TABLE #T1
DROP TABLE #T2

Find records from one table which don't exist in another

There's several different ways of doing this, with varying efficiency, depending on how good your query optimiser is, and the relative size of your two tables:

This is the shortest statement, and may be quickest if your phone book is very short:

SELECT  *
FROM Call
WHERE phone_number NOT IN (SELECT phone_number FROM Phone_book)

alternatively (thanks to Alterlife)

SELECT *
FROM Call
WHERE NOT EXISTS
(SELECT *
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number)

or (thanks to WOPR)

SELECT * 
FROM Call
LEFT OUTER JOIN Phone_Book
ON (Call.phone_number = Phone_book.phone_number)
WHERE Phone_book.phone_number IS NULL

(ignoring that, as others have said, it's normally best to select just the columns you want, not '*')



Related Topics



Leave a reply



Submit