How Can a Left Outer Join Return More Records Than Exist in the Left Table

How can a LEFT OUTER JOIN return more records than exist in the left table?

The LEFT OUTER JOIN will return all records from the LEFT table joined with the RIGHT table where possible.

When there are matches, it returns every matching pair, so one row in LEFT that matches two rows in RIGHT comes back as two rows, just as it would with an INNER JOIN.

EDIT:
In response to your edit, I've just had a further look at your query and it looks like you are only returning data from the LEFT table. Therefore, if you only want data from the LEFT table, and you only want one row returned for each row in the LEFT table, then you have no need to perform a JOIN at all and can just do a SELECT directly from the LEFT table.

How can A left outer join B return more rows than are in A?

This can happen when column b is not unique in table B. Suppose you have this data:


A          B
+---+      +---+---+
| b |      | b | c |
+---+      +---+---+
| 1 |      | 2 | 1 |
| 2 |      | 2 | 2 |
+---+      +---+---+

When you left-join from A to B on column b, you get


+-----+------+------+
| A.b | B.b  | B.c  |
+-----+------+------+
|   1 | NULL | NULL |
|   2 |    2 |    1 |
|   2 |    2 |    2 |
+-----+------+------+

which gives three rows in total, even though both A and B only have two rows each.
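
The same result can be reproduced with a throwaway in-memory database. This is only a minimal sketch using Python's built-in sqlite3 module (my choice for illustration, not something from the question); the table and column names mirror the example above.

```python
import sqlite3

def left_join_demo():
    """Rebuild tables A and B from the example and left-join them on b."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE A (b INTEGER);
        CREATE TABLE B (b INTEGER, c INTEGER);
        INSERT INTO A VALUES (1), (2);
        INSERT INTO B VALUES (2, 1), (2, 2);
    """)
    rows = conn.execute("""
        SELECT A.b, B.b, B.c
        FROM A
        LEFT JOIN B ON A.b = B.b
        ORDER BY A.b, B.c
    """).fetchall()
    conn.close()
    return rows

# A has 2 rows, B has 2 rows, but the join yields 3:
# A.b = 1 has no match (padded with NULLs), A.b = 2 matches twice.
for row in left_join_demo():
    print(row)
```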

left join returning more than expected

LEFT JOIN can return multiple copies of a row from table1 if the key for that row is referenced by multiple rows in table2.

If you want it to only return 16 rows, one for each table 1 row, and with a random data set for table 2, you can use just a plain GROUP BY:

select *
from table1
left join table2 on table1.name = table2.name
group by table1.name

GROUP BY aggregates rows based on a field, so this will collapse all the table1 duplicates into one row. Generally, you specify aggregate functions to say how the rows should collapse (for example, a numeric column could be collapsed with SUM() so that the one row holds the total). If you just want any one row, though, you can leave the aggregate functions out and MySQL will pick a row for you.

Note that this is specific to MySQL; most databases require every selected column to be either grouped or aggregated. It also depends on the server's SQL mode: since MySQL 5.7.5 the ONLY_FULL_GROUP_BY mode is enabled by default, and with it a query like this one is rejected, so it only runs on older versions or with that mode turned off. The row MySQL chooses is not technically "random", but it is not necessarily predictable to you. I guess by "random" you really just mean "any row will do".
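
For a version that does not depend on MySQL's permissive grouping, one option is to aggregate the table2 columns explicitly. The sketch below is an assumption-laden illustration: it uses Python's sqlite3 module, a made-up `detail` column in table2, and MIN() as an arbitrary "any row will do" pick. It is not the query from the question.

```python
import sqlite3

def one_row_per_left_row():
    """Compare the raw left join with a grouped version (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (name TEXT);
        CREATE TABLE table2 (name TEXT, detail TEXT);  -- 'detail' is invented
        INSERT INTO table1 VALUES ('alice'), ('bob'), ('carol');
        INSERT INTO table2 VALUES ('alice', 'x'), ('alice', 'y'), ('bob', 'z');
    """)
    # Plain left join: 'alice' is duplicated because she matches twice.
    joined = conn.execute("""
        SELECT COUNT(*)
        FROM table1
        LEFT JOIN table2 ON table1.name = table2.name
    """).fetchone()[0]
    # Grouped: exactly one output row per table1 row, with explicit aggregates.
    grouped = conn.execute("""
        SELECT table1.name,
               MIN(table2.detail)  AS detail,   -- arbitrary but deterministic pick
               COUNT(table2.name)  AS matches   -- counts only non-NULL matches
        FROM table1
        LEFT JOIN table2 ON table1.name = table2.name
        GROUP BY table1.name
        ORDER BY table1.name
    """).fetchall()
    conn.close()
    return joined, grouped

joined_count, grouped_rows = one_row_per_left_row()
print(joined_count)   # number of rows before grouping
print(grouped_rows)   # one row per table1 name
```

Every selected column here is either grouped or aggregated, so the same SELECT should also be accepted by databases that enforce strict grouping.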

Why Does MySql LEFT OUTER JOIN Return 20x more rows?

If you join your tables using only the date field, and you have 5 records in tableA with the date X and 20 records in tableB with the same date X, the result of your query will be 5 x 20 = 100 rows.

The date() function returns only the date part of a date or datetime expression, so every row that falls on the same day matches, whatever its time.

I'll try to explain using an example:

table_A
--------
nameA, date
a1, 2017-11-01
a2, 2017-11-01

table_B
-------
nameB, date
b1, 2017-11-01
b2, 2017-11-01

if you join A on B using a similar join used in your query:

select nameA, nameB
from table_A
left join table_B on Date(table_A.date) = Date(table_B.date)

you will have:
a1, b1 -> Date(2017-11-01) is equal to Date(2017-11-01)
a1, b2 -> Date(2017-11-01) is equal to Date(2017-11-01)
a2, b1 -> Date(2017-11-01) is equal to Date(2017-11-01)
a2, b2 -> Date(2017-11-01) is equal to Date(2017-11-01)

Please keep in mind that wrapping the column in Date() in your join condition generally prevents the database engine from using an ordinary index on that column, so this is a poor and slow way to query your data.
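
The 2 x 2 example above can be checked with a small script. This is just a sketch using Python's sqlite3 module rather than MySQL (SQLite's date() also returns the date part of a datetime string); the time-of-day values are made up to show that they do not affect the match.

```python
import sqlite3

def date_join_demo():
    """Join two tables on the date part only; every same-day pair matches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_A (nameA TEXT, date TEXT);
        CREATE TABLE table_B (nameB TEXT, date TEXT);
        -- the times below are invented; only the date part is compared
        INSERT INTO table_A VALUES ('a1', '2017-11-01 08:00:00'),
                                   ('a2', '2017-11-01 17:30:00');
        INSERT INTO table_B VALUES ('b1', '2017-11-01 09:15:00'),
                                   ('b2', '2017-11-01 23:59:00');
    """)
    rows = conn.execute("""
        SELECT nameA, nameB
        FROM table_A
        LEFT JOIN table_B ON DATE(table_A.date) = DATE(table_B.date)
        ORDER BY nameA, nameB
    """).fetchall()
    conn.close()
    return rows

# 2 rows in table_A x 2 rows in table_B on the same day = 4 rows.
for pair in date_join_demo():
    print(pair)
```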

What is the maximum # of rows in a LEFT OUTER JOIN?

Say the "left" table has i rows and the "right" table has j rows. In an inner join, for each row from the "left" table there will be as many rows in the output as there are matching rows in the "right" table (matching on the join conditions, that is); this can be anything between 0 and j. So an inner join may return anywhere between 0 and i*j rows. Both are possible, by the way; just consider the join condition null is not null (to get 0 rows), or null is null (to get a cartesian join).

The only difference in an outer join (specifically, left outer join) is that for each row from the "left" table there will be at least one row in the output, even if there is no matching row in the right table. That's really what outer join means. So in a left outer join the output will have between i and i*j rows, and again both are possible (with the same join conditions as above). The one edge case is an empty right table: with j = 0 the output has exactly i rows.

To your question about getting the max number of rows - for a somewhat more "natural" example, imagine both tables have a column purchase_date, and for some reason all rows in both tables have exactly the same (non-null) date in that column. Then if you join on left_table.purchase_date = right_table.purchase_date you will get a cartesian join, which has i*j rows.
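
Here is a rough way to see both bounds for yourself. It is a sketch under my own assumptions: Python's sqlite3 module, tables named left_table and right_table as in the text, and a single invented date value shared by every row.

```python
import sqlite3

def join_row_counts(i, j):
    """Return min/max row counts for inner and left joins of i x j rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE left_table (purchase_date TEXT)")
    conn.execute("CREATE TABLE right_table (purchase_date TEXT)")
    # Every row in both tables gets the same (made-up) date.
    conn.executemany("INSERT INTO left_table VALUES (?)", [("2020-01-01",)] * i)
    conn.executemany("INSERT INTO right_table VALUES (?)", [("2020-01-01",)] * j)

    def count(join_kind, condition):
        sql = (
            "SELECT COUNT(*) FROM left_table l "
            f"{join_kind} JOIN right_table r ON {condition}"
        )
        return conn.execute(sql).fetchone()[0]

    never = "NULL IS NOT NULL"                      # condition that is never true
    always = "l.purchase_date = r.purchase_date"    # true for every pair here
    counts = {
        "inner_min": count("INNER", never),
        "inner_max": count("INNER", always),
        "left_min": count("LEFT", never),
        "left_max": count("LEFT", always),
    }
    conn.close()
    return counts

print(join_row_counts(3, 4))
print(join_row_counts(3, 0))  # empty right table: the left join still keeps i rows
```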

LEFT OUTER JOIN and WHERE EXISTS. Are they equivalent?

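The not exists and left join ... where rgt.col is null approaches return the same rows, provided rgt.col is a column that can never be NULL in a matched row (the join key, for example). The left join, however, will also expose the columns of the unwanted table, so just be specific in the select clause: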

SELECT table_a.*, table_b.*, table_c.*
FROM table_a
JOIN table_b ...
JOIN table_c ...
LEFT JOIN table_d ...

I would rather avoid * at all and explicitly list exactly those columns that I need.
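
To convince yourself that the two forms agree, you can run both against the same data. This is only an illustrative sketch with Python's sqlite3 module; the id and a_id columns and the sample rows are hypothetical, since the original join conditions were elided.

```python
import sqlite3

def anti_join_both_ways():
    """Find table_a rows with no table_d match, via NOT EXISTS and LEFT JOIN."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (id INTEGER);          -- hypothetical schema
        CREATE TABLE table_d (a_id INTEGER NOT NULL);
        INSERT INTO table_a VALUES (1), (2), (3);
        INSERT INTO table_d VALUES (1), (1), (3);
    """)
    via_not_exists = conn.execute("""
        SELECT table_a.id
        FROM table_a
        WHERE NOT EXISTS (
            SELECT 1 FROM table_d WHERE table_d.a_id = table_a.id
        )
        ORDER BY table_a.id
    """).fetchall()
    # Test the join key for NULL: it can only be NULL when nothing matched.
    via_left_join = conn.execute("""
        SELECT table_a.id
        FROM table_a
        LEFT JOIN table_d ON table_d.a_id = table_a.id
        WHERE table_d.a_id IS NULL
        ORDER BY table_a.id
    """).fetchall()
    conn.close()
    return via_not_exists, via_left_join

not_exists_rows, left_join_rows = anti_join_both_ways()
print(not_exists_rows, left_join_rows)
```

Selecting only table_a.id in the second query is what keeps table_d's columns out of the result.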

Union all inside the Left outer Join is not fetching all the values, it is just pulling values outside the sub-query (which PRJ_elements)


it's not fetching the columns wrapped inside the Join.

You didn't SELECT them in the outer query.

SQL operates on blocks of data. A table is a block of data that is fed into a FROM. The output of a query is also a block of data that can be fed into a FROM.

Person table has 3 columns; Name, Birthday, Height.

When you write:

SELECT name 
FROM person

You get just Name, even though the table has two other columns.

When you write:

SELECT name
FROM (
    SELECT name, height
    FROM person
) x

You do not suddenly get Name and Height. You said you only wanted Name in the final executed SELECT (the topmost one). Just because you mentioned Height in the inner query does not mean it appears in the output of the outer query. The output of the inner query is fed into the outer query, and the outer query doesn't select the Height column, just as it didn't when the table name was used in

SELECT name    --doesn't mention height
FROM person --even though this has height

It's all just "input blocks of data" (in this case the Person table) and "output blocks of data" (in this case one output fed back to you)

In this query:

SELECT name                 --also doesn't mention height
FROM (
    SELECT name, height     --even though this has height
    FROM person
) x

We have "input data block of 3 columns from table person becomes 2-column-wide output", which becomes "input data block of 2 columns becomes 1-column-wide output", which is returned to you.
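
If you want to watch the columns narrow at each level, something like the following works. It is a sketch using Python's sqlite3 module with one invented row in a person table shaped like the one described above; your real database and data will differ.

```python
import sqlite3

def outer_query_columns():
    """Show which columns survive a nested SELECT, and which are gone."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (name TEXT, birthday TEXT, height INTEGER);
        INSERT INTO person VALUES ('Ann', '1990-05-01', 170);  -- invented row
    """)
    # The outer SELECT asks only for name, so only name comes back.
    cur = conn.execute("""
        SELECT name
        FROM (
            SELECT name, height
            FROM person
        ) x
    """)
    columns = [col[0] for col in cur.description]

    # birthday was never selected by the inner query, so the outer
    # query cannot see it at all.
    try:
        conn.execute("SELECT birthday FROM (SELECT name, height FROM person) x")
        birthday_visible = True
    except sqlite3.OperationalError:
        birthday_visible = False
    conn.close()
    return columns, birthday_visible

print(outer_query_columns())
```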


In an outer query you can only reference the columns selected by an inner/sub query, and you reference them using the alias you gave to the sub query. The first query in any set of unions defines the column names for the entire union set

It's as if the sub query is run and temporarily turned into a table for the duration of the query, thus:

SELECT sq.x
FROM (
    SELECT name as x
    FROM Person

    UNION ALL

    SELECT building_name
    FROM address
) sq

Here you can see the sub query that does the union has been aliased as sq. The first query takes a person name and aliases it as x. The second query of the union pulls a building name out of the addresses table but this name has no effect on what the column will be called. The column is called x thanks to the first query in the union set

Thus you end up with a column you refer to as sq.x which is full of a mix of people and building names
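
A quick way to confirm that the column takes its name from the first query of the union is to try referring to it by the second query's name. The sketch below uses Python's sqlite3 module and made-up rows; the behaviour shown is SQLite's, which I am assuming matches your database on this point.

```python
import sqlite3

UNION_SUBQUERY = """
    SELECT name AS x
    FROM person

    UNION ALL

    SELECT building_name
    FROM address
"""

def union_column_name_demo():
    """Select from a UNION ALL sub query by its first-query alias."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (name TEXT);
        CREATE TABLE address (building_name TEXT);
        INSERT INTO person VALUES ('Ann'), ('Bob');        -- invented rows
        INSERT INTO address VALUES ('Old Mill');
    """)
    # sq.x works: the alias from the first SELECT names the whole column.
    mixed = conn.execute(
        f"SELECT sq.x FROM ({UNION_SUBQUERY}) sq ORDER BY sq.x"
    ).fetchall()

    # sq.building_name does not: that name never leaves the sub query.
    try:
        conn.execute(f"SELECT sq.building_name FROM ({UNION_SUBQUERY}) sq")
        second_name_works = True
    except sqlite3.OperationalError:
        second_name_works = False
    conn.close()
    return mixed, second_name_works

print(union_column_name_demo())
```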

In the outer query you cannot refer to any names of any columns or tables in the sub query; the sub query is run and the columns it selects become the columns in the block of data it is aliased as. Anything not selected is gone. If you need to use something you must select it:

SELECT sq.x
FROM (
    SELECT name as x, age as y
    FROM Person

    UNION ALL

    SELECT building_name, YEAR(GetUtcDate()) - building_built_year
    FROM address
) sq
WHERE
    sq.y > 50

This gets all people or buildings older than 50 years: the person query aliased age as y, and the outer query filters on y. There is no alias given to the formula that calculates the building age; it goes in the y column thanks to the first query.
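
For a runnable variant of that last query, here is a sketch with Python's sqlite3 module. SQLite has no YEAR() or GetUtcDate(), so I pass the current year in as a parameter instead; that substitution, the `current_year` argument, and the sample rows are all my own assumptions.

```python
import sqlite3

def older_than_fifty(current_year):
    """Filter a UNION ALL sub query on a column aliased only in its first SELECT."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (name TEXT, age INTEGER);
        CREATE TABLE address (building_name TEXT, building_built_year INTEGER);
        INSERT INTO person VALUES ('Ann', 62), ('Bob', 30);          -- invented
        INSERT INTO address VALUES ('Old Mill', 1900), ('New Tower', 2015);
    """)
    rows = conn.execute("""
        SELECT sq.x
        FROM (
            SELECT name AS x, age AS y
            FROM person

            UNION ALL

            -- no aliases here: values land in x and y by position
            SELECT building_name, ? - building_built_year
            FROM address
        ) sq
        WHERE sq.y > 50
        ORDER BY sq.x
    """, (current_year,)).fetchall()
    conn.close()
    return rows

print(older_than_fifty(2024))
```

The outer WHERE can only use sq.y because the first query of the union selected age and gave it that alias; drop age from the inner SELECT and the filter has nothing to refer to.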


