Difference Between Natural Join and Inner Join

Difference between natural join and inner join

One significant difference between INNER JOIN and NATURAL JOIN is the number of columns returned.

Consider:

TableA                           TableB
+------------+----------+ +--------------------+
|Column1 | Column2 | |Column1 | Column3 |
+-----------------------+ +--------------------+
| 1 | 2 | | 1 | 3 |
+------------+----------+ +---------+----------+

The INNER JOIN of TableA and TableB on Column1 will return

SELECT * FROM TableA AS a INNER JOIN TableB AS b USING (Column1);
SELECT * FROM TableA AS a INNER JOIN TableB AS b ON a.Column1 = b.Column1;
+------------+-----------+---------------------+    
| a.Column1 | a.Column2 | b.Column1| b.Column3|
+------------------------+---------------------+
| 1 | 2 | 1 | 3 |
+------------+-----------+----------+----------+

The NATURAL JOIN of TableA and TableB on Column1 will return:

SELECT * FROM TableA NATURAL JOIN TableB
+------------+----------+----------+
|Column1 | Column2 | Column3 |
+-----------------------+----------+
| 1 | 2 | 3 |
+------------+----------+----------+

The repeated column is avoided.

(AFAICT from the standard grammar, you can't specify the joining columns in a natural join; the join is strictly name-based. See also Wikipedia.)

(There's a cheat in the inner join output; the a. and b. parts would not be in the column names; you'd just have column1, column2, column1, column3 as the headings.)

What is the difference between Natural Join and Inner Join?

A Natural Join is a form of Inner Join where the join is implicitly across all columns of matching names on both sides of the join.

E.g.

Table A
abc int
def int
ghi varchar(20)

Table B
abc int
def int
jkl int

A natural join between tables A and B would be the same as an inner join on columns abc and def.


Inner joins that could not be replaced with a Natural Join:

TableA
inner join
TableB
on
TableA.Column1 = TableB.Column2 --Column names don't match

or

TableA
inner join
TableB
on
TableA.Column1 >= TableB.Column1 --Not equality

Inner Join vs Natural Join vs USING clause: are there any advantages?

Now, apart from the fact that the first form has a duplicated column, is there a real advantage to the other two forms? Or are they just syntactic sugar?

TL;DR NATURAL JOIN is used in a certain style of relational programming that is simpler than the usual SQL style. (Although when embedded in SQL it is burdened with the rest of SQL query syntax.) That's because 1. it directly uses the simple operators of predicate logic, the language of precision in engineering (including software engineering), science (including computer science) and mathematics, and moreover 2. simultaneously and alternatively it directly uses the simple operators of relational algebra.

The common complaint about NATURAL JOIN is that since shared columns aren't explicit, after a schema change inappropriate column pairing may occur. And that may be the case in a particular development environment. But in that case there was a requirement that only certain columns be joined and NATURAL JOIN without PROJECT was not appropriate. So these arguments assume that NATURAL JOIN is being used inappropriately. Moreover the arguers aren't even aware that they are ignoring requirements. Such complaints are specious. (Moreover, sound software engineering design principles lead to not having interfaces with such specificiatons.)

Another related misconceived specious complaint from the same camp is that "NATURAL JOIN does not even take foreign key relationships into account". But any join is there because of the table meanings, not the constraints. Constraints are not needed to query. If a constraint is added then a query remains correct. If a constraint is dropped then a query relying on it becomes wrong and must be changed to a phrasing that doesn't rely on it that wouldn't have had to change. This has nothing to do with NATURAL JOIN.


You have described the difference in effect: just one copy of each common column is returned.

From Is there any rule of thumb to construct SQL query from a human-readable description?:

It turns out that natural language expressions and logical expressions and relational algebra expressions and SQL expressions (a hybrid of the last two) correspond in a rather direct way.

Eg from Codd 1970:

The relation depicted is called component. [...] The meaning of component(x, y,z) is that part x is an immediate component (or subassembly) of part y, and z units of part x are needed to assemble one unit of part y.

From this answer:

Every base table has a statement template, aka predicate, parameterized by column names, by which we put a row in or leave it out.

Plugging a row into a predicate gives a statement aka proposition. The rows that make a true proposition go in a table and the rows that make a false proposition stay out. (So a table states the proposition of each present row and states NOT the proposition of each absent row.)

But every table expression value has a predicate per its expression. The relational model is designed so that if tables T and U hold rows where T(...) and U(...) (respectively) then:

  • T NATURAL JOIN U holds rows where T(...) AND U(...)
  • T WHEREcondition holds rows where T(...) AND condition
  • T UNION CORRESPONDING U holds rows where T(...) OR U(...)
  • T EXCEPT CORRESPONDING U holds rows where T(...) AND NOT U(...)
  • SELECT DISTINCTcolumns to keepFROM T holds rows where

    THERE EXISTS columns to drop SUCH THAT T(...)
  • etc

Whereas reasoning about SQL otherwise is... not "natural":

An SQL SELECT statement can be thought of algebraically as 1. implicitly RENAMEing each column C of a table with (possibly implicit) correlation name T to T.C, then 2. CROSS JOINing, then 3. RESTRICTing per INNER ON, then 4. RESTRICTing per WHERE, then 5. PROJECTing per SELECT, then 6. RENAMEing per SELECT, dropping T.s, then 7. implicitly RENAMEing to drop remaining T.s Between the T.-RENAMEings algebra operators can also be thought of as logic operators and table names as their predicates: T JOIN ... vs Employee T.EMPLOYEE has name T.NAME ... AND .... But conceptually inside a SELECT statement is a double-RENAME-inducing CROSS JOIN table with T.Cs for column names while outside tables have Cs for column names.

Alternatively an SQL SELECT statement can be thought of logically as 1. introducing FORSOME T IN E around the entire statement per correlation name T and base name or subquery E, then 2. referring to the value of quantified T by using T.C to refer to its C part, then 3. building result rows from T.Cs per FROM etc, then 4. naming the result row columns per the SELECT clause, then 4. leaving the scope of the FORSOMEs. Again the algebra operators are being thought of as logic operators and table names as their predicates. Again though, this conceptually has T.C inside SELECTs but C outside with correlation names coming and going.

These two SQL interpretations are nowhere near as straightforward as just using JOIN or AND, etc, interchangeably. (You don't have to agree that it's simpler, but that perception is why NATURAL JOIN and UNION/EXCEPT CORRESPONDING are there.) (Arguments criticizing this style outside the context of its intended use are specious.)

USING is a kind of middle ground orphan with one foot in the NATURAL JOIN camp and one in the CROSS JOIN. It has no real role in the former because there are no duplicate column names there. In the latter it is more or less just abbreviating JOIN conditions and SELECT clauses.

I can see the disadvantage in the latter forms is that you are expected to have named your primary and foreign keys the same, which is not always practical.

PKs (primary keys), FKs (foreign keys) & other constraints are not needed for querying. (Knowing a column is a function of others allows scalar subqueries, but you can always phrase without.) Moreover any two tables can be meaningfully joined. If you need two columns to have the same name with NATURAL JOIN you rename via SELECT AS.

What is the difference between Join and Natural Join?

Don't use natural join. It is a bug waiting to happen.

An explicit join has an on clause that lists the conditions for matching between the tables. In your example, department_id is for this purpose (although other columns might be available).

The using clause is another very useful alternative. You use it as:

FROM employees e JOIN
job_history j
USING (department_id)

It finds department_id in both tables and uses that for the join condition.

NATURAL JOIN adds a JOIN conditions for all columns in the tables that are the same. In your case, this would be department_id plus other columns.

The problem -- as you are experiencing -- is that you don't know what columns are used for the join. Even worse, explicit foreign key references are not used.

Because your query does not specify what is happening, there is lots of scope for mistakes and bugs. There is no actual need for NATURAL JOIN, so you might as well just learn to use ON and USING.

SQL Inner Join and Natural Join

Select * From r INNER JOIN s ON r.F=s.F

result will be

E  F  F  G
10 5 5 30
20 15 15 20

The INNER JOIN keyword selects records that have matching values in both tables. This is why 12 is removed ie, no common column.
Now the result for

Select E, G From r NATURAL JOIN s

is

E  G
10 30
20 20

when you use * on select you get

F  E   G
5 10 30
15 20 20

while using * on natural join it will be expanded to the following list of columns

  • All the common columns
  • Every column in the first (left) table that is not a common column
  • Every column in the second (right) table that is not a common column

Inner Join, Natural Joins and Equi Join

Inner join of A and B combines columns of a row from A and a row from B based on a join predicate. For example, a "sempai" join: SELECT ... FROM people A INNER JOIN people B ON A.age > B.age will pair each person with each person that is their junior; the juniormost people will not be selected from A, and seniormost people will not be selected from B, because there are no matching rows.

Equi join is a particular join where the join relation is equality. A "sempai" join from the last paragraph is not an equi join; but "same age" join would be. Though typically it would be used for foreign relationships (equi joins on primary keys), such as SELECT ... FROM person A INNER JOIN bicycle B ON A.bicycle_id = B.id. (Pay no attention to the fact that this is not a proper model, people sometimes have multiple bicycles... a bit of a silly example, I'm sure I could have found a better one.)

A natural join is a special kind of equi join that assumes equality of all shared columns (without explicitly stating the predicate). So for example SELECT ... FROM people A INNER JOIN bicycles B ON A.bicycle_id = B.bicycle_id is equivalent to SELECT ... FROM people A NATURAL JOIN bicycles B, assuming bicycle_id is the only column present in both tables. Most people I know will not use this, because of several reasons - it is a more common practice to have the primary key not repeat the table name, i.e. bicycles.id than bicycles.bicycles_id; it is possible the foreign key does not reflect the table name (e.g. person.overseer_id rather than person.person_id, for obvious reasons), and (forgotten my me but thankfully remembered by Sudipta Mondal) there might be unrelated columns that are named the same but make zero sense to join on, like creation_time. For these reasons, I have never used NATURAL JOIN in my life.

Equi/natural joins do not necessarily have to be inner.

Inner Join vs. Natural Join, speed-wise?

It is going to depend on the DBMS, and to some extent on the tables joined, and the difference is generally not going to be measurable. With an INNER JOIN, there will be two copies of the common columns; with a NATURAL JOIN, there will be just one copy of the common columns.

If the intermediate results have to be written to disk or sorted, there will be more data to be managed, and there is slightly more data to be returned to the client over the client-server connection, so an INNER JOIN might take slightly longer than an NATURAL JOIN. If the common columns are big enough - long characters fields, for example - this size difference could be significant.

The difference in time to prepare the two types of join is completely negligible. Both have to read the information for all the columns in both tables involved in the join, and the processing of that data is substantially similar.

During execution, the copying of the data to form the intermediate results might be slightly more complex for a NATURAL JOIN - because you don't simply take a copy of each row - but you don't necessarily take a full copy of each row for an INNER JOIN either, so the difference is again negligible. And, as before, there is slightly less data to be copied in aggregate for the NATURAL JOIN, so there is a still a small advantage to it.

So, overall, a NATURAL JOIN might be slightly quicker than an INNER JOIN, but the difference is still likely to be negligible.

difference between 'natural full outer join' and 'full outer join'

Natural join is basically an abomination. Don't use it.

The problem is that natural join uses the names of columns in the tables to define the join relationship. This makes it hard to understand queries, because you don't know what the relationships are. Worse, natural join doesn't even use declared foreign key relationships.

Use using or on instead.

As for the difference between natural full outer join and full outer join. They both are full outer join. In the latter, you explicitly define the keys for the join condition. In the former, the database engine chooses the keys based on common names between the tables.



Related Topics



Leave a reply



Submit