Performance of Inner Join VS Cartesian Product

CROSS JOIN vs INNER JOIN in SQL

A cross join does not match rows on any condition. If each table has 100 rows with a 1-to-1 match, a cross join returns 10,000 rows, whereas an inner join returns only 100 rows in the same situation.

These 2 examples will return the same result:

Cross join

select * from table1 cross join table2 where table1.id = table2.fk_id

Inner join

select * from table1 join table2 on table1.id = table2.fk_id

Use the second form, the explicit inner join.
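To check the row counts and the equivalence of the two queries, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are invented for illustration (100 rows per table, matched 1 to 1), and other database engines may format results differently.

```python
import sqlite3

# In-memory database with two illustrative tables (contents are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE table2 (fk_id INTEGER)")
conn.executemany("INSERT INTO table1 (id) VALUES (?)", [(i,) for i in range(100)])
conn.executemany("INSERT INTO table2 (fk_id) VALUES (?)", [(i,) for i in range(100)])

# Cross join with no condition: every row paired with every row.
unfiltered_cross = conn.execute(
    "SELECT COUNT(*) FROM table1 CROSS JOIN table2"
).fetchone()[0]

# Cross join filtered in WHERE versus inner join with ON.
filtered_cross = conn.execute(
    "SELECT * FROM table1 CROSS JOIN table2 "
    "WHERE table1.id = table2.fk_id ORDER BY table1.id"
).fetchall()
inner = conn.execute(
    "SELECT * FROM table1 JOIN table2 "
    "ON table1.id = table2.fk_id ORDER BY table1.id"
).fetchall()

print(unfiltered_cross)         # 10000
print(len(inner))               # 100
print(filtered_cross == inner)  # True
```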

Performance of inner join compared to cross join

Cross Joins produce results that consist of every combination of rows from two or more tables. That means if table A has 6 rows and table B has 3 rows, a cross join will result in 18 rows. There is no relationship established between the two tables – you literally just produce every possible combination.

With an inner join, column values from one row of a table are combined with column values from another row of another (or the same) table to form a single row of data.

If a WHERE clause that relates the two tables is added to a cross join, it behaves as an inner join, because the WHERE condition filters the combinations down to the matching rows.
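The 6 x 3 = 18 arithmetic can be reproduced with a small sketch, again using Python's sqlite3 module. The tables a and b and their columns are hypothetical stand-ins for "table A" and "table B".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (a_id INTEGER)")
conn.execute("CREATE TABLE b (b_id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(1, 7)])  # 6 rows
conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(1, 4)])  # 3 rows

# Every combination of a row from a with a row from b.
all_pairs = conn.execute("SELECT a_id, b_id FROM a CROSS JOIN b").fetchall()

# Adding a WHERE condition keeps only the matching combinations.
matched = conn.execute(
    "SELECT a_id, b_id FROM a CROSS JOIN b WHERE a_id = b_id ORDER BY a_id"
).fetchall()

print(len(all_pairs))  # 18
print(matched)         # [(1, 1), (2, 2), (3, 3)]
```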

As long as your queries abide by common sense and vendor-specific performance guidelines (i), I like to think of the choice of join type as a simple matter of taste.

(i) Vendor-Specific Performance Guidelines

  1. MySQL Performance Tuning and Optimization Resources
  2. PostgreSQL Performance Optimization

SQL Query efficiency (JOIN or Cartesian Product )

Case 2 will give you a different output anyway, as you are not joining TableA and TableB in any way so you get a Cartesian product.

Since Email has now come into the picture, you will need a join in Case 1.

In Case 1 you can simply rewrite the query to

SELECT DISTINCT A.Email , B.TestField 
FROM TableA A join TableB B on A.username = B.Username
WHERE B.username = 'ABC'

This is more readable and easier to maintain, as it has no superfluous GROUP BY clause.

In Case 3 you have userId in your where clause, which is not even in your tableB according to your post.

In general, for maintainability and readability:

Use explicit joins

SELECT * FROM A JOIN B ON A.id = B.id

is preferable over

SELECT * FROM A, B WHERE A.id = B.id

And use DISTINCT when you want distinct values, instead of GROUP BY over all columns:

SELECT DISTINCT a, b, c FROM TABLE

is preferable over

SELECT a, b, c FROM TABLE GROUP BY a, b, c
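Both recommendations are about readability rather than results. A quick sketch with Python's sqlite3 module suggests that each pair of queries returns the same rows. The table t stands in for TABLE (a reserved word), and all sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER)")
conn.execute("CREATE TABLE B (id INTEGER)")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO B VALUES (?)", [(2,), (3,), (4,)])
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 2, 3), (1, 2, 3), (4, 5, 6), (4, 5, 7)],
)

# Explicit join versus comma (implicit) join.
explicit_join = conn.execute(
    "SELECT * FROM A JOIN B ON A.id = B.id ORDER BY A.id"
).fetchall()
implicit_join = conn.execute(
    "SELECT * FROM A, B WHERE A.id = B.id ORDER BY A.id"
).fetchall()

# DISTINCT versus GROUP BY over all selected columns.
distinct_rows = conn.execute(
    "SELECT DISTINCT a, b, c FROM t ORDER BY a, b, c"
).fetchall()
grouped_rows = conn.execute(
    "SELECT a, b, c FROM t GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()

print(explicit_join)                  # [(2, 2), (3, 3)]
print(explicit_join == implicit_join) # True
print(distinct_rows)                  # [(1, 2, 3), (4, 5, 6), (4, 5, 7)]
print(distinct_rows == grouped_rows)  # True
```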

Oracle cartesian product vs. join

What is the database doing in a situation like this?

The same as when you specify an ANSI join:

SELECT *
FROM orders o
JOIN products p ON o.productid = p.id

I've noticed everybody writes their queries like this [...]

It looks like lots of people in your company have many years of experience with Oracle! I bet they also use the (+) notation for outer joins. This was the only join syntax supported by Oracle prior to the 9i release.

SQL Server: What is the difference between CROSS JOIN and FULL OUTER JOIN?

A CROSS JOIN produces a cartesian product between the two tables, returning all possible combinations of all rows. It has no ON clause because you're just joining everything to everything.

A FULL OUTER JOIN is a combination of a LEFT OUTER and RIGHT OUTER join. It returns all rows in both tables that match the query's WHERE clause, and in cases where the ON condition can't be satisfied for those rows it puts NULL values in for the unpopulated fields.

This Wikipedia article explains the various types of joins, with examples of the output for a sample set of tables.
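The difference shows up clearly on two tiny tables. The sketch below uses Python's sqlite3 module with invented tables l and r. Because older SQLite versions lack FULL OUTER JOIN, the full outer join is emulated here as a LEFT JOIN plus the unmatched rows from the other side. That emulation is an assumption of this example, not SQL Server syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE l (id INTEGER)")
conn.execute("CREATE TABLE r (id INTEGER)")
conn.executemany("INSERT INTO l VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO r VALUES (?)", [(2,), (3,), (4,)])

# CROSS JOIN: every combination, no ON clause.
cross_rows = conn.execute("SELECT l.id, r.id FROM l CROSS JOIN r").fetchall()

# Emulated FULL OUTER JOIN: matched rows and unmatched left rows,
# followed by the right rows that found no partner.
full_outer_rows = conn.execute(
    """
    SELECT l.id, r.id FROM l LEFT JOIN r ON l.id = r.id
    UNION ALL
    SELECT l.id, r.id FROM r LEFT JOIN l ON l.id = r.id WHERE l.id IS NULL
    """
).fetchall()

print(len(cross_rows))       # 9
print(len(full_outer_rows))  # 4
# Unmatched sides come back as NULL (None in Python).
print(set(full_outer_rows) == {(1, None), (2, 2), (3, 3), (None, 4)})  # True
```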

Does `join` generate Cartesian product every time when using mysql?

A Cartesian product, aka CROSS JOIN, happens whenever you have a JOIN, including LEFT, RIGHT, and INNER, without either an ON clause or the equivalent condition in the WHERE clause.

A temp table may or may not be generated by any kind of JOIN. And it may or may not hit the disk.

JOINs in production. Sure. A well-indexed (etc) query is plenty fast. And, when you need a JOIN, the alternatives, if any, may be worse.

From a theoretical point of view, a JOIN is performed thus:

  1. Create the Cartesian product.
  2. Toss any rows that don't match the ON and WHERE restrictions.
  3. Move on to GROUP BY, HAVING, ORDER BY, and LIMIT.

In reality, the Optimizer takes all the short cuts it can think of. A typical JOIN goes more like this:

  1. Scan through one table, filtering out any rows not matching the WHERE.
  2. Reach into the joined table using an index to find the 0 or 1 or several matching rows there. This is called a Nested Loop Join (NLJ).
  3. Toss any more rows that don't match the rest of the ON and WHERE restrictions.
  4. etc.
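The two step lists above can be contrasted in a toy sketch. This is plain Python rather than anything MySQL actually runs: the tuples, the dictionary standing in for an index, and all function names are invented for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical data: (customer_id, name) and (order_id, customer_id, quantity).
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(10, 1, 50), (11, 1, 1500), (12, 3, 2000)]


def theoretical_join(left, right, min_quantity):
    """Build the full Cartesian product, then toss non-matching rows."""
    return [
        (c, o)
        for c, o in product(left, right)
        if c[0] == o[1] and o[2] > min_quantity
    ]


def nested_loop_join(left, right, min_quantity):
    """Filter one table first, then look up matches through an index."""
    index = defaultdict(list)  # stands in for a database index on customer_id
    for c in left:
        index[c[0]].append(c)

    result = []
    for o in right:
        if o[2] <= min_quantity:
            continue  # WHERE filter applied during the scan
        for c in index.get(o[1], []):  # 0, 1 or several rows
            result.append((c, o))
    return result


slow = theoretical_join(customers, orders, 1000)
fast = nested_loop_join(customers, orders, 1000)
print(sorted(slow) == sorted(fast))  # True
print(len(fast))                     # 2
```

Both functions return the same pairs, but the first one examines all 9 combinations while the second touches only the orders that pass the filter.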

As for LEFT and RIGHT: don't use them unless you need a result row even when the matching row is missing from the 'right' or 'left' table, respectively. An unnecessary LEFT or RIGHT confuses the reader and makes the Optimizer work harder to decide that you really meant INNER JOIN.

In MySQL, the keywords INNER and OUTER are essentially ignored. The existence of a suitable ON or WHERE controls what type of JOIN it is.

SQL Cross Join better in performance than normal join?

The two queries aren't equivalent, because:

SELECT lastname, date
FROM customer, transact
WHERE quantity > 1000

This doesn't actually limit the result to customers that bought more than 1000. It simply takes every combination of rows from those two tables and excludes any with quantity less than or equal to 1000, so all customers will be returned as long as at least one transaction qualifies.

This query is equivalent to your JOIN version:

SELECT lastname, date
FROM customer c, transact t
WHERE quantity > 1000
AND c.customerid = t.customerid

The explicit JOIN version is preferred, although both should have the same execution plan and identical performance. The explicit JOIN version is easier to read in my opinion, and the fact that the comma-separated (implicit) syntax has been considered outdated for over a decade (two?) should be reason enough to avoid it.
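A small sketch with Python's sqlite3 module illustrates the difference between the queries. The column layout and sample rows are assumptions, and the explicit JOIN form is a guess at what the question's JOIN version looked like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerid INTEGER, lastname TEXT)")
conn.execute("CREATE TABLE transact (customerid INTEGER, date TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Adams"), (2, "Baker")])
conn.executemany(
    "INSERT INTO transact VALUES (?, ?, ?)",
    [(1, "2020-01-01", 5000), (2, "2020-01-02", 10)],
)

# No join condition: every customer is paired with the large transaction.
no_condition = conn.execute(
    "SELECT lastname, date FROM customer, transact "
    "WHERE quantity > 1000 ORDER BY lastname"
).fetchall()

# Comma join with the join condition in WHERE.
with_condition = conn.execute(
    "SELECT lastname, date FROM customer c, transact t "
    "WHERE quantity > 1000 AND c.customerid = t.customerid"
).fetchall()

# Assumed explicit JOIN equivalent.
explicit = conn.execute(
    "SELECT lastname, date FROM customer c "
    "JOIN transact t ON c.customerid = t.customerid "
    "WHERE quantity > 1000"
).fetchall()

print(no_condition)                # [('Adams', '2020-01-01'), ('Baker', '2020-01-01')]
print(with_condition)              # [('Adams', '2020-01-01')]
print(with_condition == explicit)  # True
```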


