CROSS JOIN vs INNER JOIN in SQL
A cross join does not match rows up by any condition: if you have 100 rows in each table with a 1-to-1 match, you get 10,000 results, while an inner join returns only 100 rows in the same situation.
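To make the row counts concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The tables table1 and table2 and their columns are hypothetical, chosen to mirror the queries that follow; it was written against SQLite, though the counts follow from the definitions of the two joins rather than from anything engine-specific.

```python
import sqlite3

# In-memory database with two hypothetical tables of 100 rows each,
# where every table2 row points at exactly one table1 row (1-to-1).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, fk_id INTEGER)")
conn.executemany("INSERT INTO table1 (id) VALUES (?)", [(i,) for i in range(1, 101)])
conn.executemany("INSERT INTO table2 (id, fk_id) VALUES (?, ?)", [(i, i) for i in range(1, 101)])

# Every row of table1 paired with every row of table2: 100 * 100.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM table1 CROSS JOIN table2"
).fetchone()[0]

# Only the pairs where the join condition holds: one per table1 row.
inner_count = conn.execute(
    "SELECT COUNT(*) FROM table1 INNER JOIN table2 ON table1.id = table2.fk_id"
).fetchone()[0]

print(cross_count, inner_count)  # 10000 100
```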
These two queries return the same result:
Cross join
select * from table1 cross join table2 where table1.id = table2.fk_id
Inner join
select * from table1 join table2 on table1.id = table2.fk_id
Use the second form, the explicit JOIN ... ON.
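A quick way to check that the two forms agree is to run both against the same data. This is a sketch using Python's sqlite3 module; the extra columns name and note and the sample rows are hypothetical. One table2 row points at a table1 id that does not exist, so neither query should keep it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, fk_id INTEGER, note TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
# fk_id 99 has no partner in table1, so both queries should drop that row.
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [(10, 1, "x"), (11, 1, "y"), (12, 3, "z"), (13, 99, "orphan")],
)

# Cross join filtered by WHERE.
cross_where = conn.execute(
    "SELECT * FROM table1 CROSS JOIN table2 "
    "WHERE table1.id = table2.fk_id ORDER BY table2.id"
).fetchall()

# Explicit inner join with ON.
explicit_join = conn.execute(
    "SELECT * FROM table1 JOIN table2 "
    "ON table1.id = table2.fk_id ORDER BY table2.id"
).fetchall()

print(cross_where == explicit_join)  # True
print(explicit_join)  # [(1, 'a', 10, 1, 'x'), (1, 'a', 11, 1, 'y'), (3, 'c', 12, 3, 'z')]
```

The ORDER BY is there only so the two lists can be compared directly; without it, row order is not guaranteed.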
Performance of inner join compared to cross join
Cross joins produce results that consist of every combination of rows from two or more tables. That means if table A has 6 rows and table B has 3 rows, a cross join will result in 18 rows. No relationship is established between the two tables: you literally just produce every possible combination.
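The "every possible combination" definition is exactly what Python's itertools.product computes, so the two can be compared side by side. A sketch with hypothetical tables a (6 rows) and b (3 rows) in an in-memory SQLite database:

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER)")
conn.execute("CREATE TABLE b (y TEXT)")
a_rows = [(i,) for i in range(1, 7)]   # 6 rows
b_rows = [("p",), ("q",), ("r",)]      # 3 rows
conn.executemany("INSERT INTO a VALUES (?)", a_rows)
conn.executemany("INSERT INTO b VALUES (?)", b_rows)

sql_pairs = conn.execute(
    "SELECT a.x, b.y FROM a CROSS JOIN b ORDER BY a.x, b.y"
).fetchall()

# The same thing in plain Python: every combination, no condition involved.
py_pairs = sorted((x, y) for (x,), (y,) in itertools.product(a_rows, b_rows))

print(len(sql_pairs))         # 18
print(sql_pairs == py_pairs)  # True
```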
With an inner join, column values from one row of a table are combined with column values from a matching row of another (or the same) table to form a single row of data.
If a WHERE clause that relates the two tables is added to a cross join, the query behaves like an inner join, because the WHERE restricts the combinations to the matching ones.
As long as your queries abide by common sense and vendor-specific performance guidelines (i), I like to think of the decision on which type of join to use as a simple matter of taste.
(i) Vendor-Specific Performance Guidelines
- MySQL Performance Tuning and Optimization Resources
- PostgreSQL Performance Optimization
SQL Query efficiency (JOIN or Cartesian Product)
Case 2 will give you a different output anyway: you are not joining TableA and TableB in any way, so you get a Cartesian product.
Since email has suddenly come up, you will need a join in Case 1.
In Case 1, you can simply rewrite the query as:
SELECT DISTINCT A.Email , B.TestField
FROM TableA A join TableB B on A.username = B.Username
WHERE B.username = 'ABC'
This is more readable and easier to maintain, as you do not have a superfluous GROUP BY clause.
In Case 3, you have userId in your WHERE clause, but according to your post it is not even a column of TableB.
In general, for maintainability and readability:
Use explicit joins
SELECT * FROM A JOIN B ON A.id = B.id
is preferable over
SELECT * FROM A, B WHERE A.id = B.id
And use DISTINCT when you want distinct values, instead of GROUP BY over all columns:
SELECT DISTINCT a, b, c FROM TABLE
is preferable over
SELECT a, b, c FROM TABLE GROUP BY a, b, c
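The two forms return the same set of rows when the GROUP BY lists every selected column and no aggregate is used. A sketch in SQLite via Python; the table name t is hypothetical, since TABLE itself is a reserved word and only works above as a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "t" is a hypothetical table name standing in for TABLE.
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 2, 3), (2, 2, 2), (2, 2, 2)],
)

distinct_rows = conn.execute(
    "SELECT DISTINCT a, b, c FROM t ORDER BY a, b, c"
).fetchall()
grouped_rows = conn.execute(
    "SELECT a, b, c FROM t GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()

print(distinct_rows)                  # [(1, 1, 1), (1, 2, 3), (2, 2, 2)]
print(distinct_rows == grouped_rows)  # True
```

The results match; DISTINCT simply states the intent (remove duplicates) directly.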
Oracle cartesian product vs. join
What is the database doing in a situation like this?
The same as when you specify an ANSI join:
SELECT *
FROM orders o
JOIN products p ON o.productid = p.id
I've noticed everybody writes their queries like this [...]
It looks like lots of people in your company have many years of experience with Oracle! I bet they also use the (+) notation for outer joins. This was the only syntax Oracle supported prior to the 9i release.
SQL Server: What is the difference between CROSS JOIN and FULL OUTER JOIN?
A CROSS JOIN produces a Cartesian product between the two tables, returning all possible combinations of all rows. It has no ON clause because you're just joining everything to everything.
A FULL OUTER JOIN is a combination of a LEFT OUTER and a RIGHT OUTER join. It returns all rows in both tables that match the query's WHERE clause, and in cases where the ON condition can't be satisfied for those rows, it puts NULL values in for the unpopulated fields.
This Wikipedia article explains the various types of joins with examples of output given a sample set of tables.
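The difference can also be sketched in plain Python over lists of tuples. The sample rows are hypothetical, the join key is assumed to be the first column, both tables are assumed to have two columns, and None stands in for NULL. This is an illustration of the semantics, not how any engine implements it.

```python
from itertools import product

# Hypothetical two-column tables: (key, value) rows.
left = [(1, "L1"), (2, "L2")]
right = [(2, "R2"), (3, "R3")]
NULLS = (None, None)  # padding for a missing row (assumes two columns)


def cross_join(lhs, rhs):
    """CROSS JOIN: every lhs row paired with every rhs row, no condition."""
    return [l + r for l, r in product(lhs, rhs)]


def full_outer_join(lhs, rhs):
    """FULL OUTER JOIN on the first column; unmatched rows are padded with None."""
    result = []
    matched_right = set()  # indices of rhs rows that found at least one partner
    for l in lhs:
        found = False
        for i, r in enumerate(rhs):
            if l[0] == r[0]:
                result.append(l + r)
                matched_right.add(i)
                found = True
        if not found:
            result.append(l + NULLS)  # the LEFT OUTER part
    for i, r in enumerate(rhs):
        if i not in matched_right:
            result.append(NULLS + r)  # the RIGHT OUTER part
    return result


print(cross_join(left, right))
# [(1, 'L1', 2, 'R2'), (1, 'L1', 3, 'R3'), (2, 'L2', 2, 'R2'), (2, 'L2', 3, 'R3')]
print(full_outer_join(left, right))
# [(1, 'L1', None, None), (2, 'L2', 2, 'R2'), (None, None, 3, 'R3')]
```

Note that the cross join never produces NULL padding, while the full outer join never pairs non-matching keys.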
Does `join` generate Cartesian product every time when using mysql?
A Cartesian product, aka CROSS JOIN, happens whenever you have a JOIN (including LEFT, RIGHT, and INNER) without either an ON clause or the equivalent in the WHERE clause.
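The answer is about MySQL, but the same effect is easy to observe in SQLite, whose documentation states that JOIN, INNER JOIN, or a comma without an ON or USING clause yields the Cartesian product. A sketch with hypothetical tables a and b; only plain and INNER joins are exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER)")
conn.execute("CREATE TABLE b (a_id INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO b VALUES (?)", [(1,), (2,), (3,), (4,), (5,)])


def count(sql):
    return conn.execute(sql).fetchone()[0]


no_condition = count("SELECT COUNT(*) FROM a JOIN b")                     # 4 * 5
inner_no_on = count("SELECT COUNT(*) FROM a INNER JOIN b")                # still 4 * 5
with_on = count("SELECT COUNT(*) FROM a JOIN b ON a.id = b.a_id")         # matches only
with_where = count("SELECT COUNT(*) FROM a JOIN b WHERE a.id = b.a_id")   # same as ON

print(no_condition, inner_no_on, with_on, with_where)  # 20 20 4 4
```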
A temp table may or may not be generated by any kind of JOIN. And it may or may not hit the disk.
JOINs in production? Sure. A well-indexed (etc.) query is plenty fast. And when you need a JOIN, the alternatives, if any, may be worse.
From a theoretical point of view, a JOIN is performed thus:
- Create the Cartesian product.
- Toss any rows that don't match the ON and WHERE restrictions.
- Move on to GROUP BY, HAVING, ORDER BY, and LIMIT.
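The theoretical recipe above can be followed literally in a few lines of Python. This is a sketch with hypothetical orders and customers rows; it corresponds roughly to SELECT c.name, COUNT(*) FROM orders o JOIN customers c ON o.customer_id = c.customer_id GROUP BY c.name ORDER BY COUNT(*) DESC, c.name LIMIT 1.

```python
from collections import Counter
from itertools import product

# Hypothetical rows: orders(order_id, customer_id), customers(customer_id, name).
orders = [(1, 10), (2, 10), (3, 20), (4, 30)]
customers = [(10, "Ann"), (20, "Bob"), (40, "Cy")]

# Step 1: build the full Cartesian product (4 * 3 = 12 pairs).
pairs = list(product(orders, customers))

# Step 2: toss pairs that fail the ON restriction (o.customer_id = c.customer_id).
joined = [(o, c) for o, c in pairs if o[1] == c[0]]

# Step 3: GROUP BY name, ORDER BY count descending then name, LIMIT 1.
counts = Counter(c[1] for _, c in joined)
top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:1]

print(len(pairs), len(joined), top)  # 12 3 [('Ann', 2)]
```

Materializing all 12 pairs is harmless here, but it grows as the product of the table sizes, which is why real optimizers avoid it.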
In reality, the Optimizer takes all the shortcuts it can think of. A typical JOIN goes more like this:
- Scan through one table, filtering out any rows not matching the WHERE.
- Reach into the joined table using an index to find the 0, 1, or several matching rows there. Name: NLJ (Nested Loop Join).
- Toss any more rows that don't match the rest of the ON and WHERE restrictions.
- etc.
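The index-driven nested loop described above can be sketched in Python, with a dict standing in for the index on the joined table. The rows, the amount column, and the min_amount filter are hypothetical; a real optimizer works on pages and B-trees, not dicts.

```python
from collections import defaultdict

# Hypothetical rows: orders(order_id, customer_id, amount), customers(customer_id, name).
orders = [(1, 10, 500), (2, 10, 50), (3, 20, 900), (4, 30, 700)]
customers = [(10, "Ann"), (20, "Bob"), (40, "Cy")]

# Stand-in for an index on customers.customer_id: key -> list of rows.
index = defaultdict(list)
for cust in customers:
    index[cust[0]].append(cust)


def nested_loop_join(min_amount):
    result = []
    for order in orders:                      # scan one table...
        if order[2] < min_amount:             # ...keeping rows with amount >= min_amount
            continue
        for cust in index.get(order[1], []):  # probe the index: 0, 1, or several rows
            result.append((order[0], cust[1]))
    return result


print(nested_loop_join(100))  # [(1, 'Ann'), (3, 'Bob')]
```

No Cartesian product is ever built: each surviving outer row costs one index lookup.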
As for LEFT and RIGHT: don't use them unless you need to get a result row even when the matching row is missing from the 'right' or 'left' table, respectively. It confuses the user and makes the Optimizer work harder to decide that you really meant INNER JOIN.
In MySQL, the keywords INNER and OUTER are essentially ignored. The existence of a suitable ON or WHERE controls what type of JOIN it is.
SQL Cross Join better in performance than normal join?
The two queries aren't equivalent, because:
SELECT lastname, date
FROM customer, transact
WHERE quantity > 1000
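This doesn't actually limit the results to customers who bought more than 1000. It simply takes every combination of rows from the two tables and excludes any with quantity less than or equal to 1000, so every customer is returned (paired with every large transaction), as long as at least one transaction has quantity > 1000.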
This query is equivalent to your JOIN version:
SELECT lastname, date
FROM customer c, transact t
WHERE quantity > 1000
AND c.customerid = t.customerid
The explicit JOIN version is preferred over the older comma-separated syntax, but both should have the same execution plan and identical performance. The explicit JOIN version is easier to read in my opinion, but the fact that the comma-listed/implicit method has been outdated for over a decade (two?) should be enough reason to avoid it.
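The difference between the unlinked query and the two linked ones shows up even on tiny data. A sketch in SQLite via Python; the column types, customer names, dates, and quantities are hypothetical, and only Smith has a transaction above 1000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerid INTEGER PRIMARY KEY, lastname TEXT)")
conn.execute("CREATE TABLE transact (customerid INTEGER, date TEXT, quantity INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Smith"), (2, "Jones"), (3, "Brown")])
conn.executemany(
    "INSERT INTO transact VALUES (?, ?, ?)",
    [(1, "2024-01-05", 1500), (2, "2024-01-06", 20), (3, "2024-01-07", 5)],
)

# No link between the tables: every customer is paired with the one big transaction.
no_link = conn.execute(
    "SELECT lastname, date FROM customer, transact "
    "WHERE quantity > 1000 ORDER BY lastname"
).fetchall()

# Comma syntax with the join condition in WHERE.
linked = conn.execute(
    "SELECT lastname, date FROM customer c, transact t "
    "WHERE quantity > 1000 AND c.customerid = t.customerid ORDER BY lastname"
).fetchall()

# Explicit JOIN with the condition in ON.
explicit = conn.execute(
    "SELECT lastname, date FROM customer c JOIN transact t "
    "ON c.customerid = t.customerid WHERE quantity > 1000 ORDER BY lastname"
).fetchall()

print(no_link)   # [('Brown', '2024-01-05'), ('Jones', '2024-01-05'), ('Smith', '2024-01-05')]
print(linked)    # [('Smith', '2024-01-05')]
print(linked == explicit)  # True
```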