Proving SQL query equivalency
The best you can do is compare the 2 query outputs based on a given set of inputs looking for any differences. To say that they will always return the same results for all inputs really depends on the data.
For Oracle one of the better if not best approaches (very efficient) is here (Ctrl+F Comparing the Contents of Two Tables):
http://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html
Which boils down to:
select c1,c2,c3,
count(src1) CNT1,
count(src2) CNT2
from (select a.*,
1 src1,
to_number(null) src2
from a
union all
select b.*,
to_number(null) src1,
2 src2
from b
)
group by c1,c2,c3
having count(src1) <> count(src2);
Check if two selects are equivalent
If you want to compare the query results try the following:
(select * from query1 MINUS select * from query2)
UNION ALL
(select * from query2 MINUS select * from query1)
This will result in all rows that are returned by only one of the queries.
Query equivalence evaluation
You probably can't prove it, since the problem seems to be NP-complete; check this SO question on query equivalence (that one is about Oracle, but there are a couple of answers / links that should be relevant for you).
Query with join equivalency?
No, they are not equivalent. Example:
CREATE TABLE a
( keya int ) ;
CREATE TABLE b
( keyb int ) ;
CREATE TABLE c
( keyc int ) ;
INSERT INTO a
VALUES
(1) ;
INSERT INTO b
VALUES
(1),(2) ;
INSERT INTO c
VALUES
(2) ;
Results:
SELECT *
FROM a
LEFT JOIN b
ON a.keya = b.keyb
INNER JOIN c
ON c.keyc = b.keyb ;
Result
----------------------
| keya | keyb | keyc |
----------------------
SELECT *
FROM a
LEFT JOIN
( SELECT b.*, c.*
FROM b
INNER JOIN c
ON c.keyc = b.keyb
) sub
ON a.keya = sub.keyb ;
Result
----------------------
| keya | keyb | keyc |
----------------------
| 1 | NULL | NULL |
----------------------
As to why this happens, a LEFT JOIN b INNER JOIN c
is parsed as (a LEFT JOIN b) INNER JOIN c
which is equivalent to (a INNER JOIN b) INNER JOIN c
because the condition on the INNER
join cancels the LEFT
join.
You can also write the second query in this form - without subquery - which is parsed as a LEFT JOIN (b INNER JOIN c)
because of the different placing of the ON
clauses:
SELECT *
FROM a
LEFT JOIN
b
INNER JOIN c
ON c.keyc = b.keyb
ON a.keya = b.keyb ;
Result
----------------------
| keya | keyb | keyc |
----------------------
| 1 | NULL | NULL |
----------------------
Is Inner join Equivalence relation?
According to SQL standard they are equivalent. In fact, the database engine's query optimizer is free to rewrite it to a form that is faster to run.
In most database engines I have encountered, they are completely equivalent.
However, if you are talking about OUTER JOIN's, then of course the left/right ordering is significant, not just for performance reasons.
How to Prove that using subselect queries in SQL is killing performance of server
"Show, Don't Tell" - Examine and compare the query plans of the queries identified using SQL Profiler. Particularly look out for table scans and bookmark lookups (you want to see index seeks as often as possible). The 'goodness of fit' of query plans depends on up-to-date statistics, what indexes are defined, the holistic query workload.
Execution Plan Basics
Understanding More Complex Query Plans
Using SQL Server Profiler (2005 Version)
Run the queries in SQL Server Management Studio (SSMS) and turn on Query->Include Actual Execution Plan (CTRL+M)
Think yourself lucky they're only subselects (which in some cases the optimiser will produce equivalent 'join plans') and not correlated sub-queries!
Identify a query that is performing a high number of logical reads, re-write it using your preferred technique and then show how few logicals reads it does by comparison.
Here's a tip. To get the total number of logical reads performed, wrap a query in question with:
SET STATISTICS IO ON
GO
-- Run your query here
SET STATISTICS IO OFF
GO
Run your query, and switch to the messages tab in the results pane.
If you are interested in learning more, there is no better book than SQL Server 2008 Query Performance Tuning Distilled, which covers the essential techniques for monitoring, interpreting and fixing performance issues.
Are these SQL Statements equivalent?
As a final result, yes.
Regarding the execution: the query optimizer might end up creating the same query execution plan for both queries.
This will be the case if, according to its approximate statistics (approximate equi-depth histograms for instance - which are not all the time up-to-date, by the way), the optimizer will determine that the first join is more selective than the second one and, consequently, it will execute this one first.
Stmt1 allows you to specify the order of the joins and, considering that you know exactly what the tables contain, this might be a better solution.
SQL Query optimization - distributive law of natural join and difference
NaturalJoin (R,S-T) equivalence Difference(NaturalJoin(R,S),NaturalJoin(R,T))
The general way to deal with this is to replace operator calls by their definitions.
Here is an outline assuming certain equivalences between relation expressions and the tuples they hold. One actually needs to use the equivalences to justify that one's queries return the tuples that one has been asked to get, but this is not usually explained. (One instead learns via a lot of examples and handwaving.)
S
& T
have the same set of attributes.X
holds rows (...)
where X(...)
, ie (...) IN X
.NATURALJOIN(X,Y)
holds rows where X(...) AND Y(...)
.DIFFERENCE(X,Y)
holds rows where X(...) AND NOT Y(...)
.
The left holds rows where:
R(...) AND (S(...) AND NOT T(...))
R(...) AND S(...) AND NOT T(...)
The right holds rows where:
(R(...) AND S(...)) AND NOT (R(...) AND T(...))
(R(...) AND S(...)) AND (NOT R(...) OR NOT T(...))
((R(...) AND S(...)) AND NOT R(...)) OR ((R(...) AND S(...)) AND NOT T(...))
(R(...) AND S(...) AND NOT R(...)) OR (R(...) AND S(...) AND NOT T(...))
R(...) AND S(...) AND NOT T(...)
So they are equivalent.
You can convert this to a proof by replacing X(...)
by x IN X
and using appropriate quantifications (FORALL
& FORSOME
/EXISTS
) and set comprehensions ({
variable
|
wff
}
).
Re using natural join in reasoning & SQL see this answer and its links.
And if you know what query could be more optimal in a sense of runtime, that would be really helpful.
That depends on your DMBS and its query implementation/optimization. There is no "optimal" without an execution model, cost/benefit function and that function's input arguments. Moreover "optimal" is chaotic--a tiny change in relational & physical DDL, database contents & statistics, query DML, query & update patterns and DBMS implementation can give completely different tradeoffs.
Related Topics
Best Way to Get the Next Id Number Without "Identity"
Postgresql Generate_Series of Months
SQL Count Total Number of Rows Whilst Using Limit
SQL Server - Check to See If Cast Is Possible
Store Multiple Elements in JSON Files in Aws Athena
Select Newest Records That Have Distinct Name Column
SQL Join, Group by on Three Tables to Get Totals
Ora-01843 Not a Valid Month- Comparing Dates
SQL Distinct Keyword Bogs Down Performance
How to Gracefully Include Formatted SQL Strings in an R Script
Using MySQL in Clause as All Inclusive (And Instead of Or)
SQL Joining Three Tables, Join Precedence
Create Table Permission Denied in Database 'Master'
Is There a Postgres Command to List/Drop All Materialized Views
SQL Server Xp_Delete_File Not Deleting Files
SQL Server: How to Get All Child Records Given a Parent Id in a Self Referencing Table