SQL: BETWEEN vs <= and >=
They are identical: BETWEEN is a shorthand for the longer syntax in the question that includes both values (EventDate >= '10/15/2009' and EventDate <= '10/19/2009').
Use the longer syntax where BETWEEN doesn't work because one or both of the boundary values should not be included, e.g.
Select EventId,EventName from EventMaster
where EventDate >= '10/15/2009' and EventDate < '10/19/2009'
(Note < rather than <= in the second condition.)
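A minimal sketch of the difference, using Python's built-in sqlite3 module. The sample rows are made up for illustration, and the dates are written in ISO format (an assumption, because SQLite compares them as text); only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EventMaster (EventId INTEGER, EventName TEXT, EventDate TEXT)")
conn.executemany(
    "INSERT INTO EventMaster VALUES (?, ?, ?)",
    [
        (1, "before", "2009-10-14"),
        (2, "first day", "2009-10-15"),
        (3, "middle", "2009-10-17"),
        (4, "last day", "2009-10-19"),
        (5, "after", "2009-10-20"),
    ],
)

def event_ids(where_clause):
    """Return the EventIds matching the given WHERE clause, in order."""
    sql = f"SELECT EventId FROM EventMaster WHERE {where_clause} ORDER BY EventId"
    return [row[0] for row in conn.execute(sql)]

# BETWEEN and the explicit inclusive comparison select the same rows.
inclusive_between = event_ids("EventDate BETWEEN '2009-10-15' AND '2009-10-19'")
inclusive_explicit = event_ids("EventDate >= '2009-10-15' AND EventDate <= '2009-10-19'")
# The half-open range drops the row that falls exactly on the upper bound.
half_open = event_ids("EventDate >= '2009-10-15' AND EventDate < '2009-10-19'")

print(inclusive_between)   # [2, 3, 4]
print(inclusive_explicit)  # [2, 3, 4]
print(half_open)           # [2, 3]
```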
BETWEEN clause versus <= AND >=
There is no performance difference between the two example queries because BETWEEN is simply a shorthand way of expressing an inclusive range comparison. When Oracle parses the BETWEEN condition, it automatically expands it into separate comparison clauses. For example:
SELECT *
FROM table
WHERE column BETWEEN :lower_bound AND :upper_bound
...will automatically become:
SELECT *
FROM table
WHERE :lower_bound <= column
AND :upper_bound >= column
BETWEEN operator vs. >= AND <=: Is there a performance difference?
No performance benefit in general; it is just syntactic sugar.
That said, by using the BETWEEN version, you can avoid function reevaluation in some cases.
Compare performance difference of T-SQL BETWEEN and '<' '>' operator?
You can check this easily enough by comparing the query plans in both situations. There is no difference of which I am aware. There is a logical difference, though, between BETWEEN and "<" and ">": BETWEEN is inclusive. It's equivalent to ">=" and "<=".
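As a rough sketch of what such a plan comparison can look like, here is the same check against SQLite through Python's sqlite3 module rather than T-SQL. The table t and the index idx_t_x are hypothetical names, and other engines expose plans through their own tools, so treat this only as an illustration of the approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
conn.execute("CREATE INDEX idx_t_x ON t (x)")

def plan(sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

plan_between = plan("SELECT * FROM t WHERE x BETWEEN ? AND ?", (1, 6))
plan_explicit = plan("SELECT * FROM t WHERE x >= ? AND x <= ?", (1, 6))

# Print both plans side by side, then whether they are identical.
print(plan_between)
print(plan_explicit)
print(plan_between == plan_explicit)
```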
Difference in SQL BETWEEN operator and >= & <= operator
Modern databases ship with very intelligent query execution optimisers. One of their main features is query transformation. Logically equivalent expressions can usually be transformed into each other, e.g. as Anthony suggested, the BETWEEN operator can be rewritten by Oracle (and MySQL) as two AND-connected comparisons, and vice versa, if BETWEEN isn't just implemented as syntactic sugar.
So even if there were a difference in performance, you can be assured that Oracle will very likely choose the better option.
This means that you can freely choose your preference, e.g. because of readability.
Note: it is not always obvious what is logically equivalent. Query transformation rules become more complex when it comes to transforming EXISTS, IN, NOT EXISTS, NOT IN... But in this case, they are. For more details read the specification (chapter 8.3 between predicate):
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
Is there a performance difference between BETWEEN and IN with MySQL or in SQL in general?
BETWEEN should outperform IN in this case (but do measure and check execution plans, too!), especially as n grows and as long as statistics are still accurate. Let's assume:
- m is the size of your table
- n is the size of your range
Index can be used (n is tiny compared to m)
In theory,
- BETWEEN can be implemented with a single "range scan" (Oracle speak) on the primary key index, and then traverse at most n index leaf nodes. The complexity will be O(n + log m)
- IN is usually implemented as a series (loop) of n "range scans" on the primary key index. With m being the size of the table, the complexity will always be O(n * log m) ... which is always worse (negligible for very small tables m or very small ranges n)
Index cannot be used (n is a significant portion of m)
In any case, you'll get a full table scan and evaluate the predicate on each row:
- BETWEEN needs to evaluate two predicates: one for the lower and one for the upper bound. The complexity is O(m)
- IN needs to evaluate at most n predicates. The complexity is O(m * n) ... which is again always worse, or perhaps O(m) if the database can optimise the IN list to be a hashmap, rather than a list of predicates.
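To get a feel for the indexed case, the two estimates can be plugged into a few lines of Python. This is only back-of-the-envelope arithmetic: the base-2 logarithm, the table size, and the helper names are assumptions for illustration, and constant factors are ignored.

```python
import math

def between_cost(m, n):
    """One index descent (about log m) plus a walk over at most n leaf entries."""
    return n + math.log2(m)

def in_cost(m, n):
    """n separate index descents, each costing about log m."""
    return n * math.log2(m)

m = 1_000_000  # hypothetical table size
for n in (10, 100, 1000):
    # Columns: range size, estimated BETWEEN steps, estimated IN steps.
    print(n, round(between_cost(m, n)), round(in_cost(m, n)))
```

The gap widens as n grows, which matches the advice above to prefer BETWEEN for larger consecutive ranges, subject to measuring on the real system.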
SQL: why use 'between' instead of '>= and <='
Sometimes using BETWEEN can save you from evaluating an operation more than once:
SELECT AVG(RAND(20091225) BETWEEN 0.2 AND 0.4)
FROM t_source;
---
0.1998
SELECT AVG(RAND(20091225) >= 0.2 AND RAND(20091225) <= 0.4)
FROM t_source;
---
0.3199
In the first query RAND() is called only once per row, but in the second query it is called twice. Here BETWEEN saves you a second function call to RAND().
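The same effect can be observed by counting calls. The sketch below uses SQLite through Python's sqlite3 module instead of MySQL, with a hypothetical user-defined function noisy() standing in for RAND(); it always returns a value inside the range so that both comparisons in the explicit form have to be evaluated.

```python
import sqlite3

calls = 0

def noisy():
    """Stand-in for an expensive or non-deterministic function; counts its calls."""
    global calls
    calls += 1
    return 0.3  # always inside the range

conn = sqlite3.connect(":memory:")
conn.create_function("noisy", 0, noisy)  # registered as non-deterministic
conn.execute("CREATE TABLE t_source (id INTEGER)")
conn.executemany("INSERT INTO t_source VALUES (?)", [(i,) for i in range(5)])

def count_calls(sql):
    """Run the query and return how many times noisy() was invoked."""
    global calls
    calls = 0
    conn.execute(sql).fetchall()
    return calls

between_calls = count_calls("SELECT SUM(noisy() BETWEEN 0.2 AND 0.4) FROM t_source")
explicit_calls = count_calls(
    "SELECT SUM(noisy() >= 0.2 AND noisy() <= 0.4) FROM t_source"
)

# With 5 rows, the BETWEEN form should need fewer calls than the explicit form.
print(between_calls, explicit_calls)
```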
There is also something to be said for readability. SQL queries can become massive, and operators like BETWEEN help keep them readable.
SQL: BETWEEN and IN (which is faster)
- If your ids are always consecutive you should use BETWEEN.
- If your ids may or may not be consecutive then use IN.
Performance shouldn't really be the deciding factor here. Having said that, BETWEEN seems to be faster in all examples that I have tested. For example:
Without indexes, checking a table with a million rows where every row has x = 1:
SELECT COUNT(*) FROM table1 WHERE x IN (1, 2, 3, 4, 5, 6);
Time taken: 0.55s
SELECT COUNT(*) FROM table1 WHERE x BETWEEN 1 AND 6;
Time taken: 0.54s
Without indexes, checking a table with a million rows where x has unique values:
SELECT COUNT(*) FROM table1 WHERE x IN (1, 2, 3, 4, 5, 6);
Time taken: 0.65s
SELECT COUNT(*) FROM table1 WHERE x BETWEEN 1 AND 6;
Time taken: 0.36s
A more realistic example though is that the id column is unique and indexed. When you do this the performance of both queries becomes close to instant.
SELECT COUNT(*) FROM table2 WHERE x IN (1, 2, 3, 4, 5, 6);
Time taken: 0.00s
SELECT COUNT(*) FROM table2 WHERE x BETWEEN 1 AND 6;
Time taken: 0.00s
So I'd say concentrate on writing a clear SQL statement rather than worrying about minor differences in execution speed. And make sure that the table is correctly indexed because that will make the biggest difference.
Note: Tests were performed on SQL Server Express 2008 R2. Results may be different on other systems.
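To repeat this kind of measurement on another system, a small harness along these lines may help. It is a sketch against SQLite via Python's sqlite3 module, not the SQL Server setup used above; the 100,000-row table is a scaled-down stand-in, and the timings it prints will vary from run to run.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (x INTEGER)")
# Unique values 1..100000, no index, mirroring the second test above at smaller scale.
conn.executemany("INSERT INTO table1 VALUES (?)", ((i,) for i in range(1, 100_001)))

def timed_count(sql):
    """Return (row count, elapsed seconds) for a COUNT(*) query."""
    start = time.perf_counter()
    (count,) = conn.execute(sql).fetchone()
    return count, time.perf_counter() - start

in_count, in_time = timed_count(
    "SELECT COUNT(*) FROM table1 WHERE x IN (1, 2, 3, 4, 5, 6)"
)
between_count, between_time = timed_count(
    "SELECT COUNT(*) FROM table1 WHERE x BETWEEN 1 AND 6"
)

print(f"IN:      {in_count} rows in {in_time:.4f}s")
print(f"BETWEEN: {between_count} rows in {between_time:.4f}s")
```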
SQL between not inclusive
It is inclusive. You are comparing datetimes to dates. The second date is interpreted as midnight when the day starts.
One way to fix this is:
SELECT *
FROM Cases
WHERE cast(created_at as date) BETWEEN '2013-05-01' AND '2013-05-01'
Another way to fix it is with explicit binary comparisons:
SELECT *
FROM Cases
WHERE created_at >= '2013-05-01' AND created_at < '2013-05-02'
Aaron Bertrand has a long blog entry on dates, where he discusses this and other date issues.
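Here is a small sketch of the pitfall and both fixes, run against SQLite through Python's sqlite3 module. The sample rows are invented, and SQLite stores these values as text, so the details differ slightly from SQL Server: there the row at exactly midnight would still match the naive BETWEEN, whereas in this text comparison none of the rows do. The point carries over either way: rows later in the day are missed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cases (id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO Cases VALUES (?, ?)",
    [
        (1, "2013-04-30 23:59:59"),
        (2, "2013-05-01 00:00:00"),
        (3, "2013-05-01 10:30:00"),
        (4, "2013-05-02 00:00:00"),
    ],
)

def case_ids(where_clause):
    """Return the ids of the cases matching the given WHERE clause, in order."""
    sql = f"SELECT id FROM Cases WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

# Comparing a datetime column to bare dates misses rows later in the day.
naive = case_ids("created_at BETWEEN '2013-05-01' AND '2013-05-01'")
# Fix 1: reduce the column to a date first (SQLite's date() plays the role of CAST here).
cast_to_date = case_ids("date(created_at) BETWEEN '2013-05-01' AND '2013-05-01'")
# Fix 2: half-open range up to, but not including, the next day.
half_open = case_ids("created_at >= '2013-05-01' AND created_at < '2013-05-02'")

print(naive)         # []
print(cast_to_date)  # [2, 3]
print(half_open)     # [2, 3]
```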