how does SELECT TOP works when no order by is specified?
There is no guarantee which two rows you get. It will just be the first two retrieved from the table scan.
The TOP iterator in the execution plan will stop requesting rows once two have been returned.
For a scan of a heap, these will likely be the first two rows in allocation order, but this is not guaranteed. For example, SQL Server might use the advanced scanning feature, which means that your scan will read pages recently read by another concurrent scan.
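To make the difference concrete, here is a minimal T-SQL sketch. The temp table #Demo and its contents are hypothetical and exist only for illustration; the point is that only the second query says which two rows are wanted.

```sql
-- Hypothetical demo table; names and data are illustrative only.
CREATE TABLE #Demo (Id int NOT NULL, Name varchar(20) NOT NULL);
INSERT INTO #Demo (Id, Name) VALUES (3, 'Carol'), (1, 'Adam'), (2, 'Bob');

-- Undefined: any two of the three rows may come back,
-- depending on how the engine happens to scan the heap.
SELECT TOP (2) Id, Name FROM #Demo;

-- Defined: the two rows with the lowest Id (1 and 2).
SELECT TOP (2) Id, Name FROM #Demo ORDER BY Id;

DROP TABLE #Demo;
```

On a tiny heap like this the first query will often return rows in insertion order, which is exactly why the problem tends to go unnoticed until the table or the plan changes.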
When no 'Order by' is specified, what order does a query choose for your record set?
If you don't specify an ORDER BY, then there is NO ORDER defined.
The results can be returned in an arbitrary order - and that might change over time, too.
There is no "natural order" or anything like that in a relational database (at least in all that I know of). The only way to get a reliable ordering is by explicitly specifying an ORDER BY clause.
Update: for those who still don't believe me - here are two excellent blog posts that illustrate this point (with code samples!):
- Conor Cunningham (Architect on the Core SQL Server Engine team): No Seatbelt - Expecting Order without ORDER BY
- Alexander Kuznetsov: Without ORDER BY, there is no default sort order (post in the Web Archive)
Unexpected SQL Behaviour with SELECT TOP Query
A TOP without ORDER BY is unpredictable. This is documented by Microsoft. From Microsoft docs:
When you use TOP with the ORDER BY clause, the result set is limited to the first N number of ordered rows. Otherwise, TOP returns the first N number of rows in an undefined order.
...
In a SELECT statement, always use an ORDER BY clause with the TOP clause. Because, it's the only way to predictably indicate which rows are affected by TOP.
See also how does SELECT TOP works when no order by is specified?
Why does SELECT TOP 1 . . . ORDER BY return the second row in the table?
In relational databases, tables have no inherent order. The ORDER BY you give is not distinct over all records; in fact, it has the same value for all records. So the order in which results are returned is still nondeterministic, and therefore the top 1 returns an unpredictable row.
You say "Adam's details are first in the table"; this is simply not true: records in a table are stored without any order. If you select without an order by, or (as in your case) the order by is not deterministic, the returned order is arbitrary.
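The usual fix is to add a unique column as a tie-breaker. The following is a minimal sketch; the table #People, its columns, and its data are hypothetical and are not taken from the original question.

```sql
-- Hypothetical data: every row has the same City,
-- so ORDER BY City alone cannot break ties.
CREATE TABLE #People (
    Id   int         NOT NULL PRIMARY KEY,
    Name varchar(20) NOT NULL,
    City varchar(20) NOT NULL
);
INSERT INTO #People (Id, Name, City)
VALUES (1, 'Adam', 'London'), (2, 'Bob', 'London'), (3, 'Carol', 'London');

-- Non-deterministic: all three rows tie on City, so any of them may be returned.
SELECT TOP (1) Id, Name FROM #People ORDER BY City;

-- Deterministic: Id is unique, so it breaks every tie; this returns Adam (Id 1).
SELECT TOP (1) Id, Name FROM #People ORDER BY City, Id;

DROP TABLE #People;
```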
The order of a SQL Select statement without Order By clause
No, that behavior cannot be relied on. The order is determined by the way the query planner has decided to build up the result set. Simple queries like select * from foo_table are likely to be returned in the order they are stored on disk, which may be in primary key order or the order they were created, or some other arbitrary order. More complex queries, such as select * from foo where bar < 10, may instead be returned in order of a different column, based on an index read, or in table order, for a table scan. Even more elaborate queries, with multiple where conditions, group by clauses, or unions, will be in whatever order the planner decides is most efficient to generate.
The order could even change between two identical queries just because of data that has changed between those queries. A "where" clause may be satisfied with an index scan in one query, but later inserts could make that condition less selective, and the planner could decide to perform a subsequent query using a table scan.
To put a finer point on it: an RDBMS has the mandate to give you exactly what you asked for, as efficiently as possible. That efficiency can take many forms, including minimizing IO (both to disk as well as over the network to send data to you), minimizing CPU, and keeping the size of its working set small (using methods that require minimal temporary storage).
Without an ORDER BY clause, you have not asked for any particular order, and so the RDBMS will give you those rows in some order that (maybe) corresponds with some coincidental aspect of the query, based on whichever algorithm the RDBMS expects to produce the data the fastest.
If you care about efficiency, but not order, skip the ORDER BY clause. If you care about the order but not efficiency, use the ORDER BY clause.
Since you actually care about BOTH, use ORDER BY and then carefully tune your query and database so that it is efficient.
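One common form of such tuning is an index whose key order matches the ORDER BY. The sketch below reuses the foo and bar names from the example query above; the temp table, the id column, and the index name IX_foo_bar are assumptions made for illustration. Whether the optimizer actually uses the index depends on statistics and data volume, so check the execution plan rather than assuming it.

```sql
-- Hypothetical table mirroring the "foo" / "bar" example above.
CREATE TABLE #foo (id int NOT NULL PRIMARY KEY, bar int NOT NULL);
INSERT INTO #foo (id, bar) VALUES (1, 42), (2, 7), (3, 3), (4, 15), (5, 9);

-- An index keyed on bar lets the engine read rows already sorted by bar,
-- which can make an explicit sort step unnecessary for the query below.
CREATE INDEX IX_foo_bar ON #foo (bar);

-- The ORDER BY guarantees the order; the index only makes it cheap to produce.
SELECT TOP (2) id, bar
FROM #foo
WHERE bar < 10
ORDER BY bar;

DROP TABLE #foo;
```

The guarantee comes from the ORDER BY alone. The index is an optimization and must never be treated as a substitute for it.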
Why is there no `select last` or `select bottom` in SQL Server like there is `select top`?
You can think of it like this.
SELECT TOP N without ORDER BY returns some N rows, neither first, nor last, just some. Which rows it returns is not defined. You can run the same statement 10 times and get a different set of rows each time.
So, if the server had a syntax SELECT LAST N, then the result of this statement without ORDER BY would again be undefined, which is exactly what you get with the existing SELECT TOP N without ORDER BY.
You have stressed in your question that you know and understand what I've written below, but I'll still keep it to make it clear for everyone reading this later.
Your first phrase in the question
In SQL-server we have SELECT TOP N ... now in that we can get the first n rows in ascending order (by default), cool.
is not correct. With SELECT TOP N without ORDER BY you get N "random" rows. Well, not really random: the server doesn't jump randomly from row to row on purpose. It chooses some deterministic way to scan through the table, but there could be many different ways to scan the table, and the server is free to change the chosen path whenever it wants. This is what is meant by "undefined".
The server doesn't track the order in which rows were inserted into the table, so again your assumption that the results of SELECT TOP N without ORDER BY are determined by the order in which rows were inserted in the table is not correct.
So, the answer to your final question
why no select last/bottom like it's counterpart.
is:
- without ORDER BY, results of SELECT LAST N would be exactly the same as results of SELECT TOP N: undefined.
- with ORDER BY, the result of SELECT LAST N ... ORDER BY X ASC is exactly the same as the result of SELECT TOP N ... ORDER BY X DESC.
So, there is no point in having two keywords that do the same thing.
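As a small illustration of that equivalence, here is how "the last N rows" can be written with the existing TOP keyword. The table #Numbers and its values are hypothetical.

```sql
-- Hypothetical data for illustration.
CREATE TABLE #Numbers (X int NOT NULL PRIMARY KEY);
INSERT INTO #Numbers (X) VALUES (10), (20), (30), (40), (50);

-- "Last 2 by X": take the top 2 in descending order.
SELECT TOP (2) X FROM #Numbers ORDER BY X DESC;   -- 50, 40

-- If the rows should be presented in ascending order,
-- re-sort them in an outer query.
SELECT X
FROM (SELECT TOP (2) X FROM #Numbers ORDER BY X DESC) AS last_rows
ORDER BY X ASC;                                    -- 40, 50

DROP TABLE #Numbers;
```

The outer ORDER BY is what defines the final presentation order; the inner one only decides which rows TOP keeps.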
There is a good point in Pieter's answer: the word TOP is somewhat misleading. It really means LIMIT the result set to some number of rows.
By the way, SQL Server 2012 added support for the ANSI standard OFFSET:
OFFSET { integer_constant | offset_row_count_expression } { ROW | ROWS }
[
FETCH { FIRST | NEXT } {integer_constant | fetch_row_count_expression } { ROW | ROWS } ONLY
]
Here, adding another keyword was justified because it is ANSI standard and it adds important functionality that didn't exist before: pagination.
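A minimal pagination sketch using that syntax follows. The table #Items and the variables @PageNumber and @PageSize are hypothetical names chosen for illustration. Note that OFFSET ... FETCH is part of the ORDER BY clause, so it cannot be used without one.

```sql
-- Hypothetical data for illustration.
CREATE TABLE #Items (Id int NOT NULL PRIMARY KEY, Name varchar(20) NOT NULL);
INSERT INTO #Items (Id, Name)
VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');

DECLARE @PageNumber int = 2,  -- 1-based page index (assumed convention)
        @PageSize   int = 2;

-- Skip the rows of the earlier pages, then fetch one page.
-- With the values above this returns Id 3 and Id 4.
SELECT Id, Name
FROM #Items
ORDER BY Id
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;

DROP TABLE #Items;
```

As with TOP, the pages are only stable if the ORDER BY columns identify each row uniquely.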
I would like to thank @Razort4x here for providing a very good link to MSDN in his question. The "Advanced Scanning" section there has an excellent example of a mechanism called "merry-go-round scanning", which demonstrates why the order of the results returned from a SELECT statement cannot be guaranteed without an ORDER BY clause.
This concept is often misunderstood, and I've seen many questions here on SO that would greatly benefit from a quote from that link.
The answer to your question
Why doesn't SQL Server have a SELECT LAST or say SELECT BOTTOM or something like that, where we don't have to specify the ORDER BY and then it would give the last record inserted in the table at the time of executing the query (again I am not going into details about how would this result in case of uncommitted reads or phantom reads).
is:
The devil is in the details that you want to omit. To know which record was the "last inserted in the table at the time of executing the query" (and to know this in a somewhat consistent/non-random manner) the server would need to keep track of this information somehow. Even if it is possible in all scenarios of multiple simultaneously running transactions, it is most likely costly from the performance point of view. Not every SELECT would request this information (in fact very few or none at all), but the overhead of tracking this information would always be there.
So, you can think of it like this: by default the server doesn't do anything specific to keep track of the order in which the rows were inserted, because it affects performance, but if you need to know that, you can use, for example, an IDENTITY column. Microsoft could have designed the server engine in such a way that it required an IDENTITY column in every table, but they made it optional, which is good in my opinion. I know better than the server which of my tables need an IDENTITY column and which do not.
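A minimal sketch of the IDENTITY approach is shown below. The table #Events and its columns are hypothetical. The sketch also assumes a single session: with concurrent transactions, identity values reflect the order in which values were allocated, not necessarily the order in which rows were committed, and gaps can occur.

```sql
-- Hypothetical table; EventId is generated by the server on insert.
CREATE TABLE #Events (
    EventId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Payload varchar(50) NOT NULL
);

INSERT INTO #Events (Payload) VALUES ('first');
INSERT INTO #Events (Payload) VALUES ('second');
INSERT INTO #Events (Payload) VALUES ('third');

-- "Last inserted row", expressed explicitly through ORDER BY.
-- In this single-session script it returns the 'third' row.
SELECT TOP (1) EventId, Payload
FROM #Events
ORDER BY EventId DESC;

DROP TABLE #Events;
```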
Summary
I'd like to summarise that you can look at SELECT LAST without ORDER BY in two different ways.
1) When you expect SELECT LAST to behave in line with the existing SELECT TOP. In this case the result is undefined for both LAST and TOP, i.e. the result is effectively the same. It boils down to (not) having another keyword. Language developers (of the T-SQL language in this case) are always reluctant to add keywords unless there are good reasons for it. In this case it is clearly avoidable.
2) When you expect SELECT LAST to behave as SELECT LAST INSERTED ROW. This should, by the way, extend the same expectations to SELECT TOP to behave as SELECT FIRST INSERTED ROW, or add new keywords LAST_INSERTED, FIRST_INSERTED to keep the existing keyword TOP intact. In this case it boils down to the performance and added overhead of such behaviour. At the moment the server allows you to avoid this performance penalty if you don't need this information. If you do need it, IDENTITY is a pretty good solution if you use it carefully.
SELECT TOP N with UNION and ORDER BY
A UNION query works thus: the individual queries are executed, then the ORDER BY clause is applied to the combined result. So with
SELECT TOP 5 [ID], [Description], [Inactive]
FROM #T1
UNION ALL
SELECT TOP 5 [ID], [Description], [Inactive]
FROM #T2
ORDER BY [Inactive], [Description];
you select five arbitrarily chosen records from #T1 plus five arbitrarily chosen records from #T2 and then you order these. So you need subqueries or with clauses. E.g.:
SELECT * FROM
(
(
SELECT TOP 5 [ID], [Description], [Inactive]
FROM #T1
ORDER BY [Inactive], [Description]
)
UNION ALL
(
SELECT TOP 5 [ID], [Description], [Inactive]
FROM #T2
ORDER BY [Inactive], [Description]
)
) t;
So your workaround is not a workaround at all, but the proper query.
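If a given SQL Server version rejects an ORDER BY inside the parenthesized operands of UNION ALL, an equivalent formulation gives each operand its own derived table. This is a sketch that assumes the same #T1 and #T2 temp tables as in the question; the aliases a and b are arbitrary.

```sql
-- Each derived table picks its own well-defined top 5 rows.
SELECT [ID], [Description], [Inactive]
FROM (
    SELECT TOP 5 [ID], [Description], [Inactive]
    FROM #T1
    ORDER BY [Inactive], [Description]
) AS a
UNION ALL
SELECT [ID], [Description], [Inactive]
FROM (
    SELECT TOP 5 [ID], [Description], [Inactive]
    FROM #T2
    ORDER BY [Inactive], [Description]
) AS b
-- This final ORDER BY applies to the combined result of the UNION ALL.
ORDER BY [Inactive], [Description];
```

The inner ORDER BY clauses only decide which five rows each TOP keeps. The outer one is still needed if the combined ten rows must come back in a defined order.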