Does a Select Query Always Return Rows in the Same Order? Table with Clustered Index

Does a SELECT query always return rows in the same order? Table with clustered index

The order of the returned rows will not always be the same unless you explicitly state so with the ORDER BY clause. So, no.

And no; just because your 1000 queries have returned the same order it's no guarantee the 1001th query will be in the same order.

Do clustered index on a column GUARANTEES returning sorted rows according to that column

Just to be clear. Presumably, you are talking about a simple query such as:

select *
from table t;

First, if all the data on the table fits on a single page and there are no other indexes on the table, it is hard for me to imagine a scenario where the result set is not ordered by the primary key. However, this is because I think the most reasonable query plan would require a full-table scan, not because of any requirement -- documented or otherwise -- in SQL or SQL Server. Without an explicit order by, the ordering in the result set is a consequence of the query plan.

That gets to the heart of the issue. When you are talking about the ordering of the result sets, you are really talking about the query plan. And, the assumption of ordering by the primary key really means that you are assuming that the query uses full-table scan. What is ironic is that people make the assumption, without actually understanding the "why". Furthermore, people have a tendency to generalize from small examples (okay, this is part of the basis of human intelligence). Unfortunately, they see consistently that results sets from simple queries on small tables are always in primary key order and generalize to larger tables. The induction step is incorrect in this example.

What can change this? Off-hand, I think that a full table scan would return the data in primary key order if the following conditions are met:

  • Single threaded server.
  • Single file filegroup
  • No competing indexes
  • No table partitions

I'm not saying this is always true. It just seems reasonable that under these circumstances such a query would use a full table scan starting at the beginning of the table.

Even on a small table, you can get surprises. Consider:

select NonPrimaryKeyColumn
from table

The query plan would probably decide to use an index on table(NonPrimaryKeyColumn) rather than doing a full table scan. The results would not be ordered by the primary key (unless by accident). I show this example because indexes can be used for a variety of purposes, not just order by or where filtering.

If you use a multi-threaded instance of the database and you have reasonably sized tables, you will quickly learn that results without an order by have no explicit ordering.

And finally, SQL Server has a pretty smart optimizer. I think there is some reluctance to use order by in a query because users think it will automatically do a sort. SQL Server works hard to find the best execution plan for the query. IF it recognizes that the order by is redundant because of the rest of the plan, then the order by will not result in a sort.

And, of course you want to guarantee the ordering of results, you need order by in the outermost query. Even a query like this:

select *
from (select top 100 t.* from t order by col1) t

Does not guarantee that the results are ordered in the final result set. You really need to do:

select *
from (select top 100 t.* from t order by col1) t
order by col1;

to guarantee the results in a particular order. This behavior is documented here.

Sql server, does a simple select statement on a table always return the rows in the order they where inserted?

Without an ORDER BY clause, row order is never guaranteed.

Without an ORDER BY clause, the server will return records in whatever manner it chooses. This is often the value the records are saved on disk because that is most convenient, and it will generally add records to the end of the table in most designs. However, the order records are saved on disk is an implementation detail, not guaranteed query engine behavior. In most RDBMSs, as records are deleted and added, you will see that the order records are returned is relatively arbitrary.

If you have a CLUSTERED index in SQL Server, it will store records on disk in order according to the index to aid searches. If you have an IDENTITY() value which increments on each record and a CLUSTERED index on that column (a common configuration), the server will store records roughly in the order they are added, but this can be overridden (see IDENTITY_INSERT, DBCC CHECKIDENT), so there's no guarantee of that behavior. Any consistent behavior should be considered incidental to the design, not a feature. It's no guarantee. It should not be considered reliable. The server is doing what takes it the least amount of effort, not trying to be consistent with record order. Again, without an ORDER BY, the order of records returned is not guaranteed.

By definition and design of the relational model, the order of records in a database are not important. By default, records are not related to each other. Unless you as the developer somehow store information related to the desired order in the RDBMS, then there is no way to determine the order records are added.

If order is important, you must use ORDER BY. If you need to establish that order, you must do it yourself when choosing what data to store.

Clustered Index and Sorting

Note: question is cross-posted here.

if I have a clustered index on [column_a], [column_b], and [column_c] and run the same query from above, will the data ALWAYS come back sorted based on that order since that's the order that the clustered index was created on?

No.

SQL Server does not guarantee that it will return data in any order unless you specify the order. It is easy to prove things can go wrong by simply creating a covering, non-clustered index that leads on a different column:

  • Example db<>fiddle

But things can go sideways in other ways, too, e.g. when parallelism or partitioning come into play and SQL Server re-assembles the data from different threads, or when the query gets more complex using joins or filters and a different plan other than a clustered index scan makes sense. Leaving off the order by clause is telling SQL Server: "I don't care about order."

Also, just as a point of clarification:

If I have an ORDER BY clause on all the columns used in the clustered index, the execution plan will not have a sort operator.

...this is true only if the columns are listed in the exact same order as the key definition. ORDER BY c, b, a is "all the columns" but it obviously will produce different output (and require some type of sort operation to get there).

If you expect and want to be comfortable relying on a certain order, always use an ORDER BY clause.

Further reading:

  • No Seatbelt - Expecting Order without ORDER BY (Conor Cunningham)
  • Without ORDER BY, there is no default sort order. (Alexander Kuznetsov)
  • T-SQL Tuesday #56: SQL Server Assumptions (me - see #3)
  • Bad Habits to Kick : Relying on undocumented behavior (also me)
  • Why is SSMS inserting new rows at the top of a table not the bottom? (dba.se question)

Return rows in the exact order they were inserted

As others have already written, you will not be able to get the rows out of the link table in the order they were inserted.

If there is some sort of internal ordering of the rows in one or both of the tables that this link table is joining, then you can use that to try to figure out when the link table rows have been created. Basically, they cannot have been created BEFORE both of the rows containing the PK:s have been created.

But on the other hand you will not be able to find out how long after they have been created.

If you have decent backups, you could try to restore one or a few backups of varying age and then try to see if those backups also contains this strange behaviour. It could give you at least some clue about when the strangeness has started.

But the bottom line is that using just a select, there is now way to get the row out of a table like this in the order they were inserted.

Clustered index and Order by Clause

Apples and Oranges. A clustered index is a storage option. An ORDER BY is a query option. If you need ordered results, the only way to get them is to add an ORDER BY clause to your query. Period.

An index may help the query optimizer generate a more efficient plan and leverage the index as a means to satisfy the ORDER BY requirement. But by no means will the presence of an index, clustered or non-clustered, guarantee any ordering of a result.

So you absolutely need ORDER BY in your queries. You also may consider an index by Name column to help this the query. Whether an index will be used or not, it depends on many more factors. you should read Designing Indexes and The Tipping Point.

Why sorting of records in table is not as per clustered index?

In this case I am sure optimizer decided to do a full table scan or nonclustered index scan, since it is very small. You can include actual execution plan and see this:

Sample Image

You can force to use clustered index:

SELECT * FROM TRIAL WITH (INDEX(UNQ))

And you will probably get:

Sample Image

and result set:

Id  Name
1 a
5 b
2 c
3 d

But you should not really do this, because ordering is still not guarantied. If you want your results to be sorted by some columns, do it explicitly!

I will copy a fragment from the book Exam 70-461: Querying Microsoft SQL Server 2012 where you can get some good explanation:

It might seem like the output is sorted by empid, but that’s not
guaranteed. What could be more confusing is that if you run the query
repeatedly, it seems like the result keeps being returned in the same
order; but again, that’s not guaranteed. When the database engine (SQL
Server in this case) processes this query, it knows that it can return
the data in any order because there is no explicit instruction to
return the data in a specific order. It could be that, due to
optimization and other reasons, the SQL Server database engine chose
to process the data in a particular way this time. There’s even some
likelihood that such choices will be repeated if the physical
circumstances remain the same. But there’s a big difference between
what’s likely to happen due to optimization and other reasons and
what’s actually guaranteed.

The database engine may—and sometimes
does—change choices that can affect the order in which rows are
returned, knowing that it is free to do so. Examples for such changes
in choices include changes in data distribution, availability of
physical structures such as indexes, and availability of resources
like CPUs and memory. Also, with changes in the engine after an
upgrade to a newer version of the product, or even after application
of a service pack, optimization aspects may change. In turn, such
changes could affect, among other things, the order of the rows in the
result.

In short, this cannot be stressed enough: A query that doesn’t
have an explicit instruction to return the rows in a particular order
doesn’t guarantee the order of rows in the result. When you do need
such a guarantee, the only way to provide it is by adding an ORDER BY
clause to the query, and that’s the focus of the next section.

EDIT based on comments:

The thing is that even if you use clustered index it may return unordered set. Suppose you have physical order of clustered keys like (1, 2, 3, 4, 5). Most of the time you will get (1, 2, 3, 4, 5) but there can be situations when optimizer decides to do parallel reads and say it has 2 parallel reads and it reads (1, 2, 3) and (4, 5). Now it may happen that (4, 5) will be returned first and then (1, 2, 3) can be returned. If you have no order by clause engine will not spend its resources ordering that set and will give you (4, 5, 1, 2, 3). So this explains why you should always ensure you have order by clause when you want ordering.

SQL Top Function with a clustered index

No. The rule is simple: SQL tables and result sets represent unordered sets. The only exception is a result set associated with a query that has an ORDER BY in the outermost SELECT.

A clustered index affects how data is stored on each page. However, it does not guarantee that a result set built on that table will even use the clustered index.

Consider a table that has a primary, clustered key on id and a query that returns:

select top (100) othercol
from t;

This query could use an index on othercol -- avoiding the clustered index altogether.



Related Topics



Leave a reply



Submit