Do Clustered Index on a Column Guarantees Returning Sorted Rows According to That Column

Do clustered index on a column GUARANTEES returning sorted rows according to that column

Just to be clear. Presumably, you are talking about a simple query such as:

select *
from table t;

First, if all the data on the table fits on a single page and there are no other indexes on the table, it is hard for me to imagine a scenario where the result set is not ordered by the primary key. However, this is because I think the most reasonable query plan would require a full-table scan, not because of any requirement -- documented or otherwise -- in SQL or SQL Server. Without an explicit order by, the ordering in the result set is a consequence of the query plan.

That gets to the heart of the issue. When you are talking about the ordering of the result sets, you are really talking about the query plan. And, the assumption of ordering by the primary key really means that you are assuming that the query uses full-table scan. What is ironic is that people make the assumption, without actually understanding the "why". Furthermore, people have a tendency to generalize from small examples (okay, this is part of the basis of human intelligence). Unfortunately, they see consistently that results sets from simple queries on small tables are always in primary key order and generalize to larger tables. The induction step is incorrect in this example.

What can change this? Off-hand, I think that a full table scan would return the data in primary key order if the following conditions are met:

  • Single threaded server.
  • Single file filegroup
  • No competing indexes
  • No table partitions

I'm not saying this is always true. It just seems reasonable that under these circumstances such a query would use a full table scan starting at the beginning of the table.

Even on a small table, you can get surprises. Consider:

select NonPrimaryKeyColumn
from table

The query plan would probably decide to use an index on table(NonPrimaryKeyColumn) rather than doing a full table scan. The results would not be ordered by the primary key (unless by accident). I show this example because indexes can be used for a variety of purposes, not just order by or where filtering.

If you use a multi-threaded instance of the database and you have reasonably sized tables, you will quickly learn that results without an order by have no explicit ordering.

And finally, SQL Server has a pretty smart optimizer. I think there is some reluctance to use order by in a query because users think it will automatically do a sort. SQL Server works hard to find the best execution plan for the query. IF it recognizes that the order by is redundant because of the rest of the plan, then the order by will not result in a sort.

And, of course you want to guarantee the ordering of results, you need order by in the outermost query. Even a query like this:

select *
from (select top 100 t.* from t order by col1) t

Does not guarantee that the results are ordered in the final result set. You really need to do:

select *
from (select top 100 t.* from t order by col1) t
order by col1;

to guarantee the results in a particular order. This behavior is documented here.

Does a SELECT query always return rows in the same order? Table with clustered index

The order of the returned rows will not always be the same unless you explicitly state so with the ORDER BY clause. So, no.

And no; just because your 1000 queries have returned the same order it's no guarantee the 1001th query will be in the same order.

Clustered Index on NON-Primary Key Column

You have two assumptions

  1. Select query will result in the logical order clustered Index in your case it is NAME column

It is wrong. Unless you mention Order By in your select query result order is not guaranteed. Though you are seeing the result sorted in Primary key column(Non clustered Index) but the order is not guaranteed


  1. Records of Clustered Index column is stored in data pages in same order as Clustered Index logical order.

Again it is wrong. The logical order of data is maintained in data pages not the physical order. In same Data page C can be stored first and A can be stored second. Only the logical ordering is maintained in the data pages.

Clustered Index and Sorting

Note: question is cross-posted here.

if I have a clustered index on [column_a], [column_b], and [column_c] and run the same query from above, will the data ALWAYS come back sorted based on that order since that's the order that the clustered index was created on?

No.

SQL Server does not guarantee that it will return data in any order unless you specify the order. It is easy to prove things can go wrong by simply creating a covering, non-clustered index that leads on a different column:

  • Example db<>fiddle

But things can go sideways in other ways, too, e.g. when parallelism or partitioning come into play and SQL Server re-assembles the data from different threads, or when the query gets more complex using joins or filters and a different plan other than a clustered index scan makes sense. Leaving off the order by clause is telling SQL Server: "I don't care about order."

Also, just as a point of clarification:

If I have an ORDER BY clause on all the columns used in the clustered index, the execution plan will not have a sort operator.

...this is true only if the columns are listed in the exact same order as the key definition. ORDER BY c, b, a is "all the columns" but it obviously will produce different output (and require some type of sort operation to get there).

If you expect and want to be comfortable relying on a certain order, always use an ORDER BY clause.

Further reading:

  • No Seatbelt - Expecting Order without ORDER BY (Conor Cunningham)
  • Without ORDER BY, there is no default sort order. (Alexander Kuznetsov)
  • T-SQL Tuesday #56: SQL Server Assumptions (me - see #3)
  • Bad Habits to Kick : Relying on undocumented behavior (also me)
  • Why is SSMS inserting new rows at the top of a table not the bottom? (dba.se question)

Mysql:indexes and order of rows in a table

I make a select, I get a table, the records in which are ordered by id. [...] This is the expected result, because in mysql the primary key is a clustered index, therefore the data will be physically ordered by it.

There is some misunderstanding of a concept here.

Table rows have no inherent ordering: they represent unordered set of rows. While the clustered index enforces a physical ordering of data in storage, it does not guarantee the order in which rows are returned by a select query.

If you want the results of the query to be ordered, then use an order by clause. Without such clause, the ordering or the rows is undefined: the database is free to return results in whichever order it likes, and results are not guaranteed to be consistent over consecutive executions of the same query.

select * from table_one order by id;
select * from table_one order by name;

Are primay keys stored in sorted order or do they just appear sorted by SQL statement?

Well, primary keys aren't necessarily stored in sorted order on the disk. But clustered indexes are. And in the vast majority of cases the primary key is the clustered index. Though this doesn't necessarily guarantee sorting of results, it's just that results are usually sorted by the clustered index by default.

how can I store Guids in random order as primary key

GUIDs don't make for good clustered indexes for exactly this reason. SQL Server does have something called a Sequential GUID to address this. The resulting GUIDs won't be consecutive, but they will be sequential. It has some caveats, though:

Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started.

If the system reboots, the sequence is lost. If multiple systems create keys, the sequence is lost. Additionally, there's the problem that we're still relying on the SQL Server to generate the key, which kind of defeats a significant reason to use a GUID.

In general I would suggest not using a GUID as a clustered index. As an alternative one might use a normal IDENTITY key as the clustered index and create a separate GUID column (with potentially an index of its own, and even a unique constraint just to make sure applications don't try to re-insert an existing record). That separate column becomes a kind of "global identifier" in a more business-logic sense, and not so much in a data persistence implementation sense.

What do Clustered and Non-Clustered index actually mean?

With a clustered index the rows are stored physically on the disk in the same order as the index. Therefore, there can be only one clustered index.

With a non clustered index there is a second list that has pointers to the physical rows. You can have many non clustered indices, although each new index will increase the time it takes to write new records.

It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.

Writing to a table with a clustered index can be slower, if there is a need to rearrange the data.



Related Topics



Leave a reply



Submit