Do clustered index on a column GUARANTEES returning sorted rows according to that column
Just to be clear. Presumably, you are talking about a simple query such as:
select *
from table t;
First, if all the data on the table fits on a single page and there are no other indexes on the table, it is hard for me to imagine a scenario where the result set is not ordered by the primary key. However, this is because I think the most reasonable query plan would require a full-table scan, not because of any requirement -- documented or otherwise -- in SQL or SQL Server. Without an explicit order by
, the ordering in the result set is a consequence of the query plan.
That gets to the heart of the issue. When you are talking about the ordering of the result sets, you are really talking about the query plan. And, the assumption of ordering by the primary key really means that you are assuming that the query uses full-table scan. What is ironic is that people make the assumption, without actually understanding the "why". Furthermore, people have a tendency to generalize from small examples (okay, this is part of the basis of human intelligence). Unfortunately, they see consistently that results sets from simple queries on small tables are always in primary key order and generalize to larger tables. The induction step is incorrect in this example.
What can change this? Off-hand, I think that a full table scan would return the data in primary key order if the following conditions are met:
- Single threaded server.
- Single file filegroup
- No competing indexes
- No table partitions
I'm not saying this is always true. It just seems reasonable that under these circumstances such a query would use a full table scan starting at the beginning of the table.
Even on a small table, you can get surprises. Consider:
select NonPrimaryKeyColumn
from table
The query plan would probably decide to use an index on table(NonPrimaryKeyColumn)
rather than doing a full table scan. The results would not be ordered by the primary key (unless by accident). I show this example because indexes can be used for a variety of purposes, not just order by
or where
filtering.
If you use a multi-threaded instance of the database and you have reasonably sized tables, you will quickly learn that results without an order by
have no explicit ordering.
And finally, SQL Server has a pretty smart optimizer. I think there is some reluctance to use order by
in a query because users think it will automatically do a sort. SQL Server works hard to find the best execution plan for the query. IF it recognizes that the order by
is redundant because of the rest of the plan, then the order by
will not result in a sort.
And, of course you want to guarantee the ordering of results, you need order by
in the outermost query. Even a query like this:
select *
from (select top 100 t.* from t order by col1) t
Does not guarantee that the results are ordered in the final result set. You really need to do:
select *
from (select top 100 t.* from t order by col1) t
order by col1;
to guarantee the results in a particular order. This behavior is documented here.
Does a SELECT query always return rows in the same order? Table with clustered index
The order of the returned rows will not always be the same unless you explicitly state so with the ORDER BY
clause. So, no.
And no; just because your 1000 queries have returned the same order it's no guarantee the 1001th query will be in the same order.
Clustered Index on NON-Primary Key Column
You have two assumptions
Select
query will result in the logical orderclustered Index
in your case it isNAME
column
It is wrong. Unless you mention Order By
in your select
query result order is not guaranteed. Though you are seeing the result sorted in Primary key column(Non clustered Index) but the order is not guaranteed
- Records of
Clustered Index
column is stored indata pages
in same order as Clustered Indexlogical order
.
Again it is wrong. The logical order
of data is maintained in data pages not the physical order
. In same Data page C
can be stored first and A
can be stored second. Only the logical ordering is maintained in the data pages.
Clustered Index and Sorting
Note: question is cross-posted here.
if I have a clustered index on [column_a], [column_b], and [column_c] and run the same query from above, will the data ALWAYS come back sorted based on that order since that's the order that the clustered index was created on?
No.
SQL Server does not guarantee that it will return data in any order unless you specify the order. It is easy to prove things can go wrong by simply creating a covering, non-clustered index that leads on a different column:
- Example db<>fiddle
But things can go sideways in other ways, too, e.g. when parallelism or partitioning come into play and SQL Server re-assembles the data from different threads, or when the query gets more complex using joins or filters and a different plan other than a clustered index scan makes sense. Leaving off the order by clause is telling SQL Server: "I don't care about order."
Also, just as a point of clarification:
If I have an ORDER BY clause on all the columns used in the clustered index, the execution plan will not have a sort operator.
...this is true only if the columns are listed in the exact same order as the key definition. ORDER BY c, b, a
is "all the columns" but it obviously will produce different output (and require some type of sort operation to get there).
If you expect and want to be comfortable relying on a certain order, always use an ORDER BY clause.
Further reading:
- No Seatbelt - Expecting Order without ORDER BY (Conor Cunningham)
- Without ORDER BY, there is no default sort order. (Alexander Kuznetsov)
- T-SQL Tuesday #56: SQL Server Assumptions (me - see #3)
- Bad Habits to Kick : Relying on undocumented behavior (also me)
- Why is SSMS inserting new rows at the top of a table not the bottom? (dba.se question)
Mysql:indexes and order of rows in a table
I make a select, I get a table, the records in which are ordered by id. [...] This is the expected result, because in mysql the primary key is a clustered index, therefore the data will be physically ordered by it.
There is some misunderstanding of a concept here.
Table rows have no inherent ordering: they represent unordered set of rows. While the clustered index enforces a physical ordering of data in storage, it does not guarantee the order in which rows are returned by a select
query.
If you want the results of the query to be ordered, then use an order by
clause. Without such clause, the ordering or the rows is undefined: the database is free to return results in whichever order it likes, and results are not guaranteed to be consistent over consecutive executions of the same query.
select * from table_one order by id;
select * from table_one order by name;
Are primay keys stored in sorted order or do they just appear sorted by SQL statement?
Well, primary keys aren't necessarily stored in sorted order on the disk. But clustered indexes are. And in the vast majority of cases the primary key is the clustered index. Though this doesn't necessarily guarantee sorting of results, it's just that results are usually sorted by the clustered index by default.
how can I store Guids in random order as primary key
GUIDs don't make for good clustered indexes for exactly this reason. SQL Server does have something called a Sequential GUID to address this. The resulting GUIDs won't be consecutive, but they will be sequential. It has some caveats, though:
Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started.
If the system reboots, the sequence is lost. If multiple systems create keys, the sequence is lost. Additionally, there's the problem that we're still relying on the SQL Server to generate the key, which kind of defeats a significant reason to use a GUID.
In general I would suggest not using a GUID as a clustered index. As an alternative one might use a normal IDENTITY
key as the clustered index and create a separate GUID column (with potentially an index of its own, and even a unique constraint just to make sure applications don't try to re-insert an existing record). That separate column becomes a kind of "global identifier" in a more business-logic sense, and not so much in a data persistence implementation sense.
What do Clustered and Non-Clustered index actually mean?
With a clustered index the rows are stored physically on the disk in the same order as the index. Therefore, there can be only one clustered index.
With a non clustered index there is a second list that has pointers to the physical rows. You can have many non clustered indices, although each new index will increase the time it takes to write new records.
It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.
Writing to a table with a clustered index can be slower, if there is a need to rearrange the data.
Related Topics
How to Use Dynamic Column Names in an Update or Select Statement in a Function
Trimmining a Column with Bad Data
Get Sundays for a Given Month Date in a Function SQL
Update Statement in Oracle Using SQL or Pl/SQL to Update First Duplicate Row Only
Difference Between Datetime Converts in Msexcel and SQL Server
Why Do SQL Id Sequences Go Out of Sync (Specifically Using Postgres)
Remove Diacritics from String in Snowflake
No Value Given for the Required Parameter
Compare Deleted and Inserted Table in SQL Server 2008
Sql: Select Records Where All Joined Records Satisfy Some Condition
Entity Framework Indexing All Foreign Key Columns
How to Join Two Tables Together with Same Number of Rows by Their Order
In SQL, What Does Using Parentheses with an or Mean
Get "Zero" for a Count at Dates Without Records
How to Avoid "Raw Variable Length Too Long" Errors in SQL Developer
Find the Date/Time a Table's Column Was Created
How to Select a Column in SQL Server with a Special Character in the Column Name