Order by in a SQL Server 2008 View

Create a view with ORDER BY clause

I'm not sure what you think this ORDER BY is accomplishing? Even if you do put ORDER BY in the view in a legal way (e.g. by adding a TOP clause), if you just select from the view, e.g. SELECT * FROM dbo.TopUsersTest; without an ORDER BY clause, SQL Server is free to return the rows in the most efficient way, which won't necessarily match the order you expect. This is because ORDER BY is overloaded, in that it tries to serve two purposes: to sort the results and to dictate which rows to include in TOP. In this case, TOP always wins (though depending on the index chosen to scan the data, you might observe that your order is working as expected - but this is just a coincidence).

In order to accomplish what you want, you need to add your ORDER BY clause to the queries that pull data from the view, not to the code of the view itself.

So your view code should just be:

CREATE VIEW [dbo].[TopUsersTest] 
AS
SELECT
u.[DisplayName], SUM(a.AnswerMark) AS Marks
FROM
dbo.Users_Questions AS uq
INNER JOIN [dbo].[Users] AS u
ON u.[UserID] = uq.[UserID]
INNER JOIN [dbo].[Answers] AS a
ON a.[AnswerID] = uq.[AnswerID]
GROUP BY u.[DisplayName];

The ORDER BY is meaningless so should not even be included.
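
As a rough sketch of what the calling query could look like: the sort column and direction (Marks, descending) and the TOP (10) cutoff are assumptions about what the original ORDER BY was meant to achieve, not something stated in the question.

```sql
-- Ordering is requested by the query that reads the view, not by the view itself.
SELECT DisplayName, Marks
FROM dbo.TopUsersTest
ORDER BY Marks DESC;

-- Hypothetical variant: if only the highest scorers are wanted, TOP and ORDER BY
-- belong together in the outer query, where ORDER BY both picks and sorts the rows.
SELECT TOP (10) DisplayName, Marks
FROM dbo.TopUsersTest
ORDER BY Marks DESC;
```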


To illustrate, using AdventureWorks2012, here is an example:

CREATE VIEW dbo.SillyView
AS
SELECT TOP 100 PERCENT
SalesOrderID, OrderDate, CustomerID , AccountNumber, TotalDue
FROM Sales.SalesOrderHeader
ORDER BY CustomerID;
GO

SELECT SalesOrderID, OrderDate, CustomerID, AccountNumber, TotalDue
FROM dbo.SillyView;

Results:

SalesOrderID   OrderDate   CustomerID   AccountNumber   TotalDue
------------ ---------- ---------- -------------- ----------
43659 2005-07-01 29825 10-4020-000676 23153.2339
43660 2005-07-01 29672 10-4020-000117 1457.3288
43661 2005-07-01 29734 10-4020-000442 36865.8012
43662 2005-07-01 29994 10-4020-000227 32474.9324
43663 2005-07-01 29565 10-4020-000510 472.3108

And you can see from the execution plan that the TOP and ORDER BY have been absolutely ignored and optimized away by SQL Server:

[Image: execution plan for the query against dbo.SillyView]

There is no TOP operator at all, and no sort. SQL Server has optimized them away completely.

Now, if you change the view to say ORDER BY SalesOrderID, you will then just happen to get the ordering that the view states, but only - as mentioned before - by coincidence.

But if you change your outer query to perform the ORDER BY you wanted:

SELECT SalesOrderID, OrderDate, CustomerID, AccountNumber, TotalDue
FROM dbo.SillyView
ORDER BY CustomerID;

You get the results ordered the way you want:

SalesOrderID   OrderDate   CustomerID   AccountNumber   TotalDue
------------ ---------- ---------- -------------- ----------
43793 2005-07-22 11000 10-4030-011000 3756.989
51522 2007-07-22 11000 10-4030-011000 2587.8769
57418 2007-11-04 11000 10-4030-011000 2770.2682
51493 2007-07-20 11001 10-4030-011001 2674.0227
43767 2005-07-18 11001 10-4030-011001 3729.364

And the plan still has optimized away the TOP/ORDER BY in the view, but a sort is added (at no small cost, mind you) to present the results ordered by CustomerID:

[Image: execution plan for the query with the outer ORDER BY]

So, moral of the story, do not put ORDER BY in views. Put ORDER BY in the queries that reference them. And if the sorting is expensive, you might consider adding/changing an index to support it.
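
To make the index suggestion concrete, here is a minimal sketch on a hypothetical table (dbo.DemoOrders and the index name are made up for illustration and are not part of AdventureWorks). With an index keyed on the sort column that also covers the selected columns, the optimizer may be able to read rows in index order instead of adding a sort, though that remains the optimizer's choice.

```sql
-- Hypothetical demo table; names are illustrative only.
CREATE TABLE dbo.DemoOrders
(
    OrderID    INT IDENTITY(1,1) PRIMARY KEY,
    CustomerID INT   NOT NULL,
    OrderDate  DATE  NOT NULL,
    TotalDue   MONEY NOT NULL
);
GO

-- Index keyed on the ORDER BY column; INCLUDE covers the other selected columns.
-- The clustered key (OrderID) is carried in the nonclustered index automatically.
CREATE NONCLUSTERED INDEX IX_DemoOrders_CustomerID
ON dbo.DemoOrders (CustomerID)
INCLUDE (OrderDate, TotalDue);
GO

-- The ORDER BY is still required; the index only makes it cheaper to satisfy.
SELECT OrderID, OrderDate, CustomerID, TotalDue
FROM dbo.DemoOrders
ORDER BY CustomerID;
```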

ORDER BY in a Sql Server 2008 view

The order of rows returned by a view with an ORDER BY clause is never guaranteed. If you need a specific row order, you must specify it in the query that selects from the view.

See the note at the top of this Books Online entry.

Is it possible to select a specific ORDER BY in SQL Server 2008?

Hmm.. that's nasty, the days are stored as verbatim 'Monday', 'Tuesday', etc?

Anyway, just do this:

SELECT * 
FROM Requirements
ORDER BY
CASE Day
WHEN 'Monday' THEN 1
WHEN 'Tuesday' THEN 2
WHEN 'Wednesday' THEN 3
WHEN 'Thursday' THEN 4
WHEN 'Friday' THEN 5
WHEN 'Saturday' THEN 6
WHEN 'Sunday' THEN 7
END
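
An alternative sketch, if the CASE expression would otherwise be repeated in several queries: join to a small list of day names and sort keys. The derived table alias d and its column names DayName and SortKey are hypothetical; the VALUES table constructor used here is available from SQL Server 2008 onward.

```sql
SELECT r.*
FROM Requirements AS r
LEFT JOIN (VALUES ('Monday', 1), ('Tuesday', 2), ('Wednesday', 3),
                  ('Thursday', 4), ('Friday', 5), ('Saturday', 6),
                  ('Sunday', 7)) AS d (DayName, SortKey)
    ON d.DayName = r.[Day]
-- Rows whose Day matches no entry get a NULL SortKey, which sorts first in ascending order.
ORDER BY d.SortKey;
```

The same list could live in a permanent lookup table if several queries need it.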

Default row order in SELECT query - SQL Server 2008 vs SQL 2012

You need to go back and add ORDER BY clauses to your code because without them the order is never guaranteed. You were "lucky" in the past that you always got the same order, but it wasn't because SQL Server 2008 guaranteed it in any way. It most likely had to do with your indexes or how the data was being stored on the disk.

If you moved to a new host when you upgraded the difference in hardware configuration alone could have changed the way your queries execute. Not to mention the fact that the new server would have recalculated statistics on the tables and the SQL Server 2012 query optimizer probably does things a bit differently than the one in SQL Server 2008.

It is a fallacy that you can rely on the order of a result set in SQL without explicitly stating the order you want it in. SQL results NEVER have an order you can rely on without using an ORDER BY clause. SQL is built around set theory. Query results are basically sets (or multi-sets).

Itzik Ben-Gan gives a good description of set theory in relation to SQL in his book Microsoft SQL Server 2012 T-SQL Fundamentals:

Set theory, which originated with the mathematician Georg Cantor, is
one of the mathematical branches on which the relational model is
based. Cantor's definition of a set follows:

By a "set" we mean any collection M into a whole of definite, distinct
objects m (which are called the "elements" of M) of our perception or
of our thought. - Joseph W. Dauben and Georg Cantor (Princeton
University Press, 1990)

After a thorough explanation of the terms in the definition Itzik then goes on to say:

What Cantor's definition of a set leaves out is probably as important
as what it includes. Notice that the definition doesn't mention any
order among the set elements. The order in which set elements are
listed is not important. The formal notation for listing set elements
uses curly brackets: {a, b, c}. Because order has no relevance you can
express the same set as {b, a, c} or {b, c, a}. Jumping ahead to the
set of attributes (called columns in SQL) that make up the header of a
relation (called a table in SQL), an element is supposed to be
identified by name - not ordinal position. Similarly, consider the set
of tuples (called rows by SQL) that make up the body of the relation;
an element is identified by its key values - not by position. Many
programmers have a hard time adapting to the idea that, with respect
to querying tables, there is no order among the rows. In other words,
a query against a table can return rows in any order unless you
explicitly request that the data be sorted in a specific way, perhaps
for presentation purposes.

But regardless of the academic definition of a set, even the implementation in SQL Server has never guaranteed any order in the results. This MSDN blog post from 2005 by a member of the query optimizer team states that you should not rely on the order from intermediate operations at all.

The reordering rules can and will violate this assumption (and do so
when it is inconvenient to you, the developer ;). Please understand
that when we reorder operations to find a more efficient plan, we can
cause the ordering behavior to change for intermediate nodes in the
tree. If you’ve put an operation in the tree that assumes a
particular intermediate ordering, it can break.

This blog post by Conor Cunningham (Architect, SQL Server Core Engine) "No Seatbelt - Expecting Order without ORDER BY" is about SQL Server 2008. He has a table with 20k rows in it with a single index that appears to always return rows in the same order. Adding an ORDER BY to the query doesn't even change the execution plan, so it isn't like adding one in makes the query more expensive if the optimizer realizes it doesn't need it. But once he adds another 20k rows to the table suddenly the query plan changes and now it uses parallelism and the results are no longer ordered!

The hard part here is that there is no reasonable way for any external
user to know when a plan will change. The space of all plans is huge
and hurts your head to ponder. SQL Server's optimizer will change
plans, even for simple queries, if enough of the parameters change.
You may get lucky and not have a plan change, or you can just not
think about this problem and add an ORDER BY.

If you need more convincing just read these posts:

  • Without ORDER BY, there is no default sort order. - Alexander Kuznetsov
  • Order in the court! - Thomas Kyte
  • Order of a Result Set in SQL - Timothy Wiseman

SELECT * INTO retains ORDER BY in SQL Server 2008 but not 2012

How can you tell what the order is inside a table by using select * from #result? There is no guarantee as to the order in a select query.

However, the results are different on SQL Fiddle. If you want to guarantee that the results are the same, then add a primary key. Then the insertion order is guaranteed:

CREATE TABLE MyTable(Name VARCHAR(50), SortOrder INT)
INSERT INTO MyTable SELECT 'b', 2 UNION ALL SELECT 'c', 3 UNION ALL SELECT 'a', 1 UNION ALL SELECT 'e', 5 UNION ALL SELECT 'd', 4

select top 0 * into result from MyTable;

alter table Result add id int identity(1, 1) primary key;

insert into Result(name, sortorder)
SELECT * FROM MyTable
ORDER BY SortOrder;

I still abhor doing select * from Result after this. But yes, it does return them in the correct order in both SQL Server 2008 and 2012. Not only that, but because SQL Server guarantees that identity values are assigned in the order given by the ORDER BY of an INSERT ... SELECT, the id values (and therefore the primary key) are guaranteed to follow SortOrder in this case.

BUT . . . just because the records are in a particular order on the pages doesn't mean they will be retrieved in that order with no order by clause.
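
A minimal sketch of the safe way to read the rows back, assuming the Result table and its id identity column created above:

```sql
-- The identity column records the intended order; ask for it explicitly.
SELECT Name, SortOrder
FROM Result
ORDER BY id;
```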

Sort order of an SQL Server 2008+ clustered index

There is a difference. Inserting out of Cluster Order causes massive fragmentation.

When you run the following code the DESC clustered index is generating additional UPDATE operations at the NONLEAF level.

CREATE TABLE dbo.TEST_ASC(ID INT IDENTITY(1,1) 
,RandNo FLOAT
);
GO
CREATE CLUSTERED INDEX cidx ON dbo.TEST_ASC(ID ASC);
GO

CREATE TABLE dbo.TEST_DESC(ID INT IDENTITY(1,1)
,RandNo FLOAT
);
GO
CREATE CLUSTERED INDEX cidx ON dbo.TEST_DESC(ID DESC);
GO

INSERT INTO dbo.TEST_ASC VALUES(RAND());
GO 100000

INSERT INTO dbo.TEST_DESC VALUES(RAND());
GO 100000

The two Insert statements produce exactly the same Execution Plan but when looking at the operational stats the differences show up against [nonleaf_update_count].

SELECT 
OBJECT_NAME(object_id)
,*
FROM sys.dm_db_index_operational_stats(DB_ID(),OBJECT_ID('TEST_ASC'),null,null)
UNION
SELECT
OBJECT_NAME(object_id)
,*
FROM sys.dm_db_index_operational_stats(DB_ID(),OBJECT_ID('TEST_DESC'),null,null)

There is an extra under-the-hood operation going on when SQL Server is working with a DESC index that runs against the IDENTITY.
This is because the DESC table is becoming fragmented (rows inserted at the start of the page) and additional updates occur to maintain the B-tree structure.

The most noticeable thing about this example is that the DESC Clustered Index becomes over 99% fragmented. This is recreating the same bad behaviour as using a random GUID for a clustered index.
The below code demonstrates the fragmentation.

SELECT 
OBJECT_NAME(object_id)
,*
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('dbo.TEST_ASC'), NULL, NULL ,NULL)
UNION
SELECT
OBJECT_NAME(object_id)
,*
FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('dbo.TEST_DESC'), NULL, NULL ,NULL)
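
If the full column list is too noisy, a narrower variant along these lines might be easier to read. It is only a sketch: it assumes the two test tables above exist in the current database and picks out the fragmentation columns of interest.

```sql
SELECT
    OBJECT_NAME(ips.object_id)       AS TableName,
    ips.index_level,
    ips.page_count,
    ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
WHERE ips.object_id IN (OBJECT_ID('dbo.TEST_ASC'), OBJECT_ID('dbo.TEST_DESC'))
ORDER BY TableName;
```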

UPDATE:

On some test environments I'm also seeing that the DESC table is subject to more WAITS with an increase in [page_io_latch_wait_count] and [page_io_latch_wait_in_ms]

UPDATE:

Some discussion has arisen about what is the point of a Descending Index when SQL can perform Backward Scans. Please read this article about the limitations of Backward Scans.

Why does SQL Server 2008 order when using a GROUP BY and no order has been specified?

To answer this question, look at the query plans produced by both.

The first SELECT is a simple table scan, which means that it produces rows in allocation order. Since this is a new table, it matches the order you inserted the records.

The second SELECT adds a GROUP BY, which SQL Server implements via a distinct sort since the estimated row count is so low. Were you to have more rows or add an aggregate to your SELECT, this operator may change.

For example, try:

CREATE TABLE #Values ( FieldValue varchar(50) )

;WITH FieldValues AS
(
SELECT '4' FieldValue UNION ALL
SELECT '3' FieldValue UNION ALL
SELECT '2' FieldValue UNION ALL
SELECT '1' FieldValue
)
INSERT INTO #Values ( FieldValue )
SELECT
A.FieldValue
FROM FieldValues A
CROSS JOIN FieldValues B
CROSS JOIN FieldValues C
CROSS JOIN FieldValues D
CROSS JOIN FieldValues E
CROSS JOIN FieldValues F

SELECT
FieldValue
FROM #Values
GROUP BY
FieldValue

DROP TABLE #Values

Due to the number of rows, this changes into a hash aggregate, and now there is no sort in the query plan.

With no ORDER BY, SQL Server can return the results in any order, and the order it comes back in is a side-effect of how it thinks it can most quickly return the data.
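
So if the grouped values need to come back sorted, say so. A small self-contained sketch, reusing the #Values temp table shape from the example above with just four rows:

```sql
CREATE TABLE #Values ( FieldValue varchar(50) );

INSERT INTO #Values ( FieldValue )
VALUES ('4'), ('3'), ('2'), ('1');

-- The ORDER BY makes the output order part of the query's contract,
-- whichever aggregate operator the optimizer picks.
SELECT
    FieldValue
FROM #Values
GROUP BY
    FieldValue
ORDER BY
    FieldValue;

DROP TABLE #Values;
```

Note that FieldValue is a varchar, so this sorts as text: '10' would come before '2'. Cast to a numeric type in the ORDER BY if numeric order is what you want.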


