Calculating SQL Server Row_Number() Over() for a Derived Table

SQL Row_Number() function in Where Clause without ORDER BY?

In case it is useful to someone else, here is what I figured out from elsewhere:

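WITH MyCte AS
(
    SELECT employee_id,
           RowNum = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
    FROM V_EMPLOYEE
    -- No ORDER BY here: SQL Server rejects ORDER BY inside a CTE
    -- unless TOP, OFFSET or FOR XML is also specified.
)
SELECT employee_id
FROM MyCte
WHERE RowNum > 0
ORDER BY employee_id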
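
A window function cannot appear directly in a WHERE clause, which is why the row number is computed in the CTE and filtered outside it. Note that with ORDER BY (SELECT 0) the numbering is arbitrary, so a filter on specific row numbers is not repeatable. If a stable numbering is wanted, a minimal sketch, assuming employee_id is an acceptable ordering column and with the range 11 to 20 chosen purely for illustration, could look like this:

```sql
WITH MyCte AS
(
    SELECT employee_id,
           -- Ordering by a real column makes the numbering repeatable
           ROW_NUMBER() OVER (ORDER BY employee_id) AS RowNum
    FROM V_EMPLOYEE
)
SELECT employee_id
FROM MyCte
WHERE RowNum BETWEEN 11 AND 20   -- illustrative range only
ORDER BY RowNum;
```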

Generate calculated column with row_number() over partition by

Question 1) Can I use [window function with] partition [by] in a calculated column?

Answer: Yes, by having the computed column call a scalar user-defined function that returns the value. The column cannot be persisted, though, because the result would be non-deterministic.

Question 2) Is there a better approach?

Answer: Yes, this should not be the primary key.

Why would you want this to be your primary key?

Saying that this is a business requirement doesn't make it a good idea. Find another way to accommodate your business requirements without them forcing you into horrible design decisions.
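
For reference, the function-based computed column from the answer to Question 1 might be sketched as follows. The table dbo.Orders, its columns, and the function dbo.OrderSeq are hypothetical names invented for this illustration; the original question's schema is not shown.

```sql
-- Hypothetical table; all names here are illustrative only.
CREATE TABLE dbo.Orders
(
    OrderId    int NOT NULL PRIMARY KEY,
    CustomerId int NOT NULL
);
GO

-- Scalar function that looks up the per-customer row number of one order.
CREATE FUNCTION dbo.OrderSeq (@OrderId int)
RETURNS bigint
AS
BEGIN
    RETURN
    (
        SELECT x.rn
        FROM
        (
            SELECT o.OrderId,
                   ROW_NUMBER() OVER (PARTITION BY o.CustomerId
                                      ORDER BY o.OrderId) AS rn
            FROM dbo.Orders AS o
        ) AS x
        WHERE x.OrderId = @OrderId
    );
END;
GO

-- Non-persisted computed column that calls the function for each row.
ALTER TABLE dbo.Orders
    ADD CustomerSeq AS dbo.OrderSeq(OrderId);
```

The function runs once per row that is read, so this is likely to be slow on large tables, and the values shift whenever earlier rows are deleted, which is one more reason not to use such a column as a key.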

SQL Server 2005 ROW_NUMBER() without ORDER BY

You can avoid specifying an explicit ordering as follows:

INSERT dbo.TargetTable (ID, FIELD)
SELECT
    Row_Number() OVER (ORDER BY (SELECT 1))
    + Coalesce(
        (SELECT Max(ID) FROM dbo.TargetTable WITH (TABLOCKX, HOLDLOCK)),
        0
    ),
    FieldValue
FROM dbo.SourceTable
WHERE {somecondition};

However, please note that this is merely a way to avoid specifying an ordering and does NOT guarantee that any original data ordering will be preserved. There are other factors that can cause the result to be ordered, such as an ORDER BY in the outer query. To fully understand this, one must realize that the concept "not ordered (in a particular way)" is not the same as "retaining original order" (which IS ordered in a particular way!). I believe that from a pure relational database perspective, the latter concept does not exist, by definition. There may be database implementations that violate this, but SQL Server is not one of them.

The reason for calculating the Max in the query and for adding the lock hints is to prevent errors due to a concurrent process inserting using the same value you plan to use, in between the parts of the query executing. The only other semi-reasonable workaround would be to perform the Max() and INSERT in a loop some number of times until it succeeds (still far from an ideal solution). Using an identity column is far superior. It's not good for concurrency to exclusively lock entire tables, and that is an understatement.
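
The identity-column alternative mentioned above could look roughly like the following. This is a sketch only: it assumes the table can be created (or recreated) with an IDENTITY column, and the data type of FIELD is a guess, since the original schema is not shown.

```sql
-- Assumed definition; the real column types are not given in the original.
CREATE TABLE dbo.TargetTable
(
    ID    int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    FIELD varchar(100) NULL
);

-- ID is omitted from the column list: SQL Server assigns it,
-- so no Max(ID) lookup and no table-level lock hints are needed.
INSERT dbo.TargetTable (FIELD)
SELECT FieldValue
FROM dbo.SourceTable;   -- add the original WHERE filter here
```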

Note: Many people use (SELECT NULL) to get around the "no constants allowed in the ORDER BY clause of a windowing function" restriction. For some reason, I prefer 1 over NULL. What you use is up to you.

ROW_NUMBER() fails when table is too big

The following is for BigQuery Standard SQL:

#standardSQL
SELECT AS VALUE ARRAY_AGG(t ORDER BY timedate DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY id

You can test and experiment with the query above using dummy data, as below:

#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 2 timedate, 3 z UNION ALL
SELECT 1,4,5 UNION ALL
SELECT 1,6,7 UNION ALL
SELECT 2,8,9 UNION ALL
SELECT 2, 10, 11
)
SELECT AS VALUE ARRAY_AGG(t ORDER BY timedate DESC LIMIT 1)[OFFSET(0)]
FROM `project.dataset.table` t
GROUP BY id

The result is:

Row  id  timedate  z
1    1   6         7
2    2   10        11
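
For comparison, the ROW_NUMBER() formulation that the ARRAY_AGG query presumably replaces would look something like the sketch below. The original question's query is not shown, so this is an assumption about its shape. Both versions return the latest row per id; the ARRAY_AGG version with LIMIT 1 tends to cope better with very large partitions because it does not have to number every row first.

```sql
#standardSQL
SELECT * EXCEPT(rn)
FROM (
  SELECT
    t.*,
    -- Number rows within each id, newest timedate first
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY timedate DESC) AS rn
  FROM `project.dataset.table` t
)
WHERE rn = 1
```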

Using a SQL Ranking Function with a derived column

How about using a derived table (subquery)? I think something like the following should work:

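SELECT
    ROW_NUMBER() OVER (ORDER BY sub.Points) AS [Row Number],
    sub.FirstName,
    sub.LastName,
    sub.Points
FROM
(
    -- "table" is a placeholder name; TABLE is a reserved word,
    -- so it has to be bracket-quoted if used literally.
    SELECT
        t.FirstName,
        t.LastName,
        -- Scalar UDFs must be called with a schema prefix (dbo assumed)
        dbo.CalculatedValue(t.number) AS Points
    FROM
        [table] AS t
) sub
ORDER BY
    sub.Points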

