Order Guarantee for Identity Assignment in Multi-Row Insert in SQL Server

Order guarantee for identity assignment in multi-row insert in SQL Server

Piggybacking on my comment above: an INSERT ... SELECT with ORDER BY guarantees the order in which identity values are generated (point 4 in this blog post). Given that, you can use the table value constructor in the following fashion to accomplish your goal, assuming you want identity values to be generated in CategoryId order (I am not sure whether this satisfies your other constraints).

insert into thetable(CategoryId, CategoryName)
select *
from
(values
(101, 'Bikes'),
(103, 'Clothes'),
(102, 'Accessories')
) AS Category(CategoryID, CategoryName)
order by CategoryId

Is Order Guaranteed When Inserting Multiple Rows with Identity?

A very similar question has been asked before.

You can specify an ORDER BY in the INSERT.

If you do that, the order in which the IDENTITY values are generated is guaranteed to match the specified ORDER BY in the INSERT.

Using your example:

DECLARE @blah TABLE
(
ID INT IDENTITY(1, 1) NOT NULL,
Name VARCHAR(100) NOT NULL
);

INSERT INTO @blah (Name)
SELECT T.Name
FROM
(
VALUES
('Timmy'),
('Jonny'),
('Sally')
) AS T(Name)
ORDER BY T.Name;

SELECT
T.ID
,T.Name
FROM @blah AS T
ORDER BY T.ID;

The result is:

+----+-------+
| ID | Name |
+----+-------+
| 1 | Jonny |
| 2 | Sally |
| 3 | Timmy |
+----+-------+

That is, the Name values have been sorted and the IDs have been generated according to this order. It is guaranteed that Jonny will have the lowest ID, Timmy will have the highest ID, and Sally will have an ID between them. There may be gaps between the generated ID values, but their relative order is guaranteed.

If you don't specify ORDER BY in INSERT, then resulting IDENTITY IDs can be generated in a different order.

Mind you, there is no guarantee for the actual physical order of rows in the table even with ORDER BY in the INSERT. The only guarantee concerns the order of the generated IDs.

In the question "INSERT INTO as SELECT with ORDER BY", Umachandar Jayachandran from Microsoft said:

The only guarantee is that the identity values will be generated based
on the ORDER BY clause. But there is no guarantee for the order of
insertion of the rows into the table.

And he gave a link to Ordering guarantees in SQL Server, where Conor Cunningham from SQL Server Engine Team says:


  4. INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted

The comments on that post link to a Microsoft knowledge base article, "The behavior of the IDENTITY function when used with SELECT INTO or INSERT .. SELECT queries that contain an ORDER BY clause", which explains this in more detail. It says:

If you want the IDENTITY values to be assigned in a sequential fashion
that follows the ordering in the ORDER BY clause, create a table that
contains a column with the IDENTITY property and then run an INSERT ... SELECT ... ORDER BY query to populate this table.

I consider this KB article official documentation and therefore consider this behaviour guaranteed.

Is an IDENTITY column auto-incremented before or after an order by clause is applied to it?

The SQL Server Engine Team have made this blog post:

INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted

They clarify what "the order in which the rows are inserted" means in the comments:

Yes, the identity values will be generated in the sequence established by the ORDER BY. If a clustered index exists on the identity column, then the values will be in the logical order of the index keys. This still doesn't guarantee physical order of insertion. Index maintenance is a different step and that could also be done in parallel for example. So you could end up generating the identity values based on ORDER BY clause and then feeding those rows to the clustered index insert operator which will perform the maintenance task.

How to control order of assignment for new identity column in SQL Server?

Following on from Remus' theoretical answer... you need to generate a list first with your ideal ordering

SELECT
ID, CreateDate
INTO
MyNewTable
FROM
(
SELECT
CreateDate,
ROW_NUMBER() OVER (ORDER BY CreateDate ASC) AS ID
FROM
MyTable
) foo

Then, the best solution is to use SSMS to add the IDENTITY property to MyNewTable. SSMS will generate a script that includes SET IDENTITY_INSERT to preserve the order.

Note: IDENTITY columns are just numbers that have no implicit meaning and nothing should be inferred by their alignment with the CreateDate after this exercise...
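
As an alternative sketch, the KB article quoted earlier suggests creating the table with the IDENTITY column up front and populating it with INSERT ... SELECT ... ORDER BY. Assuming the same MyTable and CreateDate column as above (the table name MyNewTable2 and the DATETIME column type are assumptions made only for illustration), that could look roughly like this:

```sql
-- Hypothetical target table; the CreateDate type is an assumption.
CREATE TABLE dbo.MyNewTable2
(
    ID INT IDENTITY(1, 1) NOT NULL,
    CreateDate DATETIME NOT NULL
);

-- Identity values are generated following the ORDER BY;
-- the physical insertion order is still not guaranteed.
INSERT INTO dbo.MyNewTable2 (CreateDate)
SELECT CreateDate
FROM MyTable
ORDER BY CreateDate ASC;
```

This avoids the SSMS table-rebuild step, at the cost of defining the new table by hand.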

How can I insert multiple rows into a table and get all new identity values in order?

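Yes. Once you include the ORDER BY, the identity values will be generated in that order. Note that this does not guarantee the physical order in which rows are inserted, as the author's comment quoted below explains.

Here I changed the staging order. By the way, you don't need OUTPUT.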

SQL DEMO

INSERT INTO #Staging (TrackingId, Value)
VALUES (201, 1000), (204, 2000), (203, 2000), (202, 1000);
-- note the shuffled TrackingId order: 201, 204, 203, 202

INSERT INTO #Target (Value /* , other fields */)
SELECT TrackingID /* , other fields */
FROM #Staging
ORDER BY TrackingID;

SELECT *
FROM #Target;

Please read the comments below that article, in particular this answer from the author:

  • Could you elaborate on statement #4.

Yes, the identity values will be generated in the sequence established by the ORDER BY. If a clustered index exists on the identity column, then the values will be in the logical order of the index keys. This still doesn’t guarantee physical order of insertion. Index maintenance is a different step and that could also be done in parallel for example. So you could end up generating the identity values based on ORDER BY clause and then feeding those rows to the clustered index insert operator which will perform the maintenance task. You can see this in the query plan. You should really NOT think about physical operations or order but instead think of a table as a unordered set of rows. The index can be used to sort rows in logical manner (using ORDER BY clause) efficiently.

How does SQL Server generate values in an identity column?

You are making the common fallacy of assuming an order in the table. Tables have no order. Only results have order, which is undetermined unless an explicit ORDER BY is specified.

You may ask a different question: how is the identity value assigned in the case of concurrent inserts? The answer is simple: it doesn't matter. If you make any assumption about the order, your code is broken. The same goes for gaps. A correctly written application will work even if the generated identities are completely random. Use SCOPE_IDENTITY() to retrieve the last inserted identity. Better still, use the OUTPUT clause of INSERT, which works for multi-row inserts too.
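
A minimal sketch of the OUTPUT approach for a multi-row insert follows. The table variables @People and @NewIDs and their columns are hypothetical names chosen for this illustration, not part of the original question:

```sql
-- Hypothetical table with an IDENTITY column.
DECLARE @People TABLE
(
    ID INT IDENTITY(1, 1) NOT NULL,
    Name VARCHAR(100) NOT NULL
);

-- Receives one row per inserted row, including the generated identity.
DECLARE @NewIDs TABLE
(
    ID INT NOT NULL,
    Name VARCHAR(100) NOT NULL
);

INSERT INTO @People (Name)
OUTPUT inserted.ID, inserted.Name INTO @NewIDs (ID, Name)
VALUES ('Timmy'), ('Jonny'), ('Sally');

-- Each generated ID is paired with its own row's data,
-- so no assumption about generation order is needed.
SELECT ID, Name
FROM @NewIDs;
```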

For the record: the identities are generated in the order in which operations acquire access to the log stream.

Combine OUTPUT inserted.id with value from selected row

You can (ab)use MERGE with OUTPUT clause.

MERGE can INSERT, UPDATE and DELETE rows. In our case we need only to INSERT.
1=0 is always false, so the NOT MATCHED BY TARGET part is always executed.
In general, there could be other branches, see docs.
WHEN MATCHED is usually used to UPDATE;
WHEN NOT MATCHED BY SOURCE is usually used to DELETE, but we don't need them here.

This convoluted form of MERGE is equivalent to a simple INSERT,
but unlike a simple INSERT, its OUTPUT clause can refer to the columns that we need.
It can retrieve columns from both the source and destination tables, thus saving a mapping between old and new IDs.

MERGE INTO [dbo].[Test]
USING
(
SELECT O.[ID], O.[Data] -- ID must be selected so that OUTPUT can reference Src.ID
FROM @Old AS O
) AS Src
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT ([Data])
VALUES (Src.[Data])
OUTPUT Src.ID AS OldID, inserted.ID AS NewID
INTO @New(ID, [OtherID])
;

Regarding your update and relying on the order of generated IDENTITY values.

In the simple case, when [dbo].[Test] has IDENTITY column, then INSERT with ORDER BY will guarantee that the generated IDENTITY values would be in the specified order. See point 4 in Ordering guarantees in SQL Server. Mind you, it doesn't guarantee the physical order of inserted rows, but it guarantees the order in which IDENTITY values are generated.

INSERT INTO [dbo].[Test] ([Data])
SELECT [Data]
FROM @Old
ORDER BY [RowID]

But, when you use the OUTPUT clause:

INSERT INTO [dbo].[Test] ([Data])
OUTPUT inserted.[ID] INTO @New
SELECT [Data]
FROM @Old
ORDER BY [RowID]

the rows in the OUTPUT stream are not ordered. At least, strictly speaking, ORDER BY in the query applies to the primary INSERT operation, but there is nothing there that says what is the order of the OUTPUT. So, I would not try to rely on that. Either use MERGE or add an extra column to store the mapping between IDs explicitly.
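
The second option, an extra column that stores the mapping explicitly, could be sketched as follows. The temporary table #Test, its SourceID column, and the sample data are all assumptions made for illustration; the real [dbo].[Test] schema is not shown in the question:

```sql
-- Hypothetical target: carries the source key in an extra column.
CREATE TABLE #Test
(
    ID INT IDENTITY(1, 1) NOT NULL,
    SourceID INT NOT NULL,
    Data VARCHAR(100) NOT NULL
);

-- Hypothetical source rows.
DECLARE @Old TABLE (ID INT NOT NULL, Data VARCHAR(100) NOT NULL);
INSERT INTO @Old (ID, Data) VALUES (10, 'a'), (20, 'b'), (30, 'c');

-- Mapping between source keys and newly generated identity values.
DECLARE @New TABLE (SourceID INT NOT NULL, TargetID INT NOT NULL);

INSERT INTO #Test (SourceID, Data)
OUTPUT inserted.SourceID, inserted.ID INTO @New (SourceID, TargetID)
SELECT ID, Data
FROM @Old
ORDER BY ID;

-- The mapping does not depend on the order of the OUTPUT stream,
-- because each output row carries both keys.
SELECT SourceID, TargetID
FROM @New;
```

Because both keys come from the inserted pseudo-table, a plain INSERT is enough and MERGE is not needed for this variant.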

Insert multiple rows with incremental primary key sql

INSERT INTO TABLE1 (COLUMN1, PRIMARY_KEY)
SELECT COLUMN1,
(SELECT COALESCE(MAX(PRIMARY_KEY),0)
FROM TABLE1) + row_number() over (order by 1/0)
FROM TABLE2

For this statement alone, the IDs will be sequential: for example, if MAX(PRIMARY_KEY) is 99 and the statement inserts 4 records, they will get 100, 101, 102, 103. It is very prone to constraint violations if multiple processes insert at the same time, but it is no worse than what you already have for a single record, since using MAX() to generate keys is inherently unsafe.

Sequential numbers for many rows inserted with auto-increment key

I was curious enough to test. On my virtual machine with SQL Server 2014 Express the answer is:


Generated IDENTITY values are not guaranteed to be sequential when multiple threads insert values, even if each thread runs a single INSERT statement that inserts several rows at once (under the default transaction isolation level).


You can test it on your SQL Server 2008, but even if you don't see the same behaviour, it wouldn't be wise to rely on it, because it definitely changed in 2014.

Here is the full script to reproduce the test.

Table

CREATE TABLE [dbo].[test](
[ID] [int] IDENTITY(1,1) NOT NULL,
[dt] [datetime2](7) NOT NULL,
[V] [int] NOT NULL,
CONSTRAINT [PK_test] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

INSERT script

WAITFOR TIME '22:23:24';
-- set the time to about a minute in the future
-- open two windows in SSMS and run this script (F5) in both of them
-- they will start running at the same time specified above in parallel.

-- insert 1M rows in chunks of 1000 rows

-- in the first SSMS window uncomment these lines:
--DECLARE @VarV int = 0;
--WHILE (@VarV < 1000)

-- in the second SSMS window uncomment these lines:
--DECLARE @VarV int = 10000;
--WHILE (@VarV < 11000)

BEGIN

WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
) -- 10
,e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b) -- 10*10
,e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2) -- 10*100
,CTE_rn
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY n) AS rn
FROM e3
)
INSERT INTO [dbo].[test]
([dt]
,[V])
SELECT
SYSDATETIME() AS dt
,@VarV
FROM CTE_rn;

SET @VarV = @VarV + 1;

END;

Verifying the results

WITH
CTE
AS
(
SELECT
[V]
,MIN(ID) AS MinID
,MAX(ID) AS MaxID
,MAX(ID) - MIN(ID) + 1 AS DiffID
FROM [dbo].[test]
GROUP BY V
)
SELECT
DiffID
,COUNT(*) AS c
FROM CTE
GROUP BY DiffID
ORDER BY c DESC;

This query calculates the MIN and MAX ID for each V (each chunk of 1000 inserted rows). If all IDENTITY values within a chunk were generated sequentially, DiffID (MAX - MIN + 1) would always be exactly 1000. As we can see in the results, this is not the case:

Result

+--------+------+
| DiffID | c |
+--------+------+
| 1000 | 1940 |
| 2000 | 6 |
| 3000 | 3 |
| 1759 | 2 |
| 1477 | 2 |
| 1522 | 1 |
| 1524 | 1 |
| 1529 | 1 |
| 1538 | 1 |
| 1546 | 1 |
| 1548 | 1 |
| 1584 | 1 |
| 1585 | 1 |
| 1589 | 1 |
| 1597 | 1 |
| 1606 | 1 |
| 1611 | 1 |
| 1612 | 1 |
| 1620 | 1 |
| 1630 | 1 |
| 1631 | 1 |
| 1635 | 1 |
| 1658 | 1 |
| 1663 | 1 |
| 1675 | 1 |
| 1731 | 1 |
| 1749 | 1 |
| 1009 | 1 |
| 1038 | 1 |
| 1049 | 1 |
| 1055 | 1 |
| 1086 | 1 |
| 1102 | 1 |
| 1144 | 1 |
| 1218 | 1 |
| 1225 | 1 |
| 1263 | 1 |
| 1325 | 1 |
| 1367 | 1 |
| 1372 | 1 |
| 1415 | 1 |
| 1451 | 1 |
| 1761 | 1 |
| 1793 | 1 |
| 1832 | 1 |
| 1904 | 1 |
| 1919 | 1 |
| 1924 | 1 |
| 1954 | 1 |
| 1973 | 1 |
| 1984 | 1 |
| 2381 | 1 |
+--------+------+

In most cases, indeed, IDENTITY values were assigned sequentially, but in 60 cases out of 2000, they were not.


How to deal with it?

I personally prefer to use sp_getapplock rather than locking the table or raising the transaction isolation level.

But the end result is the same: you have to make sure that INSERT statements are not running in parallel.
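
A rough sketch of serializing the inserts with sp_getapplock is shown below. It uses the [dbo].[test] table from the script above; the resource name 'test_insert_lock', the 60-second timeout, and the three-row sample insert are assumptions for illustration only:

```sql
BEGIN TRANSACTION;

DECLARE @rc INT;

-- 'test_insert_lock' is a hypothetical resource name;
-- every writer must request the same name for this to serialize them.
EXEC @rc = sp_getapplock
    @Resource = 'test_insert_lock',
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction',
    @LockTimeout = 60000; -- milliseconds

IF @rc >= 0
BEGIN
    -- Lock acquired (0 = granted immediately, 1 = granted after waiting).
    INSERT INTO [dbo].[test] ([dt], [V])
    SELECT SYSDATETIME(), 1
    FROM (VALUES (1), (2), (3)) AS T(n);

    -- A transaction-owned application lock is released on commit.
    COMMIT TRANSACTION;
END
ELSE
BEGIN
    -- Negative return codes mean the lock was not granted.
    ROLLBACK TRANSACTION;
END;
```

The lock only protects against sessions that cooperate by requesting the same resource name; any INSERT that bypasses it can still interleave.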


In SQL Server 2012+ it is worth testing the behaviour of the new SEQUENCE feature. Specifically, the sp_sequence_get_range stored procedure that generates a range of sequence values from a sequence object. Let's leave this exercise to the reader.
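
As a starting point for that exercise, reserving a block of values with sp_sequence_get_range might look like the sketch below. The sequence name dbo.TestSeq and the range size of 1000 are assumptions; how this behaves under the concurrent test above has not been verified here:

```sql
-- Hypothetical sequence object.
CREATE SEQUENCE dbo.TestSeq AS INT START WITH 1 INCREMENT BY 1;
GO

-- The first value of the reserved range is returned as sql_variant.
DECLARE @first SQL_VARIANT;

EXEC sys.sp_sequence_get_range
    @sequence_name = N'dbo.TestSeq',
    @range_size = 1000,
    @range_first_value = @first OUTPUT;

-- With INCREMENT BY 1, the reserved range is @first .. @first + 999.
SELECT CONVERT(INT, @first) AS FirstValue,
       CONVERT(INT, @first) + 999 AS LastValue;
```

The caller is then responsible for assigning values from the reserved range to its rows, for example with ROW_NUMBER() added to the first value.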


