Database-Wide Unique-Yet-Simple Identifiers in SQL Server

Database-wide unique-yet-simple identifiers in SQL Server

Why not use IDENTITY columns on all the tables and, any time you present a value to the user, simply prefix a single character for the type? For example, O1234 is an order, D123213 is a delivery, and so on. That way you don't have to engineer some crazy scheme.
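Here is a minimal T-SQL sketch of this suggestion. It uses a hypothetical dbo.Orders table, and the table name, column names and the 'O' prefix are all illustrative assumptions. The key stays a plain IDENTITY int and the user-facing reference is a persisted computed column. A reference typed in by a user is resolved by stripping the letter and converting the rest back to int.

```sql
-- Hypothetical table: the real key is a plain IDENTITY int.
CREATE TABLE dbo.Orders
(
    OrderID   int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    -- Display-only reference: type letter 'O' followed by the identity value.
    OrderRef  AS 'O' + CAST(OrderID AS varchar(10)) PERSISTED,
    OrderDate datetime NOT NULL DEFAULT (GETDATE())
);

INSERT INTO dbo.Orders DEFAULT VALUES;

-- Resolve a reference entered by a user (e.g. 'O1') back to the integer key.
DECLARE @Ref varchar(11) = 'O1';

SELECT OrderID, OrderRef, OrderDate
FROM dbo.Orders
WHERE LEFT(@Ref, 1) = 'O'
  AND OrderID = TRY_CAST(SUBSTRING(@Ref, 2, 10) AS int);
```

A hypothetical dbo.Deliveries table would do the same with 'D'. Only the display value carries the letter, so each table keeps its own independent identity.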

SQL Server Globally Unique Identifiers

One reason is replication and scalability: when multiple uncoordinated writers exist, they by definition have no channel through which to agree on unique ints for the primary key. GUIDs don't need any communication to be unique.
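As an illustration, this sketch uses two hypothetical tables, dbo.SiteA_Orders and dbo.SiteB_Orders. They stand in for the same table on two servers that insert independently. Each site generates its keys locally with NEWID(), and the combined rows still have distinct keys. With IDENTITY(1,1) on both tables, each site would have produced 1 and 2.

```sql
-- Two hypothetical tables standing in for the same table at two sites
-- that never talk to each other while inserting.
CREATE TABLE dbo.SiteA_Orders
(
    OrderID uniqueidentifier NOT NULL DEFAULT (NEWID()) PRIMARY KEY NONCLUSTERED,
    Note    nvarchar(50) NOT NULL
);
CREATE TABLE dbo.SiteB_Orders
(
    OrderID uniqueidentifier NOT NULL DEFAULT (NEWID()) PRIMARY KEY NONCLUSTERED,
    Note    nvarchar(50) NOT NULL
);

-- Each site generates its own keys; no shared counter is consulted.
INSERT INTO dbo.SiteA_Orders (Note) VALUES (N'A-1'), (N'A-2');
INSERT INTO dbo.SiteB_Orders (Note) VALUES (N'B-1'), (N'B-2');

-- Later the rows are combined (e.g. by replication) and the keys do not collide.
SELECT OrderID, Note FROM dbo.SiteA_Orders
UNION ALL
SELECT OrderID, Note FROM dbo.SiteB_Orders;
```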

INT vs Unique-Identifier for ID field in database

GUIDs are problematic as clustered keys because of their high randomness. Paul Randal addressed this issue in a TechNet Magazine Q&A column, in answer to the question: "I'd like to use a GUID as the clustered index key, but the others are arguing that it can lead to performance issues with indexes. Is this true and, if so, can you explain why?"

Bear in mind that the discussion is specifically about clustered indexes. You say you want to use the column as 'ID', but it is unclear whether you mean it as the clustered key or just as the primary key. Typically the two overlap, so I'll assume you want to use it as the clustered index key. The reasons why that is a poor choice are explained in the article mentioned above.

For nonclustered indexes GUIDs still have some issues, but they are not nearly as big as when the GUID is the leftmost key of the clustered index. The randomness of GUIDs still causes page splits and fragmentation, but only at the nonclustered index level, which is a much smaller problem.
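One common way to act on this is shown below with a hypothetical dbo.Invoices table. The table clusters on a narrow, ever-increasing int and enforces GUID uniqueness with a nonclustered index, so the random inserts only fragment that smaller index. This is a sketch of one option, not the only valid design.

```sql
-- Hypothetical table: int clustered key, GUID kept unique in a nonclustered index.
CREATE TABLE dbo.Invoices
(
    InvoiceSeq int IDENTITY(1,1) NOT NULL,
    InvoiceID  uniqueidentifier NOT NULL DEFAULT (NEWID()),
    Amount     decimal(12, 2) NOT NULL,
    -- Clustered on the ever-increasing int, so new rows go to the end of the table.
    CONSTRAINT PK_Invoices PRIMARY KEY CLUSTERED (InvoiceSeq),
    -- Random GUIDs only scatter inserts across this smaller nonclustered index.
    CONSTRAINT UQ_Invoices_InvoiceID UNIQUE NONCLUSTERED (InvoiceID)
);

INSERT INTO dbo.Invoices (Amount) VALUES (10.00), (25.50);
```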

There are many urban legends surrounding GUID usage that condemn GUIDs because of their size (16 bytes) compared to an int (4 bytes) and promise horrible performance doom if they are used. This is slightly exaggerated. A 16-byte key can still be a very performant key in a properly designed data model. It is true that a key 4 times as big as an int results in lower-density non-leaf pages in indexes, but this is not a real concern for the vast majority of tables. The b-tree is a naturally well-balanced structure and its depth is seldom an issue, so seeking a value by a GUID key performs similarly to seeking by an INT key. A leaf-level traversal (i.e. a table scan) does not look at the non-leaf pages at all. The impact of GUID size on the leaf pages is typically quite small, because the record itself is significantly larger than the extra 12 bytes introduced by the GUID. So I'd take hearsay advice based on '16 bytes vs. 4' with a rather large grain of salt. Analyze each case individually and decide whether the size makes a real difference. Consider how many other columns are in the table (i.e. how much impact the GUID size has on the leaf pages). Also consider how many references use it (i.e. how many other tables will grow because they need to store a larger foreign key).
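The sizes involved can be checked directly. The second query is a rough estimate that assumes a hypothetical 10 million stored copies of the key. It gives (16 - 4) × 10,000,000 bytes, roughly 114 MB, before row and page overhead. Each copy could be a clustered index row, a nonclustered index row that carries the key, or a referencing foreign-key value.

```sql
-- Storage size of a single key value, in bytes.
SELECT DATALENGTH(NEWID())        AS guid_bytes,  -- 16
       DATALENGTH(CAST(1 AS int)) AS int_bytes;   -- 4

-- Back-of-the-envelope extra space; the copy count is a hypothetical assumption.
DECLARE @KeyCopies bigint = 10000000;
SELECT (16 - 4) * @KeyCopies / 1048576.0 AS extra_mb;
```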

I'm calling out all these details as a sort of makeshift defense of GUIDs because they have been getting a lot of bad press lately, and some of it is undeserved. They have their merits and are indispensable in any distributed system, the moment you're talking about data movement, be it via replication, Sync Framework or anything else. I've seen bad decisions made because of the GUID's bad reputation, where GUIDs were shunned without proper consideration. But it is true that if you have to use a GUID as a clustered key, you must address the randomness issue: use sequential GUIDs when possible.
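The following sketch shows a sequential-GUID clustered key, using a hypothetical dbo.Documents table. NEWSEQUENTIALID() can only be used in a DEFAULT constraint on a uniqueidentifier column. The server therefore generates the value, and the OUTPUT clause hands it back to the caller.

```sql
-- Hypothetical table with a sequential GUID as the clustered key.
CREATE TABLE dbo.Documents
(
    -- NEWSEQUENTIALID() is only allowed as a column DEFAULT.
    DocumentID uniqueidentifier NOT NULL
        CONSTRAINT DF_Documents_DocumentID DEFAULT (NEWSEQUENTIALID()),
    Title      nvarchar(200) NOT NULL,
    CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (DocumentID)
);

-- The key is generated by the server, so read it back with OUTPUT.
INSERT INTO dbo.Documents (Title)
OUTPUT inserted.DocumentID
VALUES (N'First document');
```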

And finally, to answer your question: if you don't have a specific reason to use GUIDs, use INTs.

How to automatically generate unique id in SQL like UID12345678?

The only viable solution in my opinion is to use

  • an ID INT IDENTITY(1,1) column to get SQL Server to handle the automatic increment of your numeric value
  • a computed, persisted column to convert that numeric value to the value you need

So try this:

CREATE TABLE dbo.tblUsers
(
    ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    -- computed and persisted: 'UID' + ID left-padded with zeros to 8 digits
    UserID AS 'UID' + RIGHT('00000000' + CAST(ID AS VARCHAR(8)), 8) PERSISTED
    -- , ... your other columns here ...
)

Now, every time you insert a row into tblUsers without specifying values for ID or UserID:

INSERT INTO dbo.tblUsers (Col1, Col2, ..., ColN)
VALUES (Val1, Val2, ..., ValN)

then SQL Server will automatically and safely increment your ID value, and UserID will contain values like UID00000001, UID00000002 and so on: automatically, safely, reliably, with no duplicates. This holds as long as ID stays below 100,000,000, because a 9-digit ID does not fit in VARCHAR(8).

Update: the column UserID is computed, but it of course still has a data type, as a quick look in Object Explorer reveals.
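The following sketch uses a hypothetical dbo.UserIdDemo table with the same expression as tblUsers. It shows both the generated values and the computed column's data type as recorded in the sys.computed_columns catalog view.

```sql
-- Hypothetical demo table using the same computed-column expression as tblUsers.
CREATE TABLE dbo.UserIdDemo
(
    ID     int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    UserID AS 'UID' + RIGHT('00000000' + CAST(ID AS varchar(8)), 8) PERSISTED
);

-- Two rows; ID and UserID are both filled in by SQL Server.
INSERT INTO dbo.UserIdDemo DEFAULT VALUES;
INSERT INTO dbo.UserIdDemo DEFAULT VALUES;

SELECT ID, UserID FROM dbo.UserIdDemo;  -- UID00000001, UID00000002

-- The computed column has an ordinary data type, visible in the catalog views.
SELECT c.name, TYPE_NAME(c.system_type_id) AS data_type, c.is_persisted
FROM sys.computed_columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.UserIdDemo');
```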


How to generate a unique numeric ID in SQL Server (not using identity)?

You can create a SEQUENCE object that produces incrementing values. A SEQUENCE can be used independently or as a default value for one or more tables.

You can create a sequence with CREATE SEQUENCE:

CREATE SEQUENCE Audit.EventCounter  
AS int
START WITH 1
INCREMENT BY 1 ;

You can retrieve the next value atomically with NEXT VALUE FOR and use it in multiple statements, e.g.:

DECLARE @NextID int ;  
SET @NextID = NEXT VALUE FOR Audit.EventCounter;

Rolling back a transaction doesn't affect a SEQUENCE. From the docs:

Sequence numbers are generated outside the scope of the current transaction. They are consumed whether the transaction using the sequence number is committed or rolled back.
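The following sketch demonstrates that behavior using a hypothetical dbo.DemoCounter sequence. The value fetched inside the rolled-back transaction is not reused, so the next call returns the following number and leaves a gap.

```sql
-- Hypothetical sequence used only for this demonstration.
CREATE SEQUENCE dbo.DemoCounter AS int START WITH 1 INCREMENT BY 1;
GO

DECLARE @InTran int, @After int;

BEGIN TRANSACTION;
SET @InTran = NEXT VALUE FOR dbo.DemoCounter;  -- 1
ROLLBACK TRANSACTION;                          -- the value 1 is not handed back

SET @After = NEXT VALUE FOR dbo.DemoCounter;   -- 2, leaving a gap at 1

SELECT @InTran AS RolledBackValue, @After AS NextValue;
```

As a consequence, sequence values can have gaps and should not be treated as a contiguous count.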

You can use NEXT VALUE FOR as a default in multiple tables. In the documentation example, three different types of event table use the same SEQUENCE allowing all events to get unique numbers:

CREATE TABLE Audit.ProcessEvents  
(
EventID int PRIMARY KEY CLUSTERED
DEFAULT (NEXT VALUE FOR Audit.EventCounter),
EventTime datetime NOT NULL DEFAULT (getdate()),
EventCode nvarchar(5) NOT NULL,
Description nvarchar(300) NULL
) ;
GO

CREATE TABLE Audit.ErrorEvents
(
EventID int PRIMARY KEY CLUSTERED
DEFAULT (NEXT VALUE FOR Audit.EventCounter),
EventTime datetime NOT NULL DEFAULT (getdate()),
EquipmentID int NULL,
ErrorNumber int NOT NULL,
EventDesc nvarchar(256) NULL
) ;
GO

CREATE TABLE Audit.StartStopEvents
(
EventID int PRIMARY KEY CLUSTERED
DEFAULT (NEXT VALUE FOR Audit.EventCounter),
EventTime datetime NOT NULL DEFAULT (getdate()),
EquipmentID int NOT NULL,
StartOrStop bit NOT NULL
) ;
GO
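This sketch assumes the Audit schema, the Audit.EventCounter sequence and the three tables above exist. It inserts one row into each table to show the shared numbering. Every EventID comes from the same sequence, so no value appears twice across the tables.

```sql
-- Assumes Audit.EventCounter and the three Audit tables above already exist.
INSERT INTO Audit.ProcessEvents (EventCode, Description) VALUES (N'P01', N'Process started');
INSERT INTO Audit.ErrorEvents (ErrorNumber, EventDesc) VALUES (50001, N'Sample error');
INSERT INTO Audit.StartStopEvents (EquipmentID, StartOrStop) VALUES (7, 1);

-- EventID was drawn from the shared sequence, so it is unique across all three tables.
SELECT EventID, 'Process' AS Source FROM Audit.ProcessEvents
UNION ALL
SELECT EventID, 'Error' FROM Audit.ErrorEvents
UNION ALL
SELECT EventID, 'StartStop' FROM Audit.StartStopEvents
ORDER BY EventID;
```

This only holds while callers rely on the DEFAULT. An INSERT that supplies its own EventID bypasses the sequence.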

generating a reliable system wide unique identifier

If you want the single row/column approach, I've used:

declare @MyRef int
update CoreTable set @MyRef = LastRef = LastRef + 1

The update is safe: each session that executes it receives a distinct value in @MyRef. This is safer than doing a separate read, increment and update, where two sessions can read the same value before either of them writes it back.


Table definition:

create table CoreTable (
X char(1) not null,
LastRef int not null,
constraint PK_CoreTable PRIMARY KEY (X),
constraint CK_CoreTable_X CHECK (X = 'X')
)
insert into CoreTable (X,LastRef) values ('X',0)
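The statement can be wrapped in a stored procedure so that callers don't repeat it. The procedure name dbo.GetNextRef is hypothetical, and the sketch assumes CoreTable was created in the dbo schema as shown above.

```sql
-- Hypothetical wrapper around the single-statement counter update.
CREATE PROCEDURE dbo.GetNextRef
    @NextRef int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    -- Read, increment and write in one statement, so two callers never get the same value.
    UPDATE dbo.CoreTable
    SET @NextRef = LastRef = LastRef + 1
    WHERE X = 'X';
END;
GO

DECLARE @First int, @Second int;
EXEC dbo.GetNextRef @NextRef = @First OUTPUT;
EXEC dbo.GetNextRef @NextRef = @Second OUTPUT;
SELECT @First AS FirstRef, @Second AS SecondRef;
```

Unlike a SEQUENCE, this update is part of the caller's transaction. A rollback gives the number back, but concurrent callers wait on the single row until the transaction holding it commits or rolls back.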

Advice Please: SQL Server Identity vs Unique Identifier keys when using Entity Framework

Another option (not available when this was posted) is to upgrade to EF 4, which supports server-generated GUIDs.

Is uniqueidentifier unique across databases?

If you use the built-in methods of uniqueidentifier generation (like NEWID() or C#'s Guid.NewGuid()), then yes, it will be unique across databases, servers, countries, whatever.

In fact, that's one of the big uses of GUIDs - replication. If you have the same GUID in two databases, it's guaranteed that it was put there on purpose.
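This property makes GUIDs convenient for copying data between databases. A row can be moved with its key intact, and an equal key identifies the same row. The following minimal sketch uses hypothetical dbo.ContactsSource and dbo.ContactsTarget tables standing in for two databases.

```sql
-- Hypothetical tables standing in for the same entity in two databases.
CREATE TABLE dbo.ContactsSource
(
    ContactID uniqueidentifier NOT NULL DEFAULT (NEWID()) PRIMARY KEY NONCLUSTERED,
    Name      nvarchar(100) NOT NULL
);
CREATE TABLE dbo.ContactsTarget
(
    ContactID uniqueidentifier NOT NULL PRIMARY KEY NONCLUSTERED,
    Name      nvarchar(100) NOT NULL
);

INSERT INTO dbo.ContactsSource (Name) VALUES (N'Alice'), (N'Bob');

-- Copy rows across keeping their GUIDs; an equal GUID means "the same row",
-- so running this statement again copies nothing new.
INSERT INTO dbo.ContactsTarget (ContactID, Name)
SELECT s.ContactID, s.Name
FROM dbo.ContactsSource AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.ContactsTarget AS t WHERE t.ContactID = s.ContactID);
```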

However, do note that GUIDs do have their shortcomings - they might make your indices perform worse (or at least require more maintenance), and they are bigger in general.

Also, GUIDs aren't entirely random. There are a few different GUID-generating algorithms, and some of them are inherently unique. For example, an algorithm may use a MAC address as part of the GUID; MACs are unique by default, although you can override them manually. One part of the GUID is then unique per physical server, and the machine makes sure it doesn't use the same timestamp for two GUIDs. There are also sequential GUIDs (NEWSEQUENTIALID()), which are handy for avoiding index fragmentation and very useful for clustered indexes. Do note that those depend on the MAC address of the computer and are predictable. If you're making the GUID public and depend on it being "secret", you might not want to use them. Some GUID algorithms are more predictable than others.
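The ordering difference is easy to observe. In this sketch, a hypothetical dbo.GuidOrderDemo table records insertion order in an identity column alongside a NEWSEQUENTIALID() value and a NEWID() value. Sorted by the sequential GUID, the rows come back in insertion order, and this is what makes those values predictable. SQL Server documents this ordering only for values generated on the same computer since Windows was last started.

```sql
-- Hypothetical demo table: Seq records insertion order.
CREATE TABLE dbo.GuidOrderDemo
(
    Seq        int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SeqGuid    uniqueidentifier NOT NULL DEFAULT (NEWSEQUENTIALID()),
    RandomGuid uniqueidentifier NOT NULL DEFAULT (NEWID())
);

DECLARE @i int = 1;
WHILE @i <= 5
BEGIN
    INSERT INTO dbo.GuidOrderDemo DEFAULT VALUES;  -- one row per statement
    SET @i += 1;
END;

-- Sorting by SeqGuid reproduces insertion order; sorting by RandomGuid generally does not.
SELECT Seq,
       ROW_NUMBER() OVER (ORDER BY SeqGuid)    AS RankBySequentialGuid,
       ROW_NUMBER() OVER (ORDER BY RandomGuid) AS RankByRandomGuid,
       SeqGuid,
       RandomGuid
FROM dbo.GuidOrderDemo
ORDER BY Seq;
```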


