Sql Primary Key, Int or Guid Or..

Primary Key Type Guid or Int?

There is never a "one solution fits all". You have to carefully design your architecture and select the best options for your scenario. Both INT and GUID types are valid options like they've always been.

You can absolutely use GUID in a URL. In fact, in most scenarios, it is better to use a GUID (or another random ID) in the URL than a sequential numeric ID for security reason. If you use sequential ID, your site visitors will be able to easily guess other users' IDs and potentially access their contents. For example, if my profile URL is /Profiles/111, I can try Profile/112 and see if I can access it. If my reservation URL is Reservation/444, I can try Reservation/441 and see what happens. I can easily guess other IDs in the system. Of course, you must have strong permissions, so I should not be able to see those other pages that don't belong to my account, but if there is any issues or holes in your permissions and security, a breach can happen. While with GUID and other random IDs, there is no way to guess other IDs in the system, so such a breach is much more difficult.

Another issue with sequential IDs is that your users can guess how many accounts or records you have and their order in your database. If my ID is 50269, I know that you must have almost this number of records. If my Id is 4, then I know that you had a very few accounts when I registered. For that reason, many developers start the first ID at some random high number like 1529 instead of 1. It doesn't solve the issue entirely, but it avoid the issues with small IDs. How important all that guessing is depends on the system, so you have to evaluate your scenario carefully.

That's on the top of the benefits mentioned in the article that you mentioned in your question. But still, an integer is better in some areas, so choose the best option for your scenario.

EDIT To answer the point that you raised in your comment about user-friendly URLs. In those scenarios, sequential numbers is the wrong answer. A better solution is a unique string in the URL which is linked to your numeric ID. For example, the Cars movie has this URL on IMDB:

https://www.imdb.com/title/tt0317219/

Now, compare that to the URL of the same movie on Wikipedia, Rotten Tomatoes, Plugged In, or Facebook:


https://en.wikipedia.org/wiki/Cars_(film)
https://www.rottentomatoes.com/m/cars/
https://www.pluggedin.ca/movie-reviews/cars/
https://www.facebook.com/PixarCars

We must agree that those URLs are much friendlier than the one from IMDB.

GUID vs INT IDENTITY

Kimberley Tripp (SQLSkills.com) has an article on using GUID's as primary keys. She advices against it because of the unnecessary overhead.

Is there an efficiency benefit to using INT vs GUID for a primary key?

If you're concerned with performance, then int will beat out the GUID implementation simply because of the size of the data type. Are you comparing to using the uniqueidentifier type or using a char(32)? Either one will be slower than the int implementation.

GUIDs, however, are a good alternative depending on the type of application that you're building. If you are building something that you want to be able to generate a "unique" identifier prior to touching any state machine, then use GUID.

If you're a database guy, you may find yourself leaning towards an int.

If you're an application developer, you may find yourself leaning towards GUID.

Just a quick question, why not use a sequence?

It just depends.

Hope this helps.

SQL primary key, INT or GUID or..?

The advantage of using GUID primkey is that it should be unique in the world, such as whether to move data from one database to another. So you know that the row is unique.

But if we are talking about a small db, so I prefer integer.

Edit:

If you using SQL Server 2005++, can you also use NEWSEQUENTIALID(),
this generates a GUID based on the row above.Allows the index problem with newid() is not there anymore.

What are the best practices for using a GUID as a primary key, specifically regarding performance?

GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table. What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default, unless you specifically tell it not to.

You really need to keep two issues apart:

  1. the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

  2. the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way! I've personally seen massive performance gains when breaking up the previous GUID-based Primary / Clustered Key into two separate key - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.

As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.

Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so.

Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT vs. GUID as Primary and Clustering Key:

  • Base Table with 1'000'000 rows (3.8 MB vs. 15.26 MB)
  • 6 nonclustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: 25 MB vs. 106 MB - and that's just on a single table!

Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.

  • GUIDs as PRIMARY KEY and/or clustered key
  • The clustered index debate continues
  • Ever-increasing clustering key - the Clustered Index Debate..........again!
  • Disk space is cheap - that's not the point!

PS: of course, if you're dealing with just a few hundred or a few thousand rows - most of these arguments won't really have much of an impact on you. However: if you get into the tens or hundreds of thousands of rows, or you start counting in millions - then those points become very crucial and very important to understand.

Update: if you want to have your PKGUID column as your primary key (but not your clustering key), and another column MYINT (INT IDENTITY) as your clustering key - use this:

CREATE TABLE dbo.MyTable
(PKGUID UNIQUEIDENTIFIER NOT NULL,
MyINT INT IDENTITY(1,1) NOT NULL,
.... add more columns as needed ...... )

ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable
PRIMARY KEY NONCLUSTERED (PKGUID)

CREATE UNIQUE CLUSTERED INDEX CIX_MyTable ON dbo.MyTable(MyINT)

Basically: you just have to explicitly tell the PRIMARY KEY constraint that it's NONCLUSTERED (otherwise it's created as your clustered index, by default) - and then you create a second index that's defined as CLUSTERED

This will work - and it's a valid option if you have an existing system that needs to be "re-engineered" for performance. For a new system, if you start from scratch, and you're not in a replication scenario, then I'd always pick ID INT IDENTITY(1,1) as my clustered primary key - much more efficient than anything else!

Asp.net identity: Why the index is on GUID (nvarchar)? Any benefit/drawback to use int?

Lot of discussion has already been made on this issue on the internet. This is really a never ending debate/topic as both int/long and GUID as primary key has pros and cons.

GUID vs INT Debate is a fantastic article on this topic.

You can also take a look on Why is ASP.NET Identity 2.0 using a GUID/string as user id?

Apart from these articles discussion I can propose you few things:

  1. By default SQL Server primary is clustering key, unless you specifically tell it not to, which will cause bad performance in case of GUID primary key. If you take this into consideration then don't choose GUID as primary key.

  2. If you think your primary key has no special meaning without being a unique identifier then choose GUID as Primary Key.

  3. If you need meaningful and readable primary key then choose int/long as primary key.

If you choose GUID as your primary then you can take a look on What are the best practices for using a GUID as a primary key, specifically regarding performance?

EF 4.0 Guid or Int as A primary Key

I would agree 100% with you - using an INT IDENTITY is much better!

GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table. What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default, unless you specifically tell it not to.

You really need to keep two issues apart:

1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way! I've personally seen massive performance gains when breaking up the previous GUID-based Primary / Clustered Key into two separate key - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.

As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.

Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so.

Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT vs. GUID as Primary and Clustering Key:

  • Base Table with 1'000'000 rows (3.8 MB vs. 15.26 MB)
  • 6 nonclustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: 25 MB vs. 106 MB - and that's just on a single table!

Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.

  • GUIDs as PRIMARY KEY and/or clustered key
  • The clustered index debate continues
  • Ever-increasing clustering key - the Clustered Index Debate..........again!
  • Disk space is cheap - that's not the point!


Related Topics



Leave a reply



Submit