Primary Key Type Guid or Int?
There is never a "one solution fits all". You have to carefully design your architecture and select the best options for your scenario. Both INT and GUID types are valid options, as they have always been.
You can absolutely use a GUID in a URL. In fact, in most scenarios it is better to use a GUID (or another random ID) in the URL than a sequential numeric ID, for security reasons. If you use sequential IDs, your site visitors can easily guess other users' IDs and potentially access their content. For example, if my profile URL is /Profiles/111, I can try /Profiles/112 and see if I can access it. If my reservation URL is /Reservation/444, I can try /Reservation/441 and see what happens. I can easily guess other IDs in the system. Of course, you must have strong permission checks, so I should not be able to see pages that don't belong to my account, but if there is any hole in your permissions and security, a breach can happen. With GUIDs and other random IDs, there is no practical way to guess other IDs in the system, so such a breach is much more difficult.
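To make the guessing argument concrete, here is a minimal sketch. The /Profiles/ route shape is taken from the example above; the function names are hypothetical, and Python's standard uuid module stands in for whatever GUID generator your platform provides.

```python
import uuid


def neighbouring_ids(own_id: int, radius: int = 2) -> list[int]:
    """Return the IDs a visitor could try just by counting up and down."""
    return [own_id + d for d in range(-radius, radius + 1)
            if d != 0 and own_id + d > 0]


def guess_urls(own_id: int) -> list[str]:
    # Hypothetical route shape, borrowed from the /Profiles/111 example.
    return [f"/Profiles/{n}" for n in neighbouring_ids(own_id)]


def new_random_profile_id() -> str:
    # uuid4() is built from random bits, so there is no "next" value to try.
    return str(uuid.uuid4())


print(guess_urls(111))
print(f"/Profiles/{new_random_profile_id()}")
```

This does not replace permission checks; it only shows why a random ID removes the cheap "add one and retry" probe.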
Another issue with sequential IDs is that your users can guess how many accounts or records you have and their order in your database. If my ID is 50269, I know that you must have roughly that many records. If my ID is 4, I know that you had very few accounts when I registered. For that reason, many developers start the first ID at some arbitrary higher number like 1529 instead of 1. It doesn't solve the issue entirely, but it avoids the problem with small IDs. How important all that guessing is depends on the system, so you have to evaluate your scenario carefully.
That is on top of the benefits mentioned in the article you linked in your question. Still, an integer is better in some areas, so choose the best option for your scenario.
EDIT: To answer the point you raised in your comment about user-friendly URLs: in those scenarios, sequential numbers are the wrong answer. A better solution is a unique string in the URL that is linked to your numeric ID. For example, the Cars movie has this URL on IMDB:
https://www.imdb.com/title/tt0317219/
Now, compare that to the URL of the same movie on Wikipedia, Rotten Tomatoes, Plugged In, or Facebook:
https://en.wikipedia.org/wiki/Cars_(film)
https://www.rottentomatoes.com/m/cars/
https://www.pluggedin.ca/movie-reviews/cars/
https://www.facebook.com/PixarCars
We must agree that those URLs are much friendlier than the one from IMDB.
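A rough sketch of the "unique string linked to your numeric ID" idea follows. Everything here is an assumption for illustration: the slugify rule, the /movies/ route, the ID 42, and the in-memory dict that stands in for a table with a unique slug column.

```python
import re


def slugify(title: str) -> str:
    """Lower-case the title and collapse anything non-alphanumeric into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Hypothetical in-memory stand-in for a table with a unique slug column.
movie_id_by_slug: dict[str, int] = {}


def register_movie(movie_id: int, title: str) -> str:
    """Store the slug -> numeric ID link and return the friendly URL."""
    slug = slugify(title)
    if slug in movie_id_by_slug:
        raise ValueError(f"slug already taken: {slug}")
    movie_id_by_slug[slug] = movie_id
    return f"/movies/{slug}/"


def resolve(url: str) -> int:
    """Map a friendly URL back to the internal numeric key."""
    return movie_id_by_slug[url.strip("/").split("/")[-1]]


url = register_movie(42, "Cars")
print(url, "->", resolve(url))
```

The numeric key stays internal (and can remain the clustered key), while the URL carries only the slug.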
GUID vs INT IDENTITY
Kimberly Tripp (SQLSkills.com) has an article on using GUIDs as primary keys. She advises against it because of the unnecessary overhead.
Is there an efficiency benefit to using INT vs GUID for a primary key?
If you're concerned with performance, then int will beat out the GUID implementation simply because of the size of the data type. Are you comparing to using the uniqueidentifier type or using a char(32)? Either one will be slower than the int implementation.
GUIDs, however, are a good alternative depending on the type of application that you're building. If you are building something that you want to be able to generate a "unique" identifier prior to touching any state machine, then use GUID.
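As an illustration of generating the identifier before anything is stored, here is a small sketch. The Order and OrderLine classes are hypothetical; the point is only that the parent's key is known when the child objects are built, with no round trip to the database.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    order_id: uuid.UUID  # foreign key, known before anything is saved
    product: str


@dataclass
class Order:
    # The key exists as soon as the object does; no INSERT is needed to learn it.
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    lines: list[OrderLine] = field(default_factory=list)

    def add_line(self, product: str) -> None:
        self.lines.append(OrderLine(order_id=self.id, product=product))


order = Order()
order.add_line("keyboard")
order.add_line("mouse")
print(order.id, [line.order_id == order.id for line in order.lines])
```

With an identity column, the same object graph would have to wait for the parent row to be inserted before the children could reference it.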
If you're a database guy, you may find yourself leaning towards an int.
If you're an application developer, you may find yourself leaning towards GUID.
Just a quick question, why not use a sequence?
It just depends.
Hope this helps.
SQL primary key, INT or GUID or..?
The advantage of using a GUID primary key is that it should be globally unique, which helps when, for example, you move data from one database to another: you know that the row stays unique.
But for a small database, I prefer an integer.
Edit:
If you are using SQL Server 2005 or later, you can also use NEWSEQUENTIALID(), which generates each GUID to be greater than the one it generated before. That largely avoids the index problem you get with NEWID().
What are the best practices for using a GUID as a primary key, specifically regarding performance?
GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table. What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default, unless you specifically tell it not to.
You really need to keep two issues apart:
- the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
- the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.
By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way! I've personally seen massive performance gains when breaking up the previous GUID-based Primary / Clustered Key into two separate keys - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.
As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.
Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so.
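The fragmentation argument can be sketched with a toy model. This is not how SQL Server stores pages; it only counts how many inserts land somewhere before the current end of an ordered key list, which is where page splits become possible. Random 128-bit integers stand in for random GUIDs, and the function name is made up for this example.

```python
import bisect
import random


def mid_index_inserts(keys: list[int]) -> int:
    """Count inserts that land before the current end of a sorted key list."""
    ordered: list[int] = []
    count = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos < len(ordered):
            count += 1  # would go into an existing page, possibly splitting it
        ordered.insert(pos, key)
    return count


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000
identity_keys = list(range(1, n + 1))                    # like INT IDENTITY(1,1)
random_keys = [rng.getrandbits(128) for _ in range(n)]   # stand-in for random GUIDs

print("ever-increasing:", mid_index_inserts(identity_keys))
print("random:", mid_index_inserts(random_keys))
```

An ever-increasing key always appends at the end, so the first count is zero; with random keys nearly every insert lands in the middle of existing data.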
Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.
Quick calculation - using INT vs. GUID as Primary and Clustering Key:
- Base Table with 1'000'000 rows (3.8 MB vs. 15.26 MB)
- 6 nonclustered indexes (22.89 MB vs. 91.55 MB)
TOTAL: about 27 MB vs. 107 MB - and that's just on a single table!
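The figures above can be reproduced with a few lines of arithmetic. This sketch counts only the bytes taken by the key itself (4 for INT, 16 for a GUID) in the base table and in each of the six nonclustered indexes, ignoring row and page overhead; the constant and function names are made up for the example. The totals are simply the sum of the two rows (roughly 26.7 MB and 106.8 MB).

```python
MB = 1024 * 1024
ROWS = 1_000_000
NONCLUSTERED_INDEXES = 6
KEY_BYTES = {"INT": 4, "GUID": 16}  # a uniqueidentifier is stored in 16 bytes


def key_storage_mb(key_bytes: int) -> tuple[float, float, float]:
    """Return (base table, nonclustered indexes, total) in MB for the key alone."""
    base = ROWS * key_bytes / MB
    indexes = NONCLUSTERED_INDEXES * ROWS * key_bytes / MB
    return base, indexes, base + indexes


for name, size in KEY_BYTES.items():
    base, indexes, total = key_storage_mb(size)
    print(f"{name}: base {base:.2f} MB, indexes {indexes:.2f} MB, total {total:.2f} MB")
```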
Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.
- GUIDs as PRIMARY KEY and/or clustered key
- The clustered index debate continues
- Ever-increasing clustering key - the Clustered Index Debate..........again!
- Disk space is cheap - that's not the point!
PS: of course, if you're dealing with just a few hundred or a few thousand rows - most of these arguments won't really have much of an impact on you. However: if you get into the tens or hundreds of thousands of rows, or you start counting in millions - then those points become very crucial and very important to understand.
Update: if you want to have your PKGUID column as your primary key (but not your clustering key), and another column MyINT (INT IDENTITY) as your clustering key - use this:
CREATE TABLE dbo.MyTable
(
    PKGUID UNIQUEIDENTIFIER NOT NULL,
    MyINT INT IDENTITY(1,1) NOT NULL
    -- add more columns as needed (comma-separated)
);

-- Primary key on the GUID, explicitly NONCLUSTERED
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable
PRIMARY KEY NONCLUSTERED (PKGUID);

-- Clustered index on the small, ever-increasing INT column
CREATE UNIQUE CLUSTERED INDEX CIX_MyTable ON dbo.MyTable(MyINT);
Basically: you just have to explicitly tell the PRIMARY KEY constraint that it's NONCLUSTERED (otherwise it's created as your clustered index, by default) - and then you create a second index that's defined as CLUSTERED.
This will work - and it's a valid option if you have an existing system that needs to be "re-engineered" for performance. For a new system, if you start from scratch, and you're not in a replication scenario, then I'd always pick ID INT IDENTITY(1,1) as my clustered primary key - much more efficient than anything else!
Asp.net identity: Why the index is on GUID (nvarchar)? Any benefit/drawback to use int?
A lot has already been said about this issue on the internet. It is really a never-ending debate, as both int/long and GUID have pros and cons as a primary key.
GUID vs INT Debate is a fantastic article on this topic.
You can also take a look at Why is ASP.NET Identity 2.0 using a GUID/string as user id?
Apart from the discussion in these articles, I can suggest a few things:
- By default, the primary key in SQL Server is also the clustering key unless you specifically tell it otherwise, which causes bad performance with a GUID primary key. If you do not want to deal with that, don't choose GUID as the primary key.
- If your primary key has no meaning beyond being a unique identifier, choose GUID as the primary key.
- If you need a meaningful and readable primary key, choose int/long as the primary key.
If you choose GUID as your primary key, take a look at What are the best practices for using a GUID as a primary key, specifically regarding performance?
EF 4.0 Guid or Int as A primary Key
I would agree 100% with you - using an INT IDENTITY is much better!