SQL Server Int or BigInt database table Ids
OK, let's do a quick math recap:
INT is 32-bit and gives you basically 4 billion values - if you only count the values larger than zero, it's still 2 billion. Do you have this many employees? Customers? Products in stock? Orders in the lifetime of your company? REALLY?
BIGINT goes way way way beyond that (roughly 9.2 quintillion positive values). Do you REALLY need that?? REALLY?? If you're an astronomer, or into particle physics - maybe. An average Line of Business user? I strongly doubt it.
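The ranges follow directly from the storage sizes. As a quick check, here is a small Python sketch that derives them. The `SQL_INT_TYPES` table and the `value_range` helper are illustrative names, not part of any library. Note that TINYINT is unsigned in SQL Server.

```python
# Value ranges of SQL Server's integer types, derived from their storage size.
# TINYINT is the odd one out: it is unsigned (0..255).
SQL_INT_TYPES = {
    "TINYINT": (1, False),   # (bytes, signed?)
    "SMALLINT": (2, True),
    "INT": (4, True),
    "BIGINT": (8, True),
}

def value_range(type_name):
    """Return (min, max) for the given SQL Server integer type."""
    size, signed = SQL_INT_TYPES[type_name]
    bits = size * 8
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

for name, (size, _) in SQL_INT_TYPES.items():
    lo, hi = value_range(name)
    print(f"{name:<9} {size} bytes  {lo:>26,} .. {hi:,}")
```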
Imagine you have a table with, say, 10 million rows (orders for your company). Let's say you have an Orders table, and that OrderID, which you made a BIGINT, is referenced by 5 other tables and used in 5 non-clustered indices on your Orders table. Not overdone, I think, right?
10 million rows, times 5 tables plus 5 non-clustered indices (assuming each referencing table also holds about one row per order), gives 100 million instances where you are using 8 bytes each instead of 4 bytes: 400 million bytes, or 400 MB. A total waste... you'll need more data and index pages, and your SQL Server will have to read more pages from disk and cache more pages. That's not beneficial for your performance, plain and simple.
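The same estimate as a few lines of Python, with the assumptions spelled out as named constants. The constant names are only for illustration, and like the estimate above this ignores the OrderID column in the Orders table itself.

```python
# Back-of-the-envelope estimate of the extra space a BIGINT key costs
# compared to an INT key, using the numbers from the example above.
ROWS = 10_000_000          # orders in the Orders table
REFERENCING_TABLES = 5     # tables with an OrderID foreign key (assumed ~one row per order each)
NONCLUSTERED_INDEXES = 5   # non-clustered indexes on Orders that contain OrderID
EXTRA_BYTES = 8 - 4        # BIGINT (8 bytes) vs INT (4 bytes)

key_instances = ROWS * (REFERENCING_TABLES + NONCLUSTERED_INDEXES)
extra_bytes = key_instances * EXTRA_BYTES

print(f"{key_instances:,} key instances")          # 100,000,000 key instances
print(f"{extra_bytes / 1_000_000:,.0f} MB extra")   # 400 MB extra
```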
PLUS: what most programmers don't think about: yes, disk space is dirt cheap. But that wasted space also counts in your SQL Server's RAM and its buffer cache, and that space is not dirt cheap!
So to make a very long post short: use the smallest type of INT that really suits your needs; if you have 10-20 distinct values to handle, use TINYINT. If you need an order table, I believe INT should be PLENTY ENOUGH; BIGINT is only a waste of space.
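If you want to make that choice mechanical, a hypothetical helper like the following picks the smallest SQL Server integer type whose range covers the values you expect. It is a sketch, not part of any tool.

```python
def smallest_int_type(min_value, max_value):
    """Pick the smallest SQL Server integer type covering [min_value, max_value]."""
    candidates = [
        ("TINYINT", 0, 255),                 # unsigned in SQL Server
        ("SMALLINT", -32_768, 32_767),
        ("INT", -2_147_483_648, 2_147_483_647),
        ("BIGINT", -9_223_372_036_854_775_808, 9_223_372_036_854_775_807),
    ]
    for name, lo, hi in candidates:
        if lo <= min_value and max_value <= hi:
            return name
    raise ValueError("range does not fit in any SQL Server integer type")

print(smallest_int_type(1, 20))          # TINYINT
print(smallest_int_type(1, 50_000_000))  # INT
```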
Plus: should any of your tables ever really get close to 2 or 4 billion rows, you'll still have plenty of time to upgrade that table to a BIGINT ID, if that's really needed.
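How much time is "plenty"? Here is a rough sketch, assuming an IDENTITY(1,1) INT column and a steady insert rate. The 100,000 rows per day used below is a made-up figure, so plug in your own.

```python
INT_MAX = 2_147_483_647   # largest value of a SQL Server INT

def years_until_exhausted(rows_per_day, current_max_id=0):
    """Years until an IDENTITY(1,1) INT column runs out at a steady insert rate."""
    remaining = INT_MAX - current_max_id
    return remaining / rows_per_day / 365.25

print(f"{years_until_exhausted(100_000):.1f} years")  # 58.8 years
```

Gaps in the identity sequence (rolled-back inserts, for example) consume values too, so treat the result as an upper bound.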
Is it better to use a uniqueidentifier (GUID) or a bigint for an identity column?
That depends on what you're doing:
- If speed is the primary concern, then a plain old int is probably big enough.
- If you really will have more than 2 billion (with a B ;) ) records, then use bigint or a sequential guid.
- If you need to be able to easily synchronize with records created remotely, then Guid is really great.
Update
Some additional (less-obvious) notes on Guids:
- They can be hard on indexes, and that cuts to the core of database performance.
- You can use sequential guids to get back some of the indexing performance, but you give up some of the randomness that the security point below relies on.
- Guids can be hard to debug by hand (where id='xxx-xxx-xxxxx'), but you get some of that back via sequential guids as well (where id='xxx-xxx' + '123').
- For the same reason, Guids can make ID-based security attacks more difficult, but not impossible. (You can't just type 'http://example.com?userid=xxxx' and expect to get a result for someone else's account.)
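Why are random GUIDs hard on indexes when sequential ones are not? The following Python simulation uses a sorted list as a stand-in for a B-tree index. It counts how many inserts land at the very end (cheap appends) versus somewhere in the middle (where a real index would have to split pages). This is a simplified model: the "sequential" GUIDs just put an increasing counter in the most significant bits, which is not how SQL Server's NEWSEQUENTIALID() lays out or sorts its values.

```python
import bisect
import random
import uuid

def count_appends(keys):
    """Insert keys one by one into a sorted list (a stand-in for a B-tree index)
    and count how many land at the very end, i.e. how many are pure appends."""
    index, appends = [], 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos == len(index):
            appends += 1
        index.insert(pos, key)
    return appends

rng = random.Random(42)  # seeded so the run is repeatable
N = 1_000

# Random version-4 GUIDs, built from a seeded RNG instead of uuid.uuid4().
random_guids = [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(N)]

# Simplified "sequential" GUIDs: an increasing counter in the most significant bits,
# so each new value sorts after the previous one.
sequential_guids = [uuid.UUID(int=(i << 64) | rng.getrandbits(64)) for i in range(N)]

print("random appends:    ", count_appends(random_guids))
print("sequential appends:", count_appends(sequential_guids))
```

With random keys almost every insert lands in the middle of the existing key range. With sequential keys every insert is an append.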
Replace identity column from int to bigint
Well, there's no quick'n'easy way to do this, really....
My approach would be this:
- create a new table with identical structure, except for the ID column being BIGINT IDENTITY instead of INT IDENTITY
----[ put your database into exclusive single-user mode here; users cannot use it from this point on ]----
- find and drop all foreign key constraints referencing your table (script them out first so you can re-create them later; merely disabling them is not enough, because SQL Server won't drop a table that is still referenced by a foreign key)
- turn SET IDENTITY_INSERT (your new table) ON
- insert the rows from your old table into the new table (with IDENTITY_INSERT on, the INSERT must list its columns explicitly)
- turn SET IDENTITY_INSERT (your new table) OFF
- drop your old table
- rename your new table to the old table name
- update all tables that have a FK reference to your table to use BIGINT instead of INT (that should be doable with a simple ALTER TABLE ..... ALTER COLUMN FKID BIGINT; add NOT NULL if the column was non-nullable, otherwise it may become nullable)
- re-create all foreign key relationships again
- now you can return your database to normal multi-user usage again
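The steps above can be sketched as a Python helper that builds the T-SQL to run while the database is in single-user mode. All table, column and constraint names are hypothetical. The sketch also makes simplifying assumptions:

- the new table already exists with a BIGINT IDENTITY primary key on the ID column;
- every foreign key column is NOT NULL;
- indexes on those foreign key columns (which would block ALTER COLUMN) are not handled.

It drops the foreign keys rather than disabling them, because SQL Server will not drop a table that is still referenced by a foreign key constraint.

```python
def int_to_bigint_script(table, new_table, columns, fk_refs):
    """Return the T-SQL statements for the copy-and-swap migration (hypothetical helper).

    table, new_table: schema-qualified names, e.g. "dbo.Orders", "dbo.Orders_New".
    columns: all columns of the table, identity/key column first.
    fk_refs: (referencing_table, fk_column, constraint_name) tuples.
    """
    key = columns[0]
    col_list = ", ".join(columns)
    short_name = table.split(".")[-1]   # sp_rename expects the new name without schema

    # 1. Drop the foreign keys that point at the old table.
    script = [f"ALTER TABLE {ref} DROP CONSTRAINT {fk};" for ref, _, fk in fk_refs]

    # 2. Copy the rows, keeping their existing ID values.
    script += [
        f"SET IDENTITY_INSERT {new_table} ON;",
        f"INSERT INTO {new_table} ({col_list}) SELECT {col_list} FROM {table};",
        f"SET IDENTITY_INSERT {new_table} OFF;",
    ]

    # 3. Swap the tables.
    script += [
        f"DROP TABLE {table};",
        f"EXEC sp_rename '{new_table}', '{short_name}';",
    ]

    # 4. Widen the referencing columns (assumed NOT NULL), then re-create the foreign keys.
    for ref, col, _ in fk_refs:
        script.append(f"ALTER TABLE {ref} ALTER COLUMN {col} BIGINT NOT NULL;")
    for ref, col, fk in fk_refs:
        script.append(
            f"ALTER TABLE {ref} ADD CONSTRAINT {fk} "
            f"FOREIGN KEY ({col}) REFERENCES {table} ({key});"
        )
    return script


statements = int_to_bigint_script(
    "dbo.Orders", "dbo.Orders_New",
    ["OrderID", "CustomerID", "OrderDate"],
    [("dbo.OrderItems", "OrderID", "FK_OrderItems_Orders")],
)
print("\n".join(statements))
```

Generating the script rather than running it directly lets you review it before anything is dropped.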
INT vs Unique-Identifier for ID field in database
GUIDs are problematic as clustered keys because of their high randomness. This issue was addressed by Paul Randal in a TechNet Magazine Q&A column, answering the question: "I'd like to use a GUID as the clustered index key, but the others are arguing that it can lead to performance issues with indexes. Is this true and, if so, can you explain why?"
Now bear in mind that the discussion is specifically about clustered indexes. You say you want to use the column as 'ID'; it is unclear whether you mean it as the clustered key or just the primary key. Typically the two overlap, so I'll assume you want to use it as the clustered index. The reasons why that is a poor choice are explained in the article I mentioned above.
For non-clustered indexes, GUIDs still have some issues, but not nearly as big as when they are the leftmost clustered key of the table. Again, the randomness of GUIDs introduces page splits and fragmentation, albeit at the non-clustered index level only (a much smaller problem).
There are many urban legends surrounding GUID usage that condemn GUIDs based on their size (16 bytes) compared to an int (4 bytes) and promise horrible performance doom if they are used. This is slightly exaggerated. A 16-byte key can still be a very performant key on a properly designed data model. While it is true that being 4 times as big as an int results in lower-density non-leaf pages in indexes, this is not a real concern for the vast majority of tables. The b-tree structure is a naturally well-balanced tree and the depth of tree traversal is seldom an issue, so seeking a value based on a GUID key as opposed to an INT key is similar in performance. A leaf-page traversal (i.e. a table scan) does not look at the non-leaf pages, and the impact of GUID size on the leaf pages is typically quite small, as the record itself is significantly larger than the extra 12 bytes introduced by the GUID. So I'd take the hearsay advice based on '16 bytes vs. 4' with a rather large grain of salt. Analyze case by case and decide whether the size impact makes a real difference: how many other columns are in the table (i.e. how much impact the GUID size has on the leaf pages) and how many references are using it (i.e. how many other tables will grow because they need to store a larger foreign key).
I'm calling out all these details in a sort of makeshift defense of GUIDs because they have been getting a lot of bad press lately, and some of it is undeserved. They have their merits and are indispensable in any distributed system (the moment you're talking data movement, be it via replication or Sync Framework or whatever). I've seen bad decisions made based on the GUID's bad reputation, when GUIDs were shunned without proper consideration. But it is true: if you have to use a GUID as the clustered key, make sure you address the randomness issue and use sequential GUIDs when possible.
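To put the tree-depth argument into numbers, here is a rough Python sketch. It makes three assumptions: about 8,096 usable bytes per page, roughly 7 bytes of per-row overhead in non-leaf index rows (a simplification; the real overhead varies), and a hypothetical index with 100,000 leaf pages.

```python
import math

PAGE_BYTES = 8096   # usable bytes per 8 KB SQL Server page (8,192 minus the 96-byte header)
ROW_OVERHEAD = 7    # assumed: ~6-byte child page pointer + ~1 byte row header, slot array ignored

def fanout(key_bytes):
    """How many non-leaf index rows fit on one page for a given key size."""
    return PAGE_BYTES // (key_bytes + ROW_OVERHEAD)

def non_leaf_levels(leaf_pages, key_bytes):
    """Number of index levels above the leaf level needed to address leaf_pages."""
    levels, pages = 0, leaf_pages
    while pages > 1:
        pages = math.ceil(pages / fanout(key_bytes))
        levels += 1
    return levels

LEAF_PAGES = 100_000   # hypothetical table size at the leaf level
for name, size in (("INT", 4), ("GUID", 16)):
    print(f"{name:<4} fanout {fanout(size)}, levels above leaf: {non_leaf_levels(LEAF_PAGES, size)}")
```

Both keys need two levels above the leaf here. Because depth grows logarithmically, the GUID index can need one extra level at some sizes (at 200,000 leaf pages, for instance), but rarely more.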
And finally, to answer your question: if you don't have a specific reason to use GUIDs, use INTs.
Should I use an int or a long for the primary key in an entity framework model
Both are OK. It depends on how many records will be in a table. An int key allows only about 2*10^9 (2,147,483,647) positive values per table.
If you are sure that 2*10^9 is enough, use int as the key.
But:
If there is even a tiny chance that the count of records will exceed 2*10^9, use long.
If you don't have any idea how many records you'll have, use long.
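Here is the rule of thumb above, written out as a tiny Python sketch. The function name is hypothetical, and "int" and "long" refer to the C# key types.

```python
INT32_MAX = 2**31 - 1   # C# int.MaxValue = 2,147,483,647

def ef_key_type(expected_max_id=None):
    """Return "int" only when the highest expected key value is known and fits.

    expected_max_id: the highest ID you expect to hand out (rows plus any
    identity gaps), or None if you have no idea.
    """
    if expected_max_id is None or expected_max_id > INT32_MAX:
        return "long"
    return "int"

print(ef_key_type(50_000_000))     # int
print(ef_key_type(5_000_000_000))  # long
print(ef_key_type())               # long (no idea how many rows)
```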