SQL Server Int or Bigint Database Table Ids

SQL Server Int or BigInt database table Ids

OK, let's do a quick math recap:

  • INT is 32-bit and gives you basically 4 billion values - if you only count the values larger than zero, it's still 2 billion. Do you have this many employees? Customers? Products in stock? Orders in the lifetime of your company? REALLY?

  • BIGINT goes way way way beyond that. Do you REALLY need that?? REALLY?? If you're an astronomer, or into particle physics - maybe. An average Line of Business user? I strongly doubt it

Imagine you have a table with - say - 10 million rows (orders for your company). Let's say, you have an Orders table, and that OrderID which you made a BIGINT is referenced by 5 other tables, and used in 5 non-clustered indices on your Orders table - not overdone, I think, right?

10 million rows, by 5 tables plus 5 non-clustered indices, that's 100 million instances where you are using 8 bytes each instead of 4 bytes - 400 million bytes = 400 MB. A total waste... you'll need more data and index pages, your SQL Server will have to read more pages from disk and cache more pages.... that's not beneficial for your performance - plain and simple.

PLUS: What most programmer's don't think about: yes, disk space it dirt cheap. But that wasted space is also relevant in your SQL Server RAM memory and your database cache - and that space is not dirt cheap!

So to make a very long post short: use the smallest type of INT that really suits your need; if you have 10-20 distinct values to handle - use TINYINT. If you need an order table, I believe INT should be PLENTY ENOUGH - BIGINT is only a waste of space.

Plus: should any of your tables really ever get close to reaching 2 or 4 billion rows, you'll still have plenty of time to upgrade your table to a BIGINT ID, if that's really needed.......

Is it better to use an uniqueidentifier(GUID) or a bigint for an identity column?

That depends on what you're doing:

  • If speed is the primary concern then a plain old int is probably big enough.
  • If you really will have more than 2 billion (with a B ;) ) records, then use bigint or a sequential guid.
  • If you need to be able to easily synchronize with records created remotely, then Guid is really great.

Update
Some additional (less-obvious) notes on Guids:

  • They can be hard on indexes, and that cuts to the core of database performance
  • You can use sequential guids to get back some of the indexing performance, but give up some of the randomness used in point two.
  • Guids can be hard to debug by hand (where id='xxx-xxx-xxxxx'), but you get some of that back via sequential guids as well (where id='xxx-xxx' + '123').
  • For the same reason, Guids can make ID-based security attacks more difficult- but not impossible. (You can't just type 'http://example.com?userid=xxxx' and expect to get a result for someone else's account).

Replace identity column from int to bigint

Well, it won't be a quick'n'easy way to do this, really....

My approach would be this:

  1. create a new table with identical structure - except for the ID column being BIGINT IDENTITY instead of INT IDENTITY

    ----[ put your server into exclusive single-user mode here; user cannot use your server from this point on ]----

  2. find and disable all foreign key constraints referencing your table

  3. turn SET IDENTITY_INSERT (your new table) ON

  4. insert the rows from your old table into the new table

  5. turn SET IDENTITY_INSERT (your new table) OFF

  6. delete your old table

  7. rename your new table to the old table name

  8. update all table that have a FK reference to your table to use BIGINT instead of INT (that should be doable with a simple ALTER TABLE ..... ALTER COLUMN FKID BIGINT)

  9. re-create all foreign key relationships again

  10. now you can return your server to normal multi-user usage again

INT vs Unique-Identifier for ID field in database

GUIDs are problematic as clustered keys because of the high randomness. This issue was addressed by Paul Randal in the last Technet Magazine Q&A column: I'd like to use a GUID as the clustered index key, but the others are arguing that it can lead to performance issues with indexes. Is this true and, if so, can you explain why?

Now bear in mind that the discussion is specifically about clustered indexes. You say you want to use the column as 'ID', that is unclear if you mean it as clustered key or just primary key. Typically the two overlap, so I'll assume you want to use it as clustered index. The reasons why that is a poor choice are explained in the link to the article I mentioned above.

For non clustered indexes GUIDs still have some issues, but not nearly as big as when they are the leftmost clustered key of the table. Again, the randomness of GUIDs introduces page splits and fragmentation, be it at the non-clustered index level only (a much smaller problem).

There are many urban legends surrounding the GUID usage that condemn them based on their size (16 bytes) compared to an int (4 bytes) and promise horrible performance doom if they are used. This is slightly exaggerated. A key of size 16 can be a very peformant key still, on a properly designed data model. While is true that being 4 times as big as a int results in more a lower density non-leaf pages in indexes, this is not a real concern for the vast majority of tables. The b-tree structure is a naturally well balanced tree and the depth of tree traversal is seldom an issue, so seeking a value based on GUID key as opposed to a INT key is similar in performance. A leaf-page traversal (ie. a table scan) does not look at the non-leaf pages, and the impact of GUID size on the page size is typically quite small, as the record itself is significantly larger than the extra 12 bytes introduced by the GUID. So I'd take the hear-say advice based on 'is 16 bytes vs. 4' with a, rather large, grain of salt. Analyze on individual case by case and decide if the size impact makes a real difference: how many other columns are in the table (ie. how much impact has the GUID size on the leaf pages) and how many references are using it (ie. how many other tables will increase because of the fact they need to store a larger foreign key).

I'm calling out all these details in a sort of makeshift defense of GUIDs because they been getting a lot of bad press lately and some is undeserved. They have their merits and are indispensable in any distributed system (the moment you're talking data movement, be it via replication or sync framework or whatever). I've seen bad decisions being made out based on the GUID bad reputation when they were shun without proper consideration. But is true, if you have to use a GUID as clustered key, make sure you address the randomness issue: use sequential guids when possible.

And finally, to answer your question: if you don't have a specific reason to use GUIDs, use INTs.

Should I use an int or a long for the primary key in an entity framework model

Both are OK. It depends on how many records will be in a table. Int allows only 2*10^9 records per table.

If you are sure, that 2*10^9 is enough, use int as a key.

But:
If there is a tiny chance that count of records will be more than 2*10^9, use the long.
If you don't have any idea how many records you'll have, use long.



Related Topics



Leave a reply



Submit