SQL Primary Key: Integer VS Varchar

The primary key is supposed to represent the identity for the row and should not change over time.

I assume that the varchar is some sort of natural key - such as the name of the entity, an email address, or a serial number. If you use a natural key then it can sometimes happen that the key needs to change because for example:

  • The data was incorrectly entered and needs to be fixed.
  • The user changes their name or email address.
  • Management suddenly decides that all customer reference numbers must be changed to another format for reasons that seem completely illogical to you, and insists on making the change even after you explain the problems it will cause you.
  • Maybe even a country or state decides to change the spelling of its name - very unlikely, but not impossible.

By using a surrogate key you avoid problems caused by having to change primary keys.
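To make that concrete, here is a minimal sketch using SQLite through Python's standard library. The `users` and `orders` tables and their columns are made up for illustration and are not from the question. The email changes, but the key that `orders` points at does not:

```python
import sqlite3

# Hypothetical schema for illustration: users carry a surrogate integer id,
# and orders reference that id rather than the email address.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,      -- surrogate key, assigned by SQLite
        email TEXT NOT NULL UNIQUE      -- natural attribute, free to change
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        item    TEXT NOT NULL
    );
""")

user_id = conn.execute(
    "INSERT INTO users (email) VALUES (?)", ("old@example.com",)
).lastrowid
conn.execute("INSERT INTO orders (user_id, item) VALUES (?, ?)", (user_id, "book"))

# The user changes their email: one row in one table is updated,
# and no foreign key value anywhere has to follow along.
conn.execute("UPDATE users SET email = ? WHERE id = ?", ("new@example.com", user_id))

row = conn.execute("""
    SELECT u.email, o.item
    FROM orders AS o JOIN users AS u ON u.id = o.user_id
""").fetchone()
print(row)
```

Had `email` been the primary key, the same change would have required updating every referencing row as well, or relying on cascading updates.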

INT vs VARCHAR datatype for primary keys

INT is faster for a clustered index and for joins with other tables.

The reason becomes clear once you understand how clustered indexes and JOINs work.

Is there a REAL performance difference between INT and VARCHAR primary keys?

You make a good point that you can avoid some number of joined queries by using what's called a natural key instead of a surrogate key. Only you can assess if the benefit of this is significant in your application.

That is, you can measure the queries in your application that are the most important to be speedy, because they work with large volumes of data or they are executed very frequently. If these queries benefit from eliminating a join, and do not suffer by using a varchar primary key, then do it.
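As a rough illustration of the join you can save, here is a sketch in SQLite via Python's standard library. All table and column names are hypothetical. Variant A uses a surrogate key and needs a join to show the country code; variant B stores the natural key in the child table and reads it directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant A: surrogate key; the readable code lives only in countries_a.
    CREATE TABLE countries_a (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
    CREATE TABLE customers_a (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES countries_a(id)
    );

    -- Variant B: natural key; the code itself is the foreign key value.
    CREATE TABLE countries_b (code TEXT PRIMARY KEY);
    CREATE TABLE customers_b (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        country_code TEXT NOT NULL REFERENCES countries_b(code)
    );

    INSERT INTO countries_a (id, code) VALUES (1, 'DE');
    INSERT INTO customers_a (name, country_id) VALUES ('Ada', 1);
    INSERT INTO countries_b (code) VALUES ('DE');
    INSERT INTO customers_b (name, country_code) VALUES ('Ada', 'DE');
""")

# Variant A needs a join to get from the integer id to the code.
with_join = conn.execute("""
    SELECT c.name, k.code
    FROM customers_a AS c JOIN countries_a AS k ON k.id = c.country_id
""").fetchall()

# Variant B already has the code in the row being read.
without_join = conn.execute(
    "SELECT name, country_code FROM customers_b"
).fetchall()

print(with_join)
print(without_join)
```

Both queries return the same rows. Whether skipping the join outweighs the wider key is exactly what you would have to measure on your own data.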

Don't use either strategy for all tables in your database. It's likely that in some cases, a natural key is better, but in other cases a surrogate key is better.

Other folks make a good point that it's rare in practice for a natural key to never change or have duplicates, so surrogate keys are usually worthwhile.

Should I design a table with a primary key of varchar or int?

I would definitely recommend using an INT NOT NULL IDENTITY(1,1) field in each table as the primary key.

With an IDENTITY field, you can let the database handle all the details of making sure it's really unique and all, and the INT datatype is just 4 bytes, and fixed, so it's easier and more suited to be used for the primary (and clustering) key in your table.

And you're right - INT is an INT is an INT - it will not change its size or anything, so you won't ever have to go recreate and/or update your foreign key relations.

Using a VARCHAR(10) or (20) just uses up too much space - up to 10 or 20 bytes instead of 4, and what a lot of folks don't know - the clustering key value will be repeated on every single index entry on every single non-clustered index on the table, so potentially, you're wasting a lot of space (not just on disk - that's cheap - but also in SQL Server's main memory). Also, since it's variable (might be 4, might be 20 chars) it's harder for SQL Server to properly maintain a good index structure.
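A back-of-the-envelope calculation shows how the repetition adds up. The row count and the number of non-clustered indexes below are assumptions picked for illustration, and the estimate counts only the key bytes themselves. It ignores page overhead, variable-length bookkeeping, and compression:

```python
# Rough estimate of bytes spent storing clustering-key values.
# Both constants are assumed figures, not measurements.
ROWS = 10_000_000
NONCLUSTERED_INDEXES = 3


def key_bytes_total(key_size_bytes: int) -> int:
    """One key copy per row in the clustered index, plus one copy
    per row in each non-clustered index."""
    copies_per_row = 1 + NONCLUSTERED_INDEXES
    return key_size_bytes * ROWS * copies_per_row


int_total = key_bytes_total(4)        # 4-byte INT key
varchar_total = key_bytes_total(20)   # fully used VARCHAR(20), single-byte characters

print(int_total // 10**6, "MB")       # 160 MB
print(varchar_total // 10**6, "MB")   # 800 MB
```

Under these assumptions the wider key costs five times as much, and the gap grows with every additional non-clustered index.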

Marc

SQL - performance in varchar vs. int


Should I create a new column with a number datatype in both tables and join on that column to reduce the time taken by the SQL query?

If you're in a position where you can change the design of the database with ease then yes, your Primary Key should be an integer. Unless there is a really good reason for an FK to be a varchar, FKs should be integers as well.

If you can't change the PK or FK fields, then make sure they're indexed properly. This will eventually become a bottleneck though.

varchar() primary key or int primary key?

You can keep the primary key as an auto-inc integer column, and then add another column which is a VARCHAR column with a UNIQUE constraint on it.
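A minimal sketch of that layout, using SQLite through Python's standard library. The `products` table and `sku` column are hypothetical names. The database hands out the integer ids and the UNIQUE constraint rejects a repeated string value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-incrementing integer key
        sku VARCHAR(20) NOT NULL UNIQUE         -- natural identifier, kept unique
    )
""")
conn.execute("INSERT INTO products (sku) VALUES ('AB-1001')")
conn.execute("INSERT INTO products (sku) VALUES ('AB-1002')")

# A second row with an existing sku violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO products (sku) VALUES ('AB-1001')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT id, sku FROM products ORDER BY id").fetchall()
print(duplicate_rejected)
print(rows)
```

Other tables can then reference `id`, while lookups by `sku` still use the index that backs the UNIQUE constraint.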

Using varchar as the primary key? bad idea? or ok?

It totally depends on the data. There are plenty of perfectly legitimate cases where you might use a VARCHAR primary key, but if there's even the most remote chance that someone might want to update the column in question at some point in the future, don't use it as a key.

Must database primary keys be integers?

You can use a varchar as well, as long as you make sure that each value is unique. This however isn't ideal (see article link below for more info).

What you are describing is called a natural key. A primary key that is auto-incremented and managed by the RDBMS is called a surrogate key, which is the preferred approach, and such a key is an integer.

Learn more:

  • Surrogate Keys vs Natural Keys for Primary Key?
  • Why I prefer surrogate keys instead of natural keys in database design

Does a surrogate (INT) key almost always yield better performance than a unique natural (VARCHAR) key (in MySQL)?

I would use the ISBN as the primary key.

Primary key lookups in MySQL's default storage engine InnoDB are more efficient than lookups by secondary index.

It's true an integer takes less storage space than a 24-character varchar, but in your case, I assume you have to store the ISBN anyway. If you could use an integer instead of the ISBN, that would save storage.

The comment above that natural keys tend to violate uniqueness is a good warning in general. The violations usually come from the marketing department. ;-)

But for a given dataset, you can be sure that the natural key is free of duplicates. If you do experience an error reading the ISBN in your library collection, the librarian will have to resolve that manually. But I don't expect that to happen very often for 500,000 books.

Tip: Define the varchar with a binary collation, and it'll be a bit faster to do string comparisons. For example:

CREATE TABLE Books (
  isbn varchar(24) COLLATE utf8mb4_bin,
  -- ...other columns...
  PRIMARY KEY (isbn)
) DEFAULT CHARSET=utf8mb4;
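One side effect worth knowing: a binary collation compares raw bytes, so values that differ only in letter case count as different keys. The sketch below uses SQLite's `BINARY` and `NOCASE` collations through Python's standard library as a stand-in for MySQL's collations, which is an assumption made for illustration. The sample string ends in the check character X that ISBN-10 values can carry:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Compare the same two strings under a binary and a case-insensitive collation.
binary_equal, nocase_equal = conn.execute("""
    SELECT '080442957X' = '080442957x' COLLATE BINARY,
           '080442957X' = '080442957x' COLLATE NOCASE
""").fetchone()

print(binary_equal, nocase_equal)
```

With a binary collation it is therefore worth normalizing the case of ISBNs before inserting them, so that the same book cannot be stored twice.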

