SQL primary key: integer vs varchar
The primary key is supposed to represent the identity for the row and should not change over time.
I assume that the varchar is some sort of natural key, such as the name of the entity, an email address, or a serial number. If you use a natural key, the key may sometimes need to change, for example because:
- The data was incorrectly entered and needs to be fixed.
- The user changes their name or email address.
- Management suddenly decides that all customer reference numbers must be changed to another format for reasons that seem completely illogical to you, and they insist on the change even after you explain the problems it will cause you.
- Maybe even a country or state decides to change the spelling of its name - very unlikely, but not impossible.
By using a surrogate key you avoid problems caused by having to change primary keys.
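A minimal sketch of the difference, assuming a hypothetical users/orders schema and using SQLite as a stand-in engine. With the email as the primary key, changing it is rejected while an order still references the old value (an `ON UPDATE CASCADE` clause could propagate it instead, but then every referencing row is rewritten). With an integer surrogate key, only one row changes and the orders keep pointing at the same id.

```python
import sqlite3

def make_db():
    """Build the same data twice: once keyed by email, once by an integer id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        -- Natural key: the email address is the primary key.
        CREATE TABLE users_nat (email TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE orders_nat (
            order_id   INTEGER PRIMARY KEY,
            user_email TEXT NOT NULL REFERENCES users_nat(email)
        );
        -- Surrogate key: an integer id; the email is just a unique attribute.
        CREATE TABLE users_sur (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL, name TEXT);
        CREATE TABLE orders_sur (
            order_id INTEGER PRIMARY KEY,
            user_id  INTEGER NOT NULL REFERENCES users_sur(id)
        );
    """)
    conn.execute("INSERT INTO users_nat VALUES ('ann@old.example', 'Ann')")
    conn.execute("INSERT INTO orders_nat (user_email) VALUES ('ann@old.example')")
    conn.execute("INSERT INTO users_sur (email, name) VALUES ('ann@old.example', 'Ann')")
    conn.execute("INSERT INTO orders_sur (user_id) VALUES (1)")
    conn.commit()
    return conn

def change_email_natural(conn, old, new):
    """Try to change a natural primary key that a child row still references."""
    try:
        conn.execute("UPDATE users_nat SET email = ? WHERE email = ?", (new, old))
        conn.commit()
        return True
    except sqlite3.IntegrityError:  # FOREIGN KEY constraint failed
        conn.rollback()
        return False

def change_email_surrogate(conn, old, new):
    """Changing the email touches one row; the order still points at id 1."""
    cur = conn.execute("UPDATE users_sur SET email = ? WHERE email = ?", (new, old))
    conn.commit()
    return cur.rowcount
```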
INT vs VARCHAR datatype for primary keys
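INT is faster for the clustered index and for joins with other tables: a 4-byte, fixed-width key fits more entries per index page and is cheaper to compare than a variable-length string.
This becomes clear once you understand how clustered indexes and JOINs work.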
Is there a REAL performance difference between INT and VARCHAR primary keys?
You make a good point that you can avoid some number of joined queries by using what's called a natural key instead of a surrogate key. Only you can assess if the benefit of this is significant in your application.
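To illustrate the join that a natural key can save, here is a small sketch with a hypothetical countries/customers schema in SQLite. When the child table stores the country code itself, "customers in FR" reads only the child table; when it stores an integer surrogate, the same question needs a join to translate `'FR'` into an id. Whether that saved join matters is exactly what the measurement described next would tell you.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Natural key: the country code is stored directly in the child table.
        CREATE TABLE countries_nat (code TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE customers_nat (
            id INTEGER PRIMARY KEY,
            country_code TEXT REFERENCES countries_nat(code)
        );
        -- Surrogate key: the child table stores an opaque integer.
        CREATE TABLE countries_sur (id INTEGER PRIMARY KEY, code TEXT UNIQUE, name TEXT);
        CREATE TABLE customers_sur (
            id INTEGER PRIMARY KEY,
            country_id INTEGER REFERENCES countries_sur(id)
        );
        INSERT INTO countries_nat VALUES ('FR', 'France'), ('JP', 'Japan');
        INSERT INTO countries_sur (code, name) VALUES ('FR', 'France'), ('JP', 'Japan');
        INSERT INTO customers_nat (country_code) VALUES ('FR'), ('JP'), ('FR');
        INSERT INTO customers_sur (country_id) VALUES (1), (2), (1);
    """)
    return conn

# Natural key: no join needed to filter by the human-readable code.
NATURAL_QUERY = "SELECT COUNT(*) FROM customers_nat WHERE country_code = 'FR'"

# Surrogate key: a join translates 'FR' into the integer id first.
SURROGATE_QUERY = """
    SELECT COUNT(*) FROM customers_sur c
    JOIN countries_sur k ON k.id = c.country_id
    WHERE k.code = 'FR'
"""
```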
That is, you can measure the queries in your application that are the most important to be speedy, because they work with large volumes of data or they are executed very frequently. If these queries benefit from eliminating a join, and do not suffer by using a varchar primary key, then do it.
Don't use either strategy for all tables in your database. It's likely that in some cases, a natural key is better, but in other cases a surrogate key is better.
Other folks make a good point that in practice it's rare for a natural key to be guaranteed never to change and never to contain duplicates, so surrogate keys are usually worthwhile.
Should I design a table with a primary key of varchar or int?
I would definitely recommend using an INT NOT NULL IDENTITY(1,1) field in each table as the primary key.
With an IDENTITY field, you can let the database handle all the details of making sure it's really unique and all, and the INT datatype is just 4 bytes, and fixed, so it's easier and more suited to be used for the primary (and clustering) key in your table.
And you're right: an INT is an INT is an INT. It will not change its size or anything, so you won't ever have to recreate or update your foreign key relations.
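SQL Server's `IDENTITY(1,1)` has no direct SQLite equivalent, but a column declared `INTEGER PRIMARY KEY` behaves similarly for this purpose: the database picks the value when you omit it. The sketch below (hypothetical `products` table) shows only that auto-assignment; SQLite integers are variable-width up to 8 bytes, so it says nothing about the 4-byte storage point.

```python
import sqlite3

# SQLite stand-in for INT NOT NULL IDENTITY(1,1): an INTEGER PRIMARY KEY column
# receives a value automatically when an INSERT does not supply one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def add_product(name):
    """Insert a row without an id and return the id the database chose."""
    cur = conn.execute("INSERT INTO products (name) VALUES (?)", (name,))
    return cur.lastrowid
```

Usage: `add_product("widget")` followed by `add_product("gadget")` on an empty table returns 1 and then 2, with the uniqueness handled entirely by the database.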
Using a VARCHAR(10) or VARCHAR(20) just uses up too much space: up to 10 or 20 bytes (plus per-value length overhead) instead of 4. What a lot of folks don't know is that the clustering key value is repeated in every single index entry of every single non-clustered index on the table, so you're potentially wasting a lot of space (not just on disk, which is cheap, but also in SQL Server's main memory). Also, since the key is variable-length (it might be 4 characters, it might be 20), it's harder for SQL Server to maintain a good index structure.
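A back-of-the-envelope version of the "repeated on every non-clustered index" point, under a deliberately rough model: one copy of the clustering key per row per non-clustered index, ignoring page headers, fill factor, and variable-length overhead. The table size and index count are hypothetical.

```python
def clustering_key_overhead_bytes(rows, nonclustered_indexes, key_bytes):
    """Bytes spent repeating the clustering key inside non-clustered indexes.

    Rough model only: each non-clustered index stores one copy of the
    clustering key per row; all other storage overhead is ignored.
    """
    return rows * nonclustered_indexes * key_bytes

# Hypothetical table: 10 million rows with 3 non-clustered indexes.
int_cost = clustering_key_overhead_bytes(10_000_000, 3, 4)       # INT key
varchar_cost = clustering_key_overhead_bytes(10_000_000, 3, 20)  # fully used VARCHAR(20)
extra_mb = (varchar_cost - int_cost) / 1_000_000                  # decimal megabytes
```

Under these assumptions the INT key costs 120 MB and the full-width VARCHAR(20) key 600 MB, so roughly 480 MB extra, much of which would also compete for space in the buffer pool.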
Marc
SQL - performance in varchar vs. int
Should I create a new column with a numeric datatype in both tables and join on it to reduce the time taken by the SQL query?
If you're in a position where you can easily change the design of the database, then yes, your primary key should be an integer. Unless there is a really good reason for a foreign key to be a varchar, foreign keys should be integers as well.
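One possible migration path, sketched in SQLite with hypothetical `customers`/`orders` tables that are currently joined on a VARCHAR code: create an integer-keyed parent (keeping the old code as a UNIQUE column), add an integer FK column to the child, back-fill it from the code, index it, and join on the integers. A production migration would also run in a transaction, switch application code over, and eventually drop the old column; the exact DDL differs per database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical existing design: tables joined on a VARCHAR code.
    CREATE TABLE customers (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_code TEXT);
    INSERT INTO customers VALUES ('ACME-001', 'Acme'), ('GLOBEX-7', 'Globex');
    INSERT INTO orders (customer_code) VALUES ('ACME-001'), ('GLOBEX-7'), ('ACME-001');

    -- Step 1: new parent table with an integer surrogate key; the old code stays UNIQUE.
    CREATE TABLE customers_v2 (
        id   INTEGER PRIMARY KEY,
        code TEXT NOT NULL UNIQUE,
        name TEXT
    );
    INSERT INTO customers_v2 (code, name) SELECT code, name FROM customers;

    -- Step 2: add an integer FK column to the child and back-fill it from the code.
    ALTER TABLE orders ADD COLUMN customer_id INTEGER REFERENCES customers_v2(id);
    UPDATE orders
       SET customer_id = (SELECT id FROM customers_v2 WHERE code = orders.customer_code);

    -- Step 3: index the new FK column so joins and lookups on it can use the index.
    CREATE INDEX idx_orders_customer_id ON orders(customer_id);
""")

# The join now compares integers instead of strings.
JOIN_ON_INT = """
    SELECT c.name, COUNT(*) FROM orders o
    JOIN customers_v2 c ON c.id = o.customer_id
    GROUP BY c.name ORDER BY c.name
"""
```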
If you can't change the PK or FK fields, then make sure they're indexed properly. This will eventually become a bottleneck though.
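If the VARCHAR keys have to stay, a quick way to confirm that the child-side FK column is indexed is to ask the database for its query plan. This sketch uses SQLite's `EXPLAIN QUERY PLAN` on hypothetical tables; SQL Server and MySQL have their own plan tools, and the wording of the plan text varies between versions, so only the index name is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_code TEXT);
    -- The parent side is indexed by its PRIMARY KEY; index the child FK column too.
    CREATE INDEX idx_orders_customer_code ON orders(customer_code);
""")

def plan(sql):
    """Return the human-readable 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

lookup_plan = plan("SELECT order_id FROM orders WHERE customer_code = 'ACME-001'")
```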
varchar() primary key or int primary key?
You can keep the primary key as an auto-inc integer column, and then add another column which is a VARCHAR column with a UNIQUE constraint on it.
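A minimal SQLite sketch of that layout, using a hypothetical `accounts` table: the integer primary key is assigned automatically, and the VARCHAR column still rejects duplicates through its UNIQUE constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id       INTEGER PRIMARY KEY,   -- auto-assigned surrogate key
        username TEXT NOT NULL UNIQUE   -- natural identifier, still kept unique
    )
""")

def create_account(username):
    """Return the new account id, or None if the username is already taken."""
    try:
        cur = conn.execute("INSERT INTO accounts (username) VALUES (?)", (username,))
        return cur.lastrowid
    except sqlite3.IntegrityError:  # UNIQUE constraint failed: accounts.username
        return None
```

Because other tables would reference `id`, renaming a user later is a single-row UPDATE of `username`.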
Using varchar as the primary key? bad idea? or ok?
It totally depends on the data. There are plenty of perfectly legitimate cases where you might use a VARCHAR primary key, but if there's even the most remote chance that someone might want to update the column in question at some point in the future, don't use it as a key.
Must database primary keys be integers?
You can use varchar as well, as long as you make sure that each value is unique. This, however, isn't ideal (see the article links below for more info).
What you are describing is called a natural key. A primary key that auto-increments and is managed by the RDBMS is called a surrogate key, which is the preferred approach, and that is why the key is usually an integer.
Learn more:
- Surrogate Keys vs Natural Keys for Primary Key?
- Why I prefer surrogate keys instead of natural keys in database design
Does a surrogate (INT) key almost always yield better performance than a unique natural (VARCHAR) key (in MySQL)?
I would use the ISBN as the primary key.
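Primary key lookups in InnoDB, MySQL's default storage engine, are more efficient than lookups by secondary index. InnoDB stores the rows themselves in the primary key's clustered index, and each secondary index entry holds only the primary key value, so a secondary-index lookup needs a second search of the clustered index.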
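SQLite is not InnoDB, but its `WITHOUT ROWID` tables are likewise stored as a B-tree ordered by the primary key, which makes them a convenient analogy. This sketch (hypothetical `books` table) only shows the planner searching by the primary key directly versus going through a secondary index; it does not measure InnoDB performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Rows are stored in a B-tree ordered by isbn (clustered on the primary key).
    CREATE TABLE books (
        isbn  TEXT PRIMARY KEY,
        title TEXT,
        year  INTEGER
    ) WITHOUT ROWID;
    -- Secondary index: its entries carry the primary key (isbn) to locate each row.
    CREATE INDEX idx_books_title ON books(title);
""")

def plan(sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

by_pk = plan("SELECT title, year FROM books WHERE isbn = '9780000000002'")
by_title = plan("SELECT isbn, year FROM books WHERE title = 'Some Title'")
```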
It's true that an integer takes less storage space than a 24-character varchar, but in your case I assume you have to store the ISBN anyway. You would save storage only if you could use an integer instead of the ISBN, rather than in addition to it.
The comment above that natural keys tend to violate uniqueness is a good warning in general. The violations usually come from the marketing department. ;-)
But for a given dataset, you can be sure that the natural key is free of duplicates. If you do experience an error reading the ISBN in your library collection, the librarian will have to resolve that manually. But I don't expect that to happen very often for 500,000 books.
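Before declaring a natural key as the primary key, it is cheap to check the existing data for values that would violate it. A sketch against a hypothetical staging table in SQLite; the same `GROUP BY ... HAVING` query works in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical staging table loaded from the existing catalogue.
    CREATE TABLE books_staging (isbn TEXT, title TEXT);
    INSERT INTO books_staging VALUES
        ('9780000000002', 'First'),
        ('9780000000019', 'Second'),
        ('9780000000002', 'Duplicate entry');
""")

# ISBNs that occur more than once would make a PRIMARY KEY constraint fail.
DUPLICATES = """
    SELECT isbn, COUNT(*) FROM books_staging
    GROUP BY isbn HAVING COUNT(*) > 1
"""

# Missing or empty ISBNs cannot serve as a key either.
MISSING = "SELECT COUNT(*) FROM books_staging WHERE isbn IS NULL OR isbn = ''"
```

Any rows returned by these queries are the cases the librarian would need to resolve by hand before the key can be created.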
Tip: Define the varchar with a binary collation, and it'll be a bit faster to do string comparisons. For example:
CREATE TABLE Books (
isbn varchar(24) COLLATE utf8mb4_bin NOT NULL,  -- binary collation: compares code points
-- ...other columns...
PRIMARY KEY (isbn)
) DEFAULT CHARSET=utf8mb4;
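One side effect worth knowing: a binary collation such as `utf8mb4_bin` also makes comparisons case-sensitive, which matters for ISBN-10 values whose check character is `X`. The sketch below uses SQLite's `BINARY` and `NOCASE` collations as stand-ins for the MySQL ones; `080442957X` is just an example value. Normalising the case when the ISBN is entered avoids the mismatch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE isbn_bin    (isbn TEXT COLLATE BINARY PRIMARY KEY);
    CREATE TABLE isbn_nocase (isbn TEXT COLLATE NOCASE PRIMARY KEY);
    INSERT INTO isbn_bin    VALUES ('080442957X');
    INSERT INTO isbn_nocase VALUES ('080442957X');
""")

def matches(table, value):
    """Count rows equal to value; the column's collation decides equality.

    The table name is interpolated directly, so only pass the trusted
    constants defined above.
    """
    sql = f"SELECT COUNT(*) FROM {table} WHERE isbn = ?"
    return conn.execute(sql, (value,)).fetchone()[0]
```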