Should I Design a Table with a Primary Key of Varchar or Int

Should I design a table with a primary key of varchar or int?

I would definitely recommend using an INT NOT NULL IDENTITY(1,1) field in each table as the primary key.

With an IDENTITY field, you can let the database handle all the details of making sure the value is really unique. The INT datatype is a fixed 4 bytes, so it's easier to handle and better suited to being the primary (and clustering) key in your table.
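
As a rough illustration of letting the database assign the key, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server here, so INTEGER PRIMARY KEY AUTOINCREMENT plays the role of INT NOT NULL IDENTITY(1,1); the customer table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # SQLite analog of IDENTITY(1,1)
    " name TEXT NOT NULL)"
)

# No key value is supplied; the database generates one for each row.
new_ids = []
for name in ("Alice", "Bob", "Carol"):
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
    new_ids.append(cur.lastrowid)

print(new_ids)
```

The application never has to pick or check a key value itself; it only reads back what the database assigned.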

And you're right - INT is an INT is an INT - it will not change its size or anything, so you won't ever have to recreate or update your foreign key relations.

Using a VARCHAR(10) or (20) just uses up too much space - up to 10 or 20 bytes instead of 4. And what a lot of folks don't know: the clustering key value is repeated in every single index entry of every single non-clustered index on the table, so you're potentially wasting a lot of space (not just on disk - that's cheap - but also in SQL Server's main memory). Also, since it's variable (might be 4, might be 20 chars), it's harder for SQL Server to properly maintain a good index structure.
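
To get a feel for the space argument, a back-of-the-envelope calculation can help. The sketch below counts only the bytes of clustering-key values copied into non-clustered index entries. The row count and number of indexes are invented, the VARCHAR(20) key is assumed to be fully used, and row headers, page overhead, and any per-value overhead for variable-length columns are ignored.

```python
def key_bytes_in_indexes(rows, nonclustered_indexes, key_bytes):
    """Bytes spent on copies of the clustering key across non-clustered indexes."""
    return rows * nonclustered_indexes * key_bytes


ROWS = 10_000_000   # hypothetical table size
INDEXES = 4         # hypothetical number of non-clustered indexes

int_cost = key_bytes_in_indexes(ROWS, INDEXES, 4)       # 4-byte INT key
varchar_cost = key_bytes_in_indexes(ROWS, INDEXES, 20)  # VARCHAR(20), all 20 bytes used

print(f"INT key copies:     {int_cost / 1e6:.0f} MB")
print(f"VARCHAR key copies: {varchar_cost / 1e6:.0f} MB")
```

Under these assumptions the wider key costs five times as much in the non-clustered indexes alone; real numbers depend on actual key lengths and the storage engine.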

Marc

SQL primary key: integer vs varchar

The primary key is supposed to represent the identity for the row and should not change over time.

I assume that the varchar is some sort of natural key - such as the name of the entity, an email address, or a serial number. If you use a natural key then it can sometimes happen that the key needs to change because for example:

  • The data was incorrectly entered and needs to be fixed.
  • The user changes their name or email address.
  • The management suddenly decide that all customer reference numbers must be changed to another format for reasons that seem completely illogical to you, but they insist on making the change even after you explain the problems it will cause you.
  • Maybe even a country or state decides to change the spelling of its name - very unlikely, but not impossible.

By using a surrogate key you avoid problems caused by having to change primary keys.
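
A small sketch of that benefit, using Python's sqlite3 module as a stand-in database. The users and orders tables, their columns, and the email addresses are hypothetical. The email is kept unique but is not the key, so changing it touches a single row and leaves the referencing rows alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,      -- surrogate key, never changes
        email   TEXT NOT NULL UNIQUE      -- natural identifier, may change
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        item     TEXT NOT NULL
    );
    INSERT INTO users (user_id, email) VALUES (1, 'ann@old.example');
    INSERT INTO orders (user_id, item) VALUES (1, 'book'), (1, 'lamp');
""")

# The user changes their email address: exactly one row is updated,
# and the orders that reference user_id = 1 are left untouched.
cur = conn.execute(
    "UPDATE users SET email = ? WHERE user_id = ?", ("ann@new.example", 1)
)
rows_changed = cur.rowcount

orders_for_user = conn.execute(
    "SELECT o.item FROM orders o JOIN users u ON u.user_id = o.user_id "
    "WHERE u.email = ? ORDER BY o.order_id",
    ("ann@new.example",),
).fetchall()
print(rows_changed, orders_for_user)
```

Had the email been the primary key, every orders row would have needed the same update.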

INT vs VARCHAR datatype for primary keys

INT is faster for a clustered index and for joins with other tables.

This will make sense once you understand how clustered indexes and JOINs work.

Is there a REAL performance difference between INT and VARCHAR primary keys?

You make a good point that you can avoid some number of joined queries by using what's called a natural key instead of a surrogate key. Only you can assess if the benefit of this is significant in your application.

That is, you can measure the queries in your application that are the most important to be speedy, because they work with large volumes of data or they are executed very frequently. If these queries benefit from eliminating a join, and do not suffer by using a varchar primary key, then do it.
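
To make the "eliminating a join" point concrete, here is a minimal sketch with Python's sqlite3 module. All table and column names are hypothetical; the _s tables use a surrogate key and the _n tables use the country code itself as the key. Both queries return the same rows, but the natural-key version reads one table only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate-key design: the readable code lives only in the parent table.
    CREATE TABLE country_s (country_id INTEGER PRIMARY KEY,
                            code TEXT NOT NULL UNIQUE);
    CREATE TABLE city_s (name TEXT NOT NULL,
                         country_id INTEGER NOT NULL REFERENCES country_s(country_id));

    -- Natural-key design: the code itself is the key, so child rows carry it.
    CREATE TABLE country_n (code TEXT PRIMARY KEY);
    CREATE TABLE city_n (name TEXT NOT NULL,
                         country_code TEXT NOT NULL REFERENCES country_n(code));

    INSERT INTO country_s VALUES (1, 'FR'), (2, 'JP');
    INSERT INTO city_s VALUES ('Paris', 1), ('Tokyo', 2);
    INSERT INTO country_n VALUES ('FR'), ('JP');
    INSERT INTO city_n VALUES ('Paris', 'FR'), ('Tokyo', 'JP');
""")

# Surrogate key: a join is needed to get the readable country code.
with_join = conn.execute(
    "SELECT c.name, k.code FROM city_s c "
    "JOIN country_s k ON k.country_id = c.country_id ORDER BY c.name"
).fetchall()

# Natural key: the code is already present in the child row.
without_join = conn.execute(
    "SELECT name, country_code FROM city_n ORDER BY name"
).fetchall()

print(with_join == without_join)
```

Whether the saved join outweighs the wider key is exactly what has to be measured on your own data and queries; this toy example says nothing about speed.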

Don't use either strategy for all tables in your database. It's likely that in some cases, a natural key is better, but in other cases a surrogate key is better.

Other folks make a good point that it's rare in practice for a natural key to never change or have duplicates, so surrogate keys are usually worthwhile.

Mysql primary key for table with int and varchar field?

Your movie_id could be your only PK and auto-incrementing. Then make a FK movie_id in your alternative alias table to match the alt. name with its original title.

movie_id  |  Title
--------------------
1 | "Jaws"
2 | "Star Trek"
3 | "Matrix 3"


movie_id | Alt_Title
------------------------
1 | "Death Shark"
1 | "Tales of the Deep"
3 | "Neo is Uber"
1 | "Another Jaws Title"

When you make an insert into the alt name table, you will have to make a join on the original title, and pull its movie_id to insert with.
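
A minimal sketch of that insert, written with Python's sqlite3 module rather than MySQL. The table names, the add_alt_title helper, and the UNIQUE constraint on title are assumptions made for the example. The lookup on the original title is done with INSERT ... SELECT, so the movie_id is pulled and inserted in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (
        movie_id INTEGER PRIMARY KEY AUTOINCREMENT,
        title    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE movie_alt_title (
        movie_id  INTEGER NOT NULL REFERENCES movie(movie_id),
        alt_title TEXT NOT NULL
    );
    INSERT INTO movie (title) VALUES ('Jaws'), ('Star Trek'), ('Matrix 3');
""")


def add_alt_title(conn, original_title, alt_title):
    """Look up movie_id by original title and insert the alias in one statement."""
    cur = conn.execute(
        "INSERT INTO movie_alt_title (movie_id, alt_title) "
        "SELECT movie_id, ? FROM movie WHERE title = ?",
        (alt_title, original_title),
    )
    return cur.rowcount  # 0 means no movie with that title exists


add_alt_title(conn, "Jaws", "Death Shark")
add_alt_title(conn, "Matrix 3", "Neo is Uber")

aliases = conn.execute(
    "SELECT movie_id, alt_title FROM movie_alt_title ORDER BY movie_id"
).fetchall()
print(aliases)
```

If the application already knows the movie_id, it can of course insert it directly and skip the lookup.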

varchar() primary key or int primary key?

You can keep the primary key as an auto-inc integer column, and then add another column which is a VARCHAR column with a UNIQUE constraint on it.
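
For illustration, a short sketch of that layout using Python's sqlite3 module as a stand-in database. The product table and sku column are hypothetical. The integer column is the primary key, while the UNIQUE constraint still stops the text value from being duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    " product_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # surrogate primary key
    " sku TEXT NOT NULL UNIQUE)"                       # natural value, kept unique
)

conn.execute("INSERT INTO product (sku) VALUES (?)", ("AB-1001",))

# A second row with the same sku violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO product (sku) VALUES (?)", ("AB-1001",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```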

In MySQL does a UNIQUE varchar have to be a PRIMARY KEY as well?

No, it does not need to be a primary key.

Additionally: you cannot have more than one primary key per table (a compound primary key spanning several columns still counts as one key), and it frequently makes sense to have an auto-increment id field for convention alone and for some frameworks (e.g. ORMs like Hibernate and Entity Framework).

To answer your second question, there are many business cases where you may enforce a unique constraint on columns without making them your primary key. For example, the values may have to be unique, but you may still need to be able to edit or change them:

  1. Email addresses for usernames - forced unique, but users will likely need to update them
  2. Password salts (if your generation is sound this is unlikely to require enforcement)
  3. Random strings used to generate one-time or time-sensitive links (think bit.ly)

So the point is that this is done all the time.

What's the best practice for primary keys in tables?

I follow a few rules:

  1. Primary keys should be as small as necessary. Prefer a numeric type because numeric types are stored in a much more compact format than character types. This matters because most primary keys will be foreign keys in another table as well as used in multiple indexes. The smaller your key, the smaller the index, and the fewer pages in the cache you will use.
  2. Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely to be used in multiple indexes and used as a foreign key. Updating a single primary key could cause a ripple effect of changes.
  3. Do NOT use a key from your problem domain as your logical model's primary key. Examples are passport number, social security number, or employee contract number: these "natural keys" can change in real-world situations. Make sure to add UNIQUE constraints for these where necessary to enforce consistency.

On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change it can be used as a primary key. If the natural key is large or likely to change I use surrogate keys. If there is no primary key I still make a surrogate key because experience shows you will always add tables to your schema and wish you'd put a primary key in place.

Using varchar as the primary key? bad idea? or ok?

It totally depends on the data. There are plenty of perfectly legitimate cases where you might use a VARCHAR primary key, but if there's even the most remote chance that someone might want to update the column in question at some point in the future, don't use it as a key.


