in general, should every table in a database have an identity field to use as a PK?
There are two concepts that are close but should not be confused: IDENTITY
and PRIMARY KEY
Every table (except for the rare conditions) should have a PRIMARY KEY
, that is a value or a set of values that uniquely identify a row.
See here for discussion why.
IDENTITY
is a property of a column in SQL Server
which means that the column will be filled automatically with incrementing values.
Due to the nature of this property, the values of this column are inherently UNIQUE
.
However, no UNIQUE
constraint or UNIQUE
index is automatically created on IDENTITY
column, and after issuing SET IDENTITY_INSERT ON
it's possible to insert duplicate values into an IDENTITY
column, unless it had been explicity UNIQUE
constrained.
The IDENTITY
column should not necessarily be a PRIMARY KEY
, but most often it's used to fill the surrogate PRIMARY KEY
s
It may or may not be useful in any particular case.
Therefore, the answer to your question:
The question: should every table in a database have an IDENTITY field that's used as the PK?
is this:
No. There are cases when a database table should NOT have an IDENTITY
field as a PRIMARY KEY
.
Three cases come into my mind when it's not the best idea to have an IDENTITY
as a PRIMARY KEY
:
- If your
PRIMARY KEY
is composite (like in many-to-many link tables) - If your
PRIMARY KEY
is natural (like, a state code) - If your
PRIMARY KEY
should be unique across databases (in this case you useGUID
/UUID
/NEWID
)
All these cases imply the following condition:
You shouldn't have IDENTITY
when you care for the values of your PRIMARY KEY
and explicitly insert them into your table.
Update:
Many-to-many link tables should have the pair of id
's to the table they link as the composite key.
It's a natural composite key which you already have to use (and make UNIQUE
), so there is no point to generate a surrogate key for this.
I don't see why would you want to reference a many-to-many
link table from any other table except the tables they link, but let's assume you have such a need.
In this case, you just reference the link table by the composite key.
This query:
CREATE TABLE a (id, data)
CREATE TABLE b (id, data)
CREATE TABLE ab (a_id, b_id, PRIMARY KEY (a_id, b_id))
CREATE TABLE business_rule (id, a_id, b_id, FOREIGN KEY (a_id, b_id) REFERENCES ab)
SELECT *
FROM business_rule br
JOIN a
ON a.id = br.a_id
is much more efficient than this one:
CREATE TABLE a (id, data)
CREATE TABLE b (id, data)
CREATE TABLE ab (id, a_id, b_id, PRIMARY KEY (id), UNIQUE KEY (a_id, b_id))
CREATE TABLE business_rule (id, ab_id, FOREIGN KEY (ab_id) REFERENCES ab)
SELECT *
FROM business_rule br
JOIN a_to_b ab
ON br.ab_id = ab.id
JOIN a
ON a.id = ab.a_id
, for obvious reasons.
Should each and every table have a primary key?
Short answer: yes.
Long answer:
- You need your table to be joinable on something
- If you want your table to be clustered, you need some kind of a primary key.
- If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?
In MySQL, the InnoDB storage engine always creates a primary key if you didn't specify it explicitly, thus making an extra column you don't have access to.
Note that a primary key can be composite.
If you have a many-to-many link table, you create the primary key on all fields involved in the link. Thus you ensure that you don't have two or more records describing one link.
Besides the logical consistency issues, most RDBMS engines will benefit from including these fields in a unique index.
And since any primary key involves creating a unique index, you should declare it and get both logical consistency and performance.
See this article in my blog for why you should always create a unique index on unique data:
- Making an index UNIQUE
P.S. There are some very, very special cases where you don't need a primary key.
Mostly they include log tables which don't have any indexes for performance reasons.
Should all database tables have a primary key?
When you probably WOULD:
In an OLTP database you'd almost always (in my case always) have a primary key of some sort. Sometimes Guid, sometimes autonumber/identity fields, sometimes set by the application or the customer. Sometimes even a combination of more than one field. This is because you'll typically want to uniquely identify any given row from the table.
Also, a primary key is a constraint used by the query optimiser that should improve performance for lookups and joins.
When you probably WOULDN'T:
The only time you wouldn't have a primary key is in a "reporting" table, probably in a denormalised data warehouse.
Should every table have a primary key?
Primary keys can apply to multiple columns! In your example, the primary key should be on both columns, For example (Owner, Friend). Especially when Owner and Friend are foreign keys to a users table rather than actual names say (personally, my identity columns use the "Id" naming convention and so I would have (OwnerId, FriendId)
Personally I believe every table should have a primary key, but you'll find others who disagree.
Here's an article I wrote on the topic of normal forms.
http://michaeljswart.com/2011/01/ridiculously-unnormalized-database-schemas-part-zero/
Should a database table always have primary keys?
You should strive to have a primary key in any non-trivial table where you're likely to want to access (or update or delete) individual records by that key. Primary keys can consist of multiple columns, and formally speaking, will be the shortest available superkey; that is, the shortest available group of columns which, together, uniquely identify any row.
I don't know what the Stack Overflow database schema looks like (and from some of the things I've read on Jeff's blog, I don't want to), but in the situation you describe, it's entirely possible there is a primary key across the post identifier, revision number and tag value; certainly, that would be the shortest (and only) superkey available.
With regards to your second point, while it may be reasonable to argue in favour of aggregating values in archive tables, it does go against the principle that each row/column intersection in a table ought to contain one single value. While it may slightly simplify development, there is no reason you can't keep to a normalised table with versioned metadata, even for something as trivial as tags.
Is ID column required in SQL?
If you really do have some pre-existing column in your data set that already does uniquely identify your row - then no, there's no need for an extra ID column. The primary key however must be unique (in ALL circumstances) and cannot be empty (must be NOT NULL).
In my 20+ years of experience in database design, however, this is almost never truly the case. Most "natural" ID's that appear to be unique aren't - ultimately. US Social Security Numbers aren't guaranteed to be unique, and most other "natural" keys end up being almost unique - and that's just not good enough for a database system.
So if you really do have a proper, unique key in your data already - use it! But most of the time, it's easier and more convenient to have just a single surrogate ID that you can guarantee will be unique over all rows.
Do MySQL tables need an ID?
No you do not need a primary key to make a table work in MySQL. That said, a primary key allows for a unique value to refer to a row in a table from another table, or in any code using the table.
You do need a primary key to make a table work well in MySQL though. Indexes (which the primary key is one of) allow MySQL to search through small, highly optimized subsets of the table to process relationships and searches. In general, any fields that you use in a WHERE
clause or use to link two tables together should be indexed.
Related Topics
Split String into Table Given Row Delimiter and Column Delimiter in SQL Server
Exporting SQL Server Table to Multiple Part Files
Merging Intervals in One Pass in SQL
Avoid String Concatenation to Create Queries
MySQL - Creating Rows VS. Columns Performance
How Is This Script Updating Table When Using Left Joins
Unique Constraint on Combination of Two Columns
How to Select a List of 10,000 Unique Ids from Dual in Oracle SQL
SQL Query of Haversine Formula in SQL Server
Calculate Exact Date Difference in Years Using SQL
Is It Faster to Access Data from Files or a Database Server
SQL Server Clustered Index - Order of Index Question
Calculate Delta(Difference of Current and Previous Row) in SQL
Split Date Range into One Row Per Month in SQL Server