Is It Ok Not to Use a Primary Key When I Don't Need One

Is it OK not to use a Primary Key When I don't Need one

A primary key uniquely identifies a row in your table.

The fact it's indexed and/or clustered is a physical implementation issue and unrelated to the logical design.

You need one for the table to make sense.

When we don't need a primary key for our table?

No.

The primary key does a lot of stuff behind-the-scenes, even if your application never uses it.

For example: clustering improves efficiency (because heap tables are a mess).

Not to mention, if ANYONE ever has to do something on your table that requires pulling a specific row and you don't have a primary key, you are the bad guy.

When is it okay to not to use PRIMARY KEY?

You should strongly lean towards option two, namely creating a table which relates a playlist ID to a song ID. In this case, you can actually create a primary key which is a composite of the playlist and song ID.

CREATE TABLE playlist_songs (
song_id INT,
playlist_id INT,
PRIMARY KEY (song_id, playlist_id)
)

As to whether you also need an auto increment column on playlist_songs, it would depend on your situation. You might not need it from a business logic point of view since you would likely be manipulating the table using the two columns already there.

Should each and every table have a primary key?

Short answer: yes.

Long answer:

  • You need your table to be joinable on something
  • If you want your table to be clustered, you need some kind of a primary key.
  • If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?

In MySQL, the InnoDB storage engine always creates a primary key if you didn't specify it explicitly, thus making an extra column you don't have access to.

Note that a primary key can be composite.

If you have a many-to-many link table, you create the primary key on all fields involved in the link. Thus you ensure that you don't have two or more records describing one link.

Besides the logical consistency issues, most RDBMS engines will benefit from including these fields in a unique index.

And since any primary key involves creating a unique index, you should declare it and get both logical consistency and performance.

See this article in my blog for why you should always create a unique index on unique data:

  • Making an index UNIQUE

P.S. There are some very, very special cases where you don't need a primary key.

Mostly they include log tables which don't have any indexes for performance reasons.

Is it ever a good idea to not have an 'id' primary key for a table?

In my experience, almost never. (For a "speed matters, I'm just inserting and don't really care about retrieval at this point" style of application, perhaps.)

Whilst you might conceivably never use the ID field, it's nearly always wise to have one happily AUTO_INCREMENTing away, because one day you might need one. (You could of course simply do an 'ALTER..' to add one, but that's besides the point.)

Why is it a bad idea to have a table without a primary key?

This response is mainly opinion/experience-based, so I'll list a few reasons that come to mind. Note that this is not exhaustive.

Here're some reasons why you should use primary keys (PKs):

  1. They allow you to have a way to uniquely identify a given row in a table to ensure that there're no duplicates.
  2. The RDBMS enforces this constraint for you, so you don't have to write additional code to check for duplicates before inserting, avoiding a full table scan, which implies better performance here.
  3. PKs allow you to create foreign keys (FKs) to create relations between tables in a way that the RDBMS is "aware" of them. Without PKs/FKs, the relationship only exists inside the programmer's mind, and the referenced table might have a row with its "PK" deleted, and the other table with the "FK" still thinks the "PK" exists. This is bad, which leads to the next point.
  4. It allows the RDBMS to enforce integrity constraints. Is TableA.id referenced by TableB.table_a_id? If TableB.table_a_id = 5 then, you're guaranteed to have a row with id = 5 in TableA. Data integrity and consistency is maintained, and that is good.
  5. It allows the RDBMS to perform faster searches b/c PK fields are indexed, which means that a table doesn't need to have all of its rows checked when searching for something (e.g. a binary search on a tree structure).

In my opinion, not having a PK might be legal (i.e. the RDBMS will let you), but it's not moral (i.e. you shouldn't do it). I think you'd need to have extraordinarily good/powerful reasons to argue for not using a PK in your DB tables (and I'd still find them debatable), but based on your current level of experience (i.e. you say you're "new to data modeling"), I'd say it's not yet enough to attempt justifying a lack of PKs.

There're more reasons, but I hope this gives you enough to work through it.

As far as your M:M relations go, you need to create a new table, called an associative table, and a composite PK in it, that PK being a combination of the 2 PKs of the other 2 tables.

In other words, if there's a M:M relation between tables A and B, then we create a table C that has a 1:M relation to with both tables A and B. "Graphically", it'd look similar to:

+---+ 1  M +---+ M  1 +---+
| A |------| C |------| B |
+---+ +---+ +---+

With the C table PK somewhat like this:

+-----+
| C |
+-----+
| id | <-- C.id = A.id + B.id (i.e. combined/concatenated, not addition!)
+-----+

or like this:

+-------+
| C |
+-------+
| a_id | <--|
+-------+ +-- composite PK columns instead
| b_id | <--| of concatenation (recommended)
+-------+

Should a database table always have primary keys?

You should strive to have a primary key in any non-trivial table where you're likely to want to access (or update or delete) individual records by that key. Primary keys can consist of multiple columns, and formally speaking, will be the shortest available superkey; that is, the shortest available group of columns which, together, uniquely identify any row.

I don't know what the Stack Overflow database schema looks like (and from some of the things I've read on Jeff's blog, I don't want to), but in the situation you describe, it's entirely possible there is a primary key across the post identifier, revision number and tag value; certainly, that would be the shortest (and only) superkey available.

With regards to your second point, while it may be reasonable to argue in favour of aggregating values in archive tables, it does go against the principle that each row/column intersection in a table ought to contain one single value. While it may slightly simplify development, there is no reason you can't keep to a normalised table with versioned metadata, even for something as trivial as tags.

Is a primary key necessary?

This is a subjective question, so I hope you don't mind me answering with some opinion :)

In the vast majority of tables I've made – I'm talking 95%+ – I've added a primary key, and been glad I did. This is either the most critical unique field in my table (think "social security number") or, more often than not, just an auto-incrementing number that allows me to quickly and easily refer to a field when querying.

This latter use is the most common, and it even has its own name: a "surrogate" or "synthetic" key. This is a value auto-generated by the database and not derived from your application data. If you want to add relations between your tables, this surrogate key is immediately helpful as a foreign key. As someone else answered, these keys are so common that MySQL likes to add one even if you don't, so I'd suggest that means the consensus is very heavily biased towards adding primary keys.

One other thing I like about primary keys is that they help convey your intent to others reading your table schemata and also to your DMBS: "this bit is how I intend to identify my rows uniquely, don't let me try to break that rule!"

To answer your question specifically: no, a primary key is not necessary. But realistically if you intend to store data in the table for any period of time beyond a few minutes, I would very strongly recommend you add one.



Related Topics



Leave a reply



Submit