Do Link Tables Need a Meaningless Primary Key Field

Do link tables need a meaningless primary key field?

True link tables typically do not exist as object entities in my object models, so a surrogate key is never used. Removing an item from a collection results in removing an item from a link relationship where both foreign keys are known: Person.Siblings.Remove(Sibling) or Person.RemoveSibling(Sibling), which is translated at the data access layer as usp_Person_RemoveSibling(PersonID, SiblingID).
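As a rough sketch of that removal, assuming a hypothetical person_sibling link table and using SQLite in place of a stored procedure, the delete needs only the two foreign keys:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical link table: the two foreign keys together form the primary key.
conn.execute("""
    CREATE TABLE person_sibling (
        person_id  INTEGER NOT NULL,
        sibling_id INTEGER NOT NULL,
        PRIMARY KEY (person_id, sibling_id)
    )
""")
conn.executemany(
    "INSERT INTO person_sibling VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3)],
)

def remove_sibling(conn, person_id, sibling_id):
    """Stand-in for usp_Person_RemoveSibling: both keys are known, no surrogate id needed."""
    cur = conn.execute(
        "DELETE FROM person_sibling WHERE person_id = ? AND sibling_id = ?",
        (person_id, sibling_id),
    )
    return cur.rowcount  # number of link rows removed

removed = remove_sibling(conn, 1, 2)
remaining = conn.execute(
    "SELECT person_id, sibling_id FROM person_sibling ORDER BY person_id, sibling_id"
).fetchall()
print(removed, remaining)
```

The composite primary key identifies exactly one link row, so the delete never needs a separate id column.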

As Mike mentioned, if it does become an actual entity in your object model, then it may merit an ID. However, even with the addition of temporal factors such as effective start and end dates of the relationship, it's not always clear. For instance, the collection may have an effective date associated at the aggregate level, so the relationship itself may still not become an entity with any exposed properties.

I'd like to add that you might very well need the table indexed both ways on the two foreign key columns.
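A minimal sketch of indexing both ways, assuming a hypothetical link table ab and using SQLite: the composite primary key covers lookups by a_id, and a second index (here named ix_ab_b_a, an invented name) covers lookups by b_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ab (
        a_id INTEGER NOT NULL,
        b_id INTEGER NOT NULL,
        PRIMARY KEY (a_id, b_id)              -- serves lookups that start from a_id
    );
    CREATE INDEX ix_ab_b_a ON ab (b_id, a_id); -- serves lookups that start from b_id
""")

def plan(sql):
    """Return the query plan detail text SQLite reports for a one-parameter query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)).fetchall()
    return " | ".join(row[3] for row in rows)

forward_plan = plan("SELECT b_id FROM ab WHERE a_id = ?")
reverse_plan = plan("SELECT a_id FROM ab WHERE b_id = ?")
print(forward_plan)
print(reverse_plan)
```

The exact plan wording varies between SQLite versions, but the reverse lookup should name the second index rather than scanning the whole table.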

Rails routing for link tables without primary key?

This is the correct way to do it.

The only thing you should add is a GET route for the validate action inside the buggable_links resources.

What's the best practice for primary keys in tables?

I follow a few rules:

Primary keys should be as small as necessary. Prefer a numeric type, because numeric types are stored in a much more compact format than character types. Most primary keys will also be foreign keys in another table and will appear in multiple indexes, so the smaller your key, the smaller the index and the fewer pages in the cache you will use.
Primary keys should never change. Updating a primary key should always be out of the question, because it is most likely used in multiple indexes and as a foreign key. Updating a single primary key could cause a ripple effect of changes.
Do NOT use a key from your problem domain as your logical model's primary key. Examples are a passport number, social security number, or employee contract number: these "natural keys" can change in real-world situations. Make sure to add UNIQUE constraints for these where necessary to enforce consistency.

On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change it can be used as a primary key. If the natural key is large or likely to change I use surrogate keys. If there is no primary key I still make a surrogate key because experience shows you will always add tables to your schema and wish you'd put a primary key in place.
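The rules above can be sketched with SQLite as follows. The employee and badge tables and their columns are hypothetical names chosen for illustration: the surrogate id is what other tables reference, and the natural key keeps a UNIQUE constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE employee (
        id          INTEGER PRIMARY KEY,   -- small surrogate key that never changes
        passport_no TEXT NOT NULL UNIQUE,  -- natural key: enforced, but not the PK
        name        TEXT NOT NULL
    );
    CREATE TABLE badge (
        id          INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL REFERENCES employee (id)
    );
    INSERT INTO employee (id, passport_no, name) VALUES (1, 'P123', 'Ann');
    INSERT INTO badge (id, employee_id) VALUES (1, 1);
""")

# The natural key changes in the real world; nothing that references the row needs updating.
conn.execute("UPDATE employee SET passport_no = 'P999' WHERE id = 1")
badge_owner = conn.execute("SELECT employee_id FROM badge WHERE id = 1").fetchone()[0]

# The UNIQUE constraint still rejects a second employee with the same passport number.
try:
    conn.execute("INSERT INTO employee (id, passport_no, name) VALUES (2, 'P999', 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(badge_owner, duplicate_rejected)
```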

What would be the Primary key?

You would need to define a new column, UserHistoryId, and make it an identity column.

The reason for this is that no combination of the other columns can be guaranteed unique in all cases.

For example, a history record could be created twice in one day for the same user from the same URL.

Performance:

Depends on how the history table is used - If you only ever SELECT data from it by means of a query on userId, or date, or Url, then the ID column would serve no purpose.

However, if you ever perform any UPDATE/DELETE operations on the table, then the Id would be useful.

Regardless of current requirements, it costs you almost nothing to include the extra column now, and it's something I would always recommend.
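To illustrate, here is a small sketch in SQLite, where AUTOINCREMENT stands in for an identity column. Apart from UserHistoryId, the column names are assumptions. Two otherwise identical history rows can only be told apart, and deleted individually, by the generated id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE UserHistory (
        UserHistoryId INTEGER PRIMARY KEY AUTOINCREMENT,  -- plays the role of the identity column
        UserId        INTEGER NOT NULL,
        Url           TEXT    NOT NULL,
        VisitDate     TEXT    NOT NULL
    )
""")

# The same user visits the same URL twice on the same day: the rows are identical
# in every column except the generated id.
visit = (7, "/home", "2024-01-01")
conn.execute("INSERT INTO UserHistory (UserId, Url, VisitDate) VALUES (?, ?, ?)", visit)
conn.execute("INSERT INTO UserHistory (UserId, Url, VisitDate) VALUES (?, ?, ?)", visit)

ids = [row[0] for row in conn.execute(
    "SELECT UserHistoryId FROM UserHistory ORDER BY UserHistoryId"
)]

# Delete exactly one of the two duplicates by its id.
conn.execute("DELETE FROM UserHistory WHERE UserHistoryId = ?", (ids[0],))
remaining = conn.execute("SELECT COUNT(*) FROM UserHistory").fetchone()[0]
print(ids, remaining)
```

Without the id column, a DELETE filtered on UserId, Url, and VisitDate would remove both rows.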

Should I use an index column in a many to many link table?

That depends.

Do you see your data more as a set of objects (with the relational database as just a storage medium), or as a set of facts represented and analyzed natively by relational algebra?
Some ORMs, frameworks, and tools don't have good support for multicolumn primary keys. If you happen to use one of them, you'll need an additional id column.
If it's just a many-to-many relationship with no additional data associated with it, it's better to avoid an additional id column and have both columns as the primary key.
If you start adding additional information to this association, it may reach a point where it becomes something more than a many-to-many relationship between two entities. It becomes an entity in its own right, and it would be more convenient if it had its own id, independent of the entities it connects.

In general, should every table in a database have an identity field to use as a PK?

There are two concepts that are close but should not be confused: IDENTITY and PRIMARY KEY

Every table (except for the rare conditions) should have a PRIMARY KEY, that is a value or a set of values that uniquely identify a row.

See here for discussion why.

IDENTITY is a property of a column in SQL Server which means that the column will be filled automatically with incrementing values.

Due to the nature of this property, the values of this column are inherently UNIQUE.

However, no UNIQUE constraint or UNIQUE index is automatically created on an IDENTITY column, and after issuing SET IDENTITY_INSERT ON it's possible to insert duplicate values into an IDENTITY column unless it has been explicitly UNIQUE constrained.

The IDENTITY column need not be a PRIMARY KEY, but most often it's used to fill surrogate PRIMARY KEYs.

It may or may not be useful in any particular case.

Therefore, the answer to your question:

The question: should every table in a database have an IDENTITY field that's used as the PK?

is this:

No. There are cases when a database table should NOT have an `IDENTITY` field as a `PRIMARY KEY`.

Three cases come to mind when it's not the best idea to have an IDENTITY as a PRIMARY KEY:

If your PRIMARY KEY is composite (like in many-to-many link tables)
If your PRIMARY KEY is natural (like, a state code)
If your PRIMARY KEY should be unique across databases (in this case you use GUID / UUID / NEWID)

All these cases imply the following condition:

You shouldn't have `IDENTITY` when you care for the values of your `PRIMARY KEY` and explicitly insert them into your table.

Update:

Many-to-many link tables should have the pair of ids of the tables they link as the composite key.

It's a natural composite key which you already have to use (and make UNIQUE), so there is no point to generate a surrogate key for this.

I don't see why you would want to reference a many-to-many link table from any table other than the ones it links, but let's assume you have such a need.

In this case, you just reference the link table by the composite key.

This query:

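-- Column types are illustrative; the original omitted them.
CREATE TABLE a (id INT PRIMARY KEY, data VARCHAR(100));
CREATE TABLE b (id INT PRIMARY KEY, data VARCHAR(100));
CREATE TABLE ab (a_id INT NOT NULL, b_id INT NOT NULL, PRIMARY KEY (a_id, b_id));
CREATE TABLE business_rule (id INT PRIMARY KEY, a_id INT NOT NULL, b_id INT NOT NULL, FOREIGN KEY (a_id, b_id) REFERENCES ab (a_id, b_id));

SELECT  *
FROM    business_rule br
JOIN    a
ON      a.id = br.a_id;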

is much more efficient than this one:

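-- Column types are illustrative; the original omitted them.
CREATE TABLE a (id INT PRIMARY KEY, data VARCHAR(100));
CREATE TABLE b (id INT PRIMARY KEY, data VARCHAR(100));
CREATE TABLE ab (id INT PRIMARY KEY, a_id INT NOT NULL, b_id INT NOT NULL, UNIQUE (a_id, b_id));
CREATE TABLE business_rule (id INT PRIMARY KEY, ab_id INT NOT NULL, FOREIGN KEY (ab_id) REFERENCES ab (id));

SELECT  *
FROM    business_rule br
JOIN    ab
ON      br.ab_id = ab.id
JOIN    a
ON      a.id = ab.a_id;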

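, because with the composite foreign key business_rule already carries a_id, so the first query skips the extra join through the link table.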
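The composite-key variant can be tried out with a short SQLite sketch. The column types and sample rows are assumptions added to make it runnable. It shows that the composite foreign key is enforced and that a single join reaches table a.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE ab (
        a_id INTEGER NOT NULL REFERENCES a (id),
        b_id INTEGER NOT NULL REFERENCES b (id),
        PRIMARY KEY (a_id, b_id)
    );
    CREATE TABLE business_rule (
        id   INTEGER PRIMARY KEY,
        a_id INTEGER NOT NULL,
        b_id INTEGER NOT NULL,
        FOREIGN KEY (a_id, b_id) REFERENCES ab (a_id, b_id)
    );
    INSERT INTO a VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO b VALUES (1, 'one');
    INSERT INTO ab VALUES (1, 1);
    INSERT INTO business_rule VALUES (1, 1, 1);
""")

# One join is enough: business_rule already holds a_id.
rows = conn.execute("""
    SELECT br.id, a.data
    FROM   business_rule br
    JOIN   a ON a.id = br.a_id
""").fetchall()

# A rule pointing at a pair that is not in the link table is rejected.
try:
    conn.execute("INSERT INTO business_rule VALUES (2, 2, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows, orphan_rejected)
```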

Adding an artificial primary key versus using a unique field

You are talking about the difference between synthetic and natural keys.

In my [very] personal opinion, I would recommend to always use synthetic keys (and always call it id). The main problem is that natural keys are never unique; they are unique in theory, yes, but in the real world there are a myriad of unexpected and inexorable events that will make this false.

In database design:

Natural keys correspond to values present in the domain model. For example, UserName, SSN, VIN can be considered natural keys.
Synthetic keys are values not present in the domain model. They are just numeric/string/UUID values that have no relationship with the actual data. They only serve as unique identifiers for the rows.

I would say, stick to synthetic keys and sleep well at night. You never know what the Marketing Department will come up with on Monday, and suddenly "the username is not unique anymore".

MySQL non primary foreign key

Furthermore the foreign key must/should refer to the primary key. What if I don't know the primary key, but I know another unique column, in this case username, how would I either get the primary key from within another MySQL statement, or alternatively have the foreign key point to a non primary key?

Yes, if you have another unique key, you can have foreign keys referencing it:

CREATE TABLE user
( userid INT NOT NULL 
, username VARCHAR(20) NOT NULL
-- other fields
, PRIMARY KEY (userid)
, UNIQUE KEY (username)
) ENGINE = InnoDB ;

CREATE TABLE picture
( pictureid INT NOT NULL 
, username VARCHAR(20) 
-- other fields
, PRIMARY KEY (pictureid)
, FOREIGN KEY (username)
    REFERENCES user(username)
) ENGINE = InnoDB ;

And if all foreign keys in other tables are referencing this Unique Key (username), there is no point in having a meaningless id. You can drop it and make the username the PRIMARY KEY of the table.
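As a hedged illustration of both halves of the question, the sketch below uses SQLite rather than MySQL, with made-up sample data. It shows a foreign key that references the unique username column, and a join that recovers the primary key from the username when it is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE user (
        userid   INTEGER NOT NULL PRIMARY KEY,
        username VARCHAR(20) NOT NULL UNIQUE
    );
    CREATE TABLE picture (
        pictureid INTEGER NOT NULL PRIMARY KEY,
        username  VARCHAR(20),
        FOREIGN KEY (username) REFERENCES user (username)  -- points at a UNIQUE column, not the PK
    );
    INSERT INTO user VALUES (10, 'alice');
    INSERT INTO picture VALUES (1, 'alice');
""")

# A picture for a username that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO picture VALUES (2, 'nobody')")
    unknown_user_rejected = False
except sqlite3.IntegrityError:
    unknown_user_rejected = True

# If the primary key is needed, look it up from the unique column in the same statement.
owner_id = conn.execute("""
    SELECT u.userid
    FROM   picture p
    JOIN   user u ON u.username = p.username
    WHERE  p.pictureid = ?
""", (1,)).fetchone()[0]

print(unknown_user_rejected, owner_id)
```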

(Edit:)
There are a few points in favor of having an auto-incrementing primary key for InnoDB tables, even if it is not used as a reference, because the primary key (or, when there is none, the first UNIQUE index on NOT NULL columns) is used as the clustered index of the table. A character primary key may have performance drawbacks for INSERT and UPDATE statements, but may perform better in SELECT queries.

For a discussion regarding what to use, surrogate (meaningless, auto-generated) or natural keys, and different views on the subject, read this: surrogate-vs-natural-business-keys
