Do link tables need a meaningless primary key field?
For true link tables, the link typically does not exist as an object entity in my object models, so the surrogate key is never used. Removing an item from a collection removes a row from the link relationship where both foreign keys are known: Person.Siblings.Remove(Sibling) or Person.RemoveSibling(Sibling), which the data access layer translates to usp_Person_RemoveSibling(PersonID, SiblingID).
As Mike mentioned, if it does become an actual entity in your object model, then it may merit an ID. However, even with the addition of temporal factors such as effective start and end dates of the relationship, it's not always clear. For instance, the collection may have an effective date associated at the aggregate level, so the relationship itself may still not become an entity with any exposed properties.
I'd like to add that you might very well need the table indexed both ways on the two foreign key columns.
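A minimal sketch of that design, using Python's built-in sqlite3 module purely so the SQL can be run as-is. The table, column, and function names (person_sibling, remove_sibling, and so on) are hypothetical stand-ins for the Person/Sibling example above, not the original schema. The link table has no surrogate key: the pair of foreign keys is the primary key, a second index covers lookups in the other direction, and removal addresses the row by both keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Link table: the two foreign keys together form the primary key.
CREATE TABLE person_sibling (
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    sibling_id INTEGER NOT NULL REFERENCES person(person_id),
    PRIMARY KEY (person_id, sibling_id)
);
-- Second index so lookups starting from sibling_id are also indexed.
CREATE INDEX ix_person_sibling_reverse
    ON person_sibling (sibling_id, person_id);
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany("INSERT INTO person_sibling VALUES (?, ?)",
                 [(1, 2), (1, 3)])

def remove_sibling(conn, person_id, sibling_id):
    """Delete one link row, identified by both foreign keys."""
    cur = conn.execute(
        "DELETE FROM person_sibling WHERE person_id = ? AND sibling_id = ?",
        (person_id, sibling_id),
    )
    return cur.rowcount

removed = remove_sibling(conn, 1, 2)
remaining = conn.execute(
    "SELECT person_id, sibling_id FROM person_sibling"
).fetchall()
```

Nothing in the delete path ever needs a separate ID for the link row.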
Rails routing for link tables without primary key?
This is the correct way to do it.
The only thing you should add: in the resources of buggable_links, add the validate function as a GET method.
What's the best practice for primary keys in tables?
I follow a few rules:
- Primary keys should be as small as necessary. Prefer a numeric type, because numeric types are stored in a much more compact format than character types. Size matters because most primary keys are also foreign keys in other tables and appear in multiple indexes. The smaller your key, the smaller the index, and the fewer pages in the cache you will use.
- Primary keys should never change. Updating a primary key should always be out of the question, because it is most likely used in multiple indexes and as a foreign key. Updating a single primary key could cause a ripple effect of changes.
- Do NOT use a key from your problem domain as the primary key of your logical model. Passport numbers, social security numbers, and employee contract numbers are "natural keys" that can change in real-world situations. Make sure to add UNIQUE constraints for these where necessary to enforce consistency.
On surrogate vs. natural keys, I refer to the rules above. If the natural key is small and will never change, it can be used as the primary key. If the natural key is large or likely to change, I use a surrogate key. If there is no obvious key at all, I still add a surrogate key, because experience shows you will always add tables to your schema and wish you'd put a primary key in place.
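The rules above can be sketched as follows, again via Python's sqlite3 module so it runs anywhere. The employee and timesheet tables and their columns are hypothetical. The small numeric surrogate is the primary key that other tables reference, while the natural key (a passport number here) only gets a UNIQUE constraint, so it can change without touching any referencing row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,   -- small, stable surrogate key
    passport_no TEXT NOT NULL UNIQUE   -- natural key, kept consistent by UNIQUE
);
CREATE TABLE timesheet (
    timesheet_id INTEGER PRIMARY KEY,
    employee_id  INTEGER NOT NULL REFERENCES employee(employee_id),
    hours        REAL NOT NULL
);
""")
conn.execute("INSERT INTO employee VALUES (1, 'P123')")
conn.execute("INSERT INTO timesheet VALUES (10, 1, 7.5)")

# The passport is renewed: one row changes, no foreign key is touched.
conn.execute("UPDATE employee SET passport_no = 'P999' WHERE employee_id = 1")

# The UNIQUE constraint still rejects a second employee with that number.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'P999')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

owner = conn.execute("""
    SELECT e.passport_no
    FROM timesheet t
    JOIN employee e ON e.employee_id = t.employee_id
    WHERE t.timesheet_id = 10
""").fetchone()[0]
```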
What would be the Primary key?
You would need to define a new column, UserHistoryId, and make it an identity column.
The reason is that no combination of the other columns is guaranteed to be unique in all cases.
For example, a history record could be created twice in one day for the same user from the same URL.
Performance:
Depends on how the history table is used. If you only ever SELECT from it by querying on userId, date, or Url, the ID column serves no purpose.
However, if you ever perform UPDATE or DELETE operations on individual rows, the ID is useful.
Regardless of current requirements, it costs you almost nothing to include the extra column now, and it's something I would always recommend.
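A small sketch of the duplicate-row case, using Python's sqlite3 module. The user_history table, its column names, and the sample values are hypothetical, and SQLite's AUTOINCREMENT stands in for a SQL Server identity column. Two history rows are identical in every business column, so the generated ID is the only way to address just one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_history (
    user_history_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id         INTEGER NOT NULL,
    url             TEXT NOT NULL,
    visited_on      TEXT NOT NULL
)
""")

# Same user, same URL, same day: no natural combination is unique.
rows = [(42, "/home", "2024-01-01"), (42, "/home", "2024-01-01")]
conn.executemany(
    "INSERT INTO user_history (user_id, url, visited_on) VALUES (?, ?, ?)",
    rows,
)

ids = [r[0] for r in conn.execute(
    "SELECT user_history_id FROM user_history ORDER BY user_history_id"
)]

# Delete exactly one of the two otherwise indistinguishable rows.
conn.execute("DELETE FROM user_history WHERE user_history_id = ?", (ids[1],))
remaining = conn.execute("SELECT COUNT(*) FROM user_history").fetchone()[0]
```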
Should I use an index column in a many to many link table?
That depends.
Do you see your data more as a set of objects (with the relational database as just a storage medium), or as a set of facts represented and analyzed natively by relational algebra?
Some ORMs, frameworks, and tools don't have good support for multicolumn primary keys. If you happen to use one of them, you'll need an additional id column.
If it's just a many-to-many relationship with no additional data associated with it, it's better to avoid the additional id column and have both columns as the primary key.
If you start adding information to this association, it may reach a point where it becomes something more than a many-to-many relationship between two entities. It becomes an entity in its own right, and it would be more convenient if it had its own id, independent of the entities it connects.
In general, should every table in a database have an identity field to use as a PK?
There are two concepts that are close but should not be confused: IDENTITY and PRIMARY KEY.
Every table (except in rare cases) should have a PRIMARY KEY, that is, a value or a set of values that uniquely identifies a row.
See here for discussion why.
IDENTITY is a property of a column in SQL Server which means that the column will be filled automatically with incrementing values.
Due to the nature of this property, the values of this column are normally unique.
However, no UNIQUE constraint or UNIQUE index is automatically created on an IDENTITY column, and after issuing SET IDENTITY_INSERT <table> ON it's possible to insert duplicate values into an IDENTITY column, unless it has been explicitly UNIQUE constrained.
An IDENTITY column need not be a PRIMARY KEY, but most often it's used to fill surrogate PRIMARY KEYs.
It may or may not be useful in any particular case.
Therefore, the answer to your question (should every table in a database have an IDENTITY field that's used as the PK?) is this:
No. There are cases when a database table should NOT have an IDENTITY field as a PRIMARY KEY.
Three cases come to mind when it's not the best idea to have an IDENTITY as a PRIMARY KEY:
- If your PRIMARY KEY is composite (as in many-to-many link tables)
- If your PRIMARY KEY is natural (like a state code)
- If your PRIMARY KEY should be unique across databases (in this case you use GUID/UUID/NEWID)
All these cases imply the following condition:
You shouldn't have an IDENTITY when you care about the values of your PRIMARY KEY and explicitly insert them into your table.
Update:
Many-to-many link tables should have the pair of ids of the tables they link as the composite key.
It's a natural composite key which you already have to use (and make UNIQUE), so there is no point in generating a surrogate key for it.
I don't see why you would want to reference a many-to-many link table from any table other than the ones it links, but let's assume you have such a need.
In this case, you just reference the link table by the composite key.
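This query:
-- column types are illustrative
CREATE TABLE a (id INT PRIMARY KEY, data VARCHAR(100));
CREATE TABLE b (id INT PRIMARY KEY, data VARCHAR(100));
CREATE TABLE ab (a_id INT NOT NULL, b_id INT NOT NULL, PRIMARY KEY (a_id, b_id));
CREATE TABLE business_rule (id INT PRIMARY KEY, a_id INT NOT NULL, b_id INT NOT NULL, FOREIGN KEY (a_id, b_id) REFERENCES ab (a_id, b_id));
SELECT *
FROM business_rule br
JOIN a
ON a.id = br.a_id;
is much more efficient than this one:
CREATE TABLE a (id INT PRIMARY KEY, data VARCHAR(100));
CREATE TABLE b (id INT PRIMARY KEY, data VARCHAR(100));
CREATE TABLE ab (id INT PRIMARY KEY, a_id INT NOT NULL, b_id INT NOT NULL, UNIQUE (a_id, b_id));
CREATE TABLE business_rule (id INT PRIMARY KEY, ab_id INT NOT NULL, FOREIGN KEY (ab_id) REFERENCES ab (id));
SELECT *
FROM business_rule br
JOIN ab
ON br.ab_id = ab.id
JOIN a
ON a.id = ab.a_id;
because in the first design business_rule already carries a_id and reaches a with a single join, while the second has to go through ab first.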
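To make the comparison concrete, here is a runnable sketch of both designs side by side, using Python's sqlite3 module. The table names ab1/rule1 and ab2/rule2 and the sample rows are hypothetical, chosen only so both designs can live in one database. It shows that the two queries return the same answer, with the composite-key design needing one join and the surrogate-key design needing two; it does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, data TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, data TEXT);

-- Design 1: composite natural key on the link table.
CREATE TABLE ab1 (
    a_id INTEGER NOT NULL REFERENCES a(id),
    b_id INTEGER NOT NULL REFERENCES b(id),
    PRIMARY KEY (a_id, b_id)
);
CREATE TABLE rule1 (
    id   INTEGER PRIMARY KEY,
    a_id INTEGER NOT NULL,
    b_id INTEGER NOT NULL,
    FOREIGN KEY (a_id, b_id) REFERENCES ab1 (a_id, b_id)
);

-- Design 2: surrogate key on the link table.
CREATE TABLE ab2 (
    id   INTEGER PRIMARY KEY,
    a_id INTEGER NOT NULL REFERENCES a(id),
    b_id INTEGER NOT NULL REFERENCES b(id),
    UNIQUE (a_id, b_id)
);
CREATE TABLE rule2 (
    id    INTEGER PRIMARY KEY,
    ab_id INTEGER NOT NULL REFERENCES ab2(id)
);

INSERT INTO a VALUES (1, 'alpha');
INSERT INTO b VALUES (7, 'beta');
INSERT INTO ab1 VALUES (1, 7);
INSERT INTO rule1 VALUES (100, 1, 7);
INSERT INTO ab2 VALUES (500, 1, 7);
INSERT INTO rule2 VALUES (100, 500);
""")

# Design 1: the rule row already holds a_id, so one join reaches a.
direct = conn.execute("""
    SELECT br.id, a.data
    FROM rule1 br
    JOIN a ON a.id = br.a_id
""").fetchall()

# Design 2: the rule row only holds ab_id, so ab2 must be joined first.
via_link = conn.execute("""
    SELECT br.id, a.data
    FROM rule2 br
    JOIN ab2 ab ON ab.id = br.ab_id
    JOIN a ON a.id = ab.a_id
""").fetchall()
```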
Adding an artificial primary key versus using a unique field
You are talking about the difference between synthetic and natural keys.
In my [very] personal opinion, I would recommend always using synthetic keys (and always calling the column id). The main problem is that natural keys are never truly unique: they are unique in theory, but in the real world a myriad of unexpected and unavoidable events will make this false.
In database design:
- Natural keys correspond to values present in the domain model. For example, UserName, SSN, and VIN can be considered natural keys.
- Synthetic keys are values not present in the domain model. They are just numeric/string/UUID values that have no relationship with the actual data. They only serve as unique identifiers for the rows.
I would say, stick to synthetic keys and sleep well at night. You never know what the Marketing Department will come up with on Monday, and suddenly "the username is not unique anymore".
MySQL non primary foreign key
Furthermore the foreign key must/should refer to the primary key. What if I don't know the primary key, but I know another unique column, in this case username, how would I either get the primary key from within another MySQL statement, or alternatively have the foreign key point to a non primary key?
Yes, if you have another unique key, you can have foreign keys referencing it:
CREATE TABLE user
( userid INT NOT NULL
, username VARCHAR(20) NOT NULL
  -- other fields
, PRIMARY KEY (userid)
, UNIQUE KEY (username)
) ENGINE = InnoDB ;
CREATE TABLE picture
( pictureid INT NOT NULL
, username VARCHAR(20)
  -- other fields
, PRIMARY KEY (pictureid)
, FOREIGN KEY (username)
    REFERENCES user(username)
) ENGINE = InnoDB ;
And if all foreign keys in other tables reference this unique key (username), there is no point in having a meaningless id. You can drop it and make username the PRIMARY KEY of the table.
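A quick way to try this out without a MySQL server is the sketch below, which uses Python's sqlite3 module. That is an assumption of the example: SQLite, like InnoDB, accepts a foreign key that points at a UNIQUE column, but the column types are simplified and the sample rows are made up. It also shows the other option raised in the question, looking up the primary key from the known username.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE user (
    userid   INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE picture (
    pictureid INTEGER PRIMARY KEY,
    username  TEXT REFERENCES user(username)  -- FK to a non-primary unique key
);
INSERT INTO user VALUES (1, 'alice');
INSERT INTO picture VALUES (10, 'alice');
""")

# A picture for an unknown username violates the foreign key.
try:
    conn.execute("INSERT INTO picture VALUES (11, 'nobody')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Alternative: fetch the primary key from the unique column you do know.
userid = conn.execute(
    "SELECT userid FROM user WHERE username = ?", ("alice",)
).fetchone()[0]
```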
(Edit:)
There are a few arguments for keeping an auto-incrementing primary key on InnoDB tables even if nothing references it: by default InnoDB uses the primary key (or, failing that, the first unique index on NOT NULL columns) as the clustered index of the table. A character primary key may carry performance drawbacks for INSERT and UPDATE statements, though it can perform better in SELECT queries.
For a discussion regarding what to use, surrogate (meaningless, auto-generated) or natural keys, and different views on the subject, read this: surrogate-vs-natural-business-keys