Is ID Column Required in SQL

Is ID column required in SQL?

If your data set already has a column that uniquely identifies each row, then no, there's no need for an extra ID column. The primary key, however, must be unique in ALL circumstances and cannot be empty (it must be NOT NULL).

In my 20+ years of experience in database design, however, this is almost never truly the case. Most "natural" IDs that appear to be unique ultimately aren't. US Social Security Numbers aren't guaranteed to be unique, and most other "natural" keys end up being almost unique, which is just not good enough for a database system.

So if you really do have a proper, unique key in your data already - use it! But most of the time, it's easier and more convenient to have just a single surrogate ID that you can guarantee will be unique over all rows.
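A minimal sketch of that trade-off, using Python's built-in sqlite3 module as a stand-in for a real database server (the table names and the duplicated SSN value are hypothetical): when an "almost unique" natural key is the primary key, a legitimate second row is rejected, while a surrogate id accepts both rows.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")

# Natural key: the SSN itself is the primary key.
conn.execute("CREATE TABLE person_natural (ssn TEXT PRIMARY KEY NOT NULL, name TEXT)")

# Surrogate key: a system-generated id; ssn is just an ordinary attribute.
conn.execute("""
    CREATE TABLE person_surrogate (
        id   INTEGER PRIMARY KEY,  -- filled in automatically by SQLite
        ssn  TEXT,
        name TEXT
    )
""")

# Two different people who (hypothetically) share the same SSN.
people = [("123-45-6789", "Alice"), ("123-45-6789", "Bob")]

natural_errors = 0
for ssn, name in people:
    try:
        conn.execute("INSERT INTO person_natural VALUES (?, ?)", (ssn, name))
    except sqlite3.IntegrityError:  # PRIMARY KEY violation on the second row
        natural_errors += 1

for ssn, name in people:
    conn.execute("INSERT INTO person_surrogate (ssn, name) VALUES (?, ?)", (ssn, name))

surrogate_ids = [r[0] for r in conn.execute("SELECT id FROM person_surrogate ORDER BY id")]
print(natural_errors, surrogate_ids)  # 1 [1, 2]
```

Note that the surrogate design no longer enforces anything about ssn; any rule that must still hold for it needs its own constraint.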

Is ID column always required in SQL?

The performance difference between keying on a numeric id and keying on the value itself (here, the tag name) is insignificant.

Advantages of using a numeric id for the tags in your example would be:

to make the intersection table somewhat smaller, because integers are on average smaller than strings
to allow changing the spelling of a tag name by updating one row instead of many rows

These may not be important considerations for your case. So no, it's not required to use a numeric id.
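A sketch of the second advantage, again using SQLite through Python's sqlite3 module; the bug/tag schema and the deliberately misspelled tag value are illustrative assumptions. Renaming a tag touches every intersection row when the name is the key, but only one row of the tag table when a numeric id is used.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant 1: the tag name itself is stored in the intersection table.
    CREATE TABLE bug_tag_by_name (bug_id INTEGER, tag TEXT, PRIMARY KEY (bug_id, tag));

    -- Variant 2: tags get a numeric id; the intersection table stores only integers.
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE bug_tag_by_id (bug_id INTEGER,
                                tag_id INTEGER REFERENCES tag (id),
                                PRIMARY KEY (bug_id, tag_id));
""")

# Tag three bugs with the same (deliberately misspelled) tag in both variants.
for bug_id in (1, 2, 3):
    conn.execute("INSERT INTO bug_tag_by_name VALUES (?, 'perfomance')", (bug_id,))
conn.execute("INSERT INTO tag (id, name) VALUES (1, 'perfomance')")
for bug_id in (1, 2, 3):
    conn.execute("INSERT INTO bug_tag_by_id VALUES (?, 1)", (bug_id,))

# Fix the spelling in each variant and count how many rows had to change.
by_name = conn.execute(
    "UPDATE bug_tag_by_name SET tag = 'performance' WHERE tag = 'perfomance'").rowcount
by_id = conn.execute(
    "UPDATE tag SET name = 'performance' WHERE name = 'perfomance'").rowcount
print(by_name, by_id)  # 3 1
```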

I also wrote about this in a chapter titled "ID Required" in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.

Is an ID column required or recommended in a fact table in the given scenario?

In your case, I'd probably create a nonclustered primary key on the identity column, to allow for easier FK relationship management and for performance.

The clustered key would be on the date column, to allow for faster range queries. The date column also fulfills the three basic requirements for a clustered index: it's narrow (which keeps the nonclustered indexes small, since each of their rows carries the clustering key), it's stable (a change to a clustered index column means the nonclustered indexes have to be updated as well, which is to be avoided), and it's ever-increasing (which avoids bad page splits, the ones that happen in the middle of the table rather than at its end).

Regarding a non-unique clustered index: if the key is not unique, SQL Server adds a hidden uniquifier to the duplicate rows so it can still tell them apart.
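SQLite has no CLUSTERED keyword, so the following is only an approximation of the SQL Server design above, using a hypothetical fact_sales table: a WITHOUT ROWID table stores its rows in primary-key order, so a key of (sale_date, id) clusters the rows by date, with id appended as a tiebreaker in roughly the role the uniquifier plays. A separate UNIQUE constraint on id provides the narrow key that foreign keys reference.

```python
import sqlite3

# Hypothetical tables; SQLite only approximates SQL Server's clustered index here.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# WITHOUT ROWID stores rows in PRIMARY KEY order, so rows are clustered by
# sale_date; id is appended as a tiebreaker for rows with the same date.
conn.execute("""
    CREATE TABLE fact_sales (
        id        INTEGER NOT NULL UNIQUE,  -- narrow key for FK references
        sale_date TEXT    NOT NULL,         -- ISO-8601 dates sort correctly as text
        amount    REAL    NOT NULL,
        PRIMARY KEY (sale_date, id)
    ) WITHOUT ROWID
""")
conn.execute("""
    CREATE TABLE fact_adjustment (
        id      INTEGER PRIMARY KEY,
        sale_id INTEGER NOT NULL REFERENCES fact_sales (id)
    )
""")

rows = [(1, "2024-01-03", 10.0), (2, "2024-01-01", 5.0),
        (3, "2024-01-02", 7.5), (4, "2024-01-02", 2.5)]
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

# Range query on the clustering column.
in_range = conn.execute(
    "SELECT id FROM fact_sales WHERE sale_date BETWEEN ? AND ? ORDER BY sale_date, id",
    ("2024-01-01", "2024-01-02"),
).fetchall()
print(in_range)  # [(2,), (3,), (4,)]

# Foreign keys point at the narrow id column, not at the date.
conn.execute("INSERT INTO fact_adjustment (sale_id) VALUES (3)")
rejected = False
try:
    conn.execute("INSERT INTO fact_adjustment (sale_id) VALUES (99)")  # no such sale
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```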

Is an ID column always necessary?

CustomerId is the PK, so it is already unique; in that case, a second, system-generated unique ID is not needed to identify a row. You may still want one for performance reasons: a smaller data type fits more entries on each index page, which means fewer page reads per search. If the ID has a smaller data type than CustomerId, you can create a CLUSTERED UNIQUE INDEX on the ID column and a NONCLUSTERED PRIMARY KEY on the CustomerId column.

The order of the data in the table is a non-issue. Never assume that a table is ordered on its own; it is not. Rows come back in a defined order only when the query has an ORDER BY clause.
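A small illustration using Python's sqlite3 module (the customer table is hypothetical): whatever order a query without ORDER BY happens to return depends on the access path the engine picks, so only the ORDER BY query has an order you can rely on.

```python
import sqlite3

# Hypothetical table; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("C003", "Carol"), ("C001", "Alice"), ("C002", "Bob")])

# Without ORDER BY, the row order is an implementation detail and may change
# with indexes, data volume, or engine version.
unordered = conn.execute("SELECT customer_id FROM customer").fetchall()

# With ORDER BY, the order is part of the query's contract.
ordered = [r[0] for r in conn.execute(
    "SELECT customer_id FROM customer ORDER BY customer_id")]
print(ordered)  # ['C001', 'C002', 'C003']
```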

Is it necessary to create id column in SQL table?

No, it is not necessary, but it is recommended for every table except association tables.

An identity column provides a unique and unchanging identifier for each row, which makes setting up foreign key relationships quite easy.

An association table would not have an identity column, because it holds no data of its own; it generally consists of two or more foreign key columns.
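A sketch of such an association table in SQLite via Python's sqlite3 module, using hypothetical student and course tables: the two foreign keys together form the primary key, which also stops the same pair from being stored twice.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Association table: no identity column, just two foreign keys
    -- that together form the primary key.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student (id),
        course_id  INTEGER NOT NULL REFERENCES course (id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO course  VALUES (10, 'Databases');
    INSERT INTO enrollment VALUES (1, 10), (2, 10);
""")

duplicate_rejected = False
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 10)")  # same pair again
except sqlite3.IntegrityError:
    duplicate_rejected = True

names = [r[0] for r in conn.execute("""
    SELECT s.name
    FROM enrollment e
    JOIN student s ON s.id = e.student_id
    WHERE e.course_id = 10
    ORDER BY s.name""")]
print(duplicate_rejected, names)  # True ['Alice', 'Bob']
```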

In general, should every table in a database have an identity field to use as a PK?

There are two concepts that are close but should not be confused: IDENTITY and PRIMARY KEY.

Every table (except in rare cases) should have a PRIMARY KEY, that is, a value or set of values that uniquely identifies a row.


IDENTITY is a property of a column in SQL Server which means that the column will be filled automatically with incrementing values.

Due to the nature of this property, the values of this column are inherently UNIQUE.

However, no UNIQUE constraint or UNIQUE index is created automatically on an IDENTITY column, so after issuing SET IDENTITY_INSERT table_name ON it is possible to insert duplicate values into that column, unless it has been explicitly constrained as UNIQUE.

An IDENTITY column need not be a PRIMARY KEY, but it is most often used to populate surrogate PRIMARY KEYs.

It may or may not be useful in any particular case.

Therefore, the answer to your question, "Should every table in a database have an IDENTITY field that's used as the PK?", is this:

No. There are cases when a database table should NOT have an `IDENTITY` field as a `PRIMARY KEY`.

Three cases come to mind where it's not the best idea to have an IDENTITY column as the PRIMARY KEY:

If your PRIMARY KEY is composite (like in many-to-many link tables)
If your PRIMARY KEY is natural (like, a state code)
If your PRIMARY KEY should be unique across databases (in this case you use GUID / UUID / NEWID)

All these cases imply the following condition:

You shouldn't have `IDENTITY` when you care for the values of your `PRIMARY KEY` and explicitly insert them into your table.
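A sketch of the third case, assuming two hypothetical databases whose rows are later merged into one; SQLite stands in for the real servers, and Python's uuid.uuid4() stands in for NEWID(). Auto-incremented integer keys collide on merge, while application-supplied UUIDs do not (a random UUID collision is negligibly unlikely rather than impossible).

```python
import sqlite3
import uuid


def make_db():
    """Create a hypothetical database with one integer-keyed and one UUID-keyed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE event_int  (id INTEGER PRIMARY KEY, note TEXT)")
    conn.execute("CREATE TABLE event_uuid (id TEXT PRIMARY KEY, note TEXT)")
    return conn


site_a, site_b = make_db(), make_db()
for site, label in ((site_a, "a"), (site_b, "b")):
    for i in range(3):
        # Integer key: generated by the database, so both sites produce 1, 2, 3.
        site.execute("INSERT INTO event_int (note) VALUES (?)", (f"{label}{i}",))
        # UUID key: supplied explicitly by the application.
        site.execute("INSERT INTO event_uuid VALUES (?, ?)",
                     (str(uuid.uuid4()), f"{label}{i}"))


def merge(table):
    """Copy both sites' rows into a fresh central database; return the key conflicts."""
    central = make_db()
    rows = (site_a.execute(f"SELECT id, note FROM {table}").fetchall()
            + site_b.execute(f"SELECT id, note FROM {table}").fetchall())
    conflicts = 0
    for row in rows:
        try:
            central.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            conflicts += 1
    return conflicts


print(merge("event_int"), merge("event_uuid"))  # 3 0
```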

Update:

Many-to-many link tables should use the pair of ids from the tables they link as their composite key.

It's a natural composite key, which you already have to use (and make UNIQUE), so there is no point in generating a surrogate key for it.

I don't see why you would want to reference a many-to-many link table from any table other than the tables it links, but let's assume you have such a need.

In this case, you just reference the link table by the composite key.

This schema and query:

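CREATE TABLE a (id INT PRIMARY KEY, data VARCHAR(100));
CREATE TABLE b (id INT PRIMARY KEY, data VARCHAR(100));
-- the link table is keyed by the pair of ids it links
CREATE TABLE ab (a_id INT REFERENCES a (id), b_id INT REFERENCES b (id), PRIMARY KEY (a_id, b_id));
CREATE TABLE business_rule (id INT PRIMARY KEY, a_id INT, b_id INT, FOREIGN KEY (a_id, b_id) REFERENCES ab (a_id, b_id));

SELECT  *
FROM    business_rule br
JOIN    a
ON      a.id = br.a_id;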

is much more efficient than this one:

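CREATE TABLE a (id INT PRIMARY KEY, data VARCHAR(100));
CREATE TABLE b (id INT PRIMARY KEY, data VARCHAR(100));
-- the link table gets a surrogate key; the pair is only UNIQUE
CREATE TABLE ab (id INT PRIMARY KEY, a_id INT REFERENCES a (id), b_id INT REFERENCES b (id), UNIQUE (a_id, b_id));
CREATE TABLE business_rule (id INT PRIMARY KEY, ab_id INT, FOREIGN KEY (ab_id) REFERENCES ab (id));

SELECT  *
FROM    business_rule br
JOIN    ab
ON      br.ab_id = ab.id
JOIN    a
ON      a.id = ab.a_id;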

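The reason: in the first design business_rule already carries a_id, so reaching a takes a single join, while in the second design the query must first join through ab to find a_id.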
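To check that both designs return the same rows, here is a runnable version of the two schemas in SQLite via Python's sqlite3 module; the sample data and the ab2/business_rule2 names for the second design are made up for this sketch. It shows the difference in the number of joins; it does not measure performance.

```python
import sqlite3

# Hypothetical data; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, data TEXT);

    -- Design 1: natural composite key on the link table.
    CREATE TABLE ab (a_id INTEGER REFERENCES a (id), b_id INTEGER REFERENCES b (id),
                     PRIMARY KEY (a_id, b_id));
    CREATE TABLE business_rule (id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER,
                                FOREIGN KEY (a_id, b_id) REFERENCES ab (a_id, b_id));

    -- Design 2: surrogate key on the link table.
    CREATE TABLE ab2 (id INTEGER PRIMARY KEY,
                      a_id INTEGER REFERENCES a (id), b_id INTEGER REFERENCES b (id),
                      UNIQUE (a_id, b_id));
    CREATE TABLE business_rule2 (id INTEGER PRIMARY KEY,
                                 ab_id INTEGER REFERENCES ab2 (id));

    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 'b1');
    INSERT INTO ab VALUES (1, 1), (2, 1);
    INSERT INTO ab2 VALUES (100, 1, 1), (101, 2, 1);
    INSERT INTO business_rule VALUES (1, 2, 1);
    INSERT INTO business_rule2 VALUES (1, 101);
""")

# Design 1: a_id is already on business_rule, so one join reaches a.
one_join = conn.execute("""
    SELECT br.id, a.data
    FROM business_rule br
    JOIN a ON a.id = br.a_id""").fetchall()

# Design 2: the query has to go through the link table first.
two_joins = conn.execute("""
    SELECT br.id, a.data
    FROM business_rule2 br
    JOIN ab2 ON br.ab_id = ab2.id
    JOIN a ON a.id = ab2.a_id""").fetchall()

print(one_join, two_joins)  # [(1, 'a2')] [(1, 'a2')]
```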

Do SQL tables benefit or need to have a unique ID column?

No, it's not needed. Especially for many-to-many relationship tables, it is perfectly acceptable not to have one.

Those ids are especially useful if other tables have foreign keys to that table, but even then you can have foreign keys that consist of a unique combination of multiple columns. So even for foreign keys you don't strictly need them, although single-column keys are strongly recommended for this purpose.

The added benefit of having a key you don't need is that you don't have to add it once you do need it. Hardly an excuse. :)

In case you want to google more info:

Those many-to-many tables are often called a 'junction table' or 'cross-reference table'.
A 'meaningless' unique ID, often auto-numbered, is also called a 'surrogate key'
A key (including primary keys and foreign keys) that consists of multiple fields, is called a 'compound key'. 'Composite key' is often used as a synonym, although Wikipedia has a slightly different definition.

Is a generic ID column in a SQL table a bad idea?

Managing references to disparate entities can be really challenging in SQL Server. Postgres, by contrast, supports table inheritance, which makes this much simpler.

So, my recommendation is to add a notes column to every entity where you want notes. You can add a view that brings all the notes together if you need to see them in one place.
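A minimal sketch of that recommendation in SQLite via Python's sqlite3 module, with hypothetical customer and invoice tables: each entity keeps its own nullable notes column, and a UNION ALL view gathers them when a combined list is needed.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each entity carries its own nullable notes column.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, notes TEXT);
    CREATE TABLE invoice  (id INTEGER PRIMARY KEY, total REAL, notes TEXT);

    -- The view stitches the per-entity notes together when needed.
    CREATE VIEW all_notes AS
        SELECT 'customer' AS entity_type, id AS entity_id, notes
          FROM customer WHERE notes IS NOT NULL
        UNION ALL
        SELECT 'invoice', id, notes
          FROM invoice WHERE notes IS NOT NULL;

    INSERT INTO customer VALUES (1, 'Acme', 'Prefers email'), (2, 'Globex', NULL);
    INSERT INTO invoice  VALUES (7, 99.5, 'Paid late');
""")

notes = conn.execute(
    "SELECT entity_type, entity_id, notes FROM all_notes ORDER BY entity_type, entity_id"
).fetchall()
print(notes)  # [('customer', 1, 'Prefers email'), ('invoice', 7, 'Paid late')]
```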

This has minimal impact on performance or data size. A varchar column that is left NULL adds no overhead other than its NULL bit, and that is pretty minimal.
