Sequence VS Identity

Sequence vs identity

I think you will find your answer here

Using the identity attribute for a column, you can easily generate
auto-incrementing numbers (which as often used as a primary key). With
Sequence, it will be a different object which you can attach to a
table column while inserting. Unlike identity, the next number for the
column value will be retrieved from memory rather than from the disk –
this makes Sequence significantly faster than Identity. We will see
this in coming examples.

And here:

Sequences: Sequences have been requested by the SQL Server community
for years, and it's included in this release. Sequence is a user
defined object that generates a sequence of a number. Here is an
example using Sequence.

and here as well:

A SQL Server sequence object generates sequence of numbers just like
an identity column in sql tables. But the advantage of sequence
numbers is the sequence number object is not limited with single sql
table.

and on msdn you can also read more about usage and why we need it (here):

A sequence is a user-defined schema-bound object that generates a
sequence of numeric values according to the specification with which
the sequence was created. The sequence of numeric values is generated
in an ascending or descending order at a defined interval and may
cycle (repeat) as requested. Sequences, unlike identity columns, are
not associated with tables. An application refers to a sequence object
to receive its next value. The relationship between sequences and
tables is controlled by the application. User applications can
reference a sequence object and coordinate the values keys across
multiple rows and tables.

A sequence is created independently of the tables by using the CREATE
SEQUENCE statement. Options enable you to control the increment,
maximum and minimum values, starting point, automatic restarting
capability, and caching to improve performance. For information about
the options, see CREATE SEQUENCE.

Unlike identity column values, which are generated when rows are
inserted, an application can obtain the next sequence number before
inserting the row by calling the NEXT VALUE FOR function. The sequence
number is allocated when NEXT VALUE FOR is called even if the number
is never inserted into a table. The NEXT VALUE FOR function can be
used as the default value for a column in a table definition. Use
sp_sequence_get_range to get a range of multiple sequence numbers at
once.

A sequence can be defined as any integer data type. If the data type
is not specified, a sequence defaults to bigint.

Should we use sequences or identities for our primary keys?

Your question is about using sequences versus identity ("generated always as identity" columns, presumably). In Postgres, these would be declared as serial. These would always be some sort of number in a single column.

From the database performance perspective, there is not much difference between the two. One important difference is that some databases will cache identity columns, which can speed inserts but cause gaps. The rules for caching sequences might be different. In a high transaction environment, inadequate caching can be a performance bottleneck. Sharing a single sequence across multiple tables makes this problem worse.

There is a bigger difference from a data management perspective. A sequence requires managing two objects (the table and the sequence). An identity or serial column is built into the table.

For a single table, I have only considered using sequences in databases that do not support built-in auto-increment/serial/identity columns (ahem, "Oracle"). Otherwise, I would use the mechanism designed to work with tables.

I do want to point out that using an auto-incremented surrogate key has other benefits. This should also be the key used for clustering the data, if such a concept exists in the database. New inserts are then always at the "end" (although if you are deleting data, then pages might only be partially used). The primary key should also be the only key used for foreign key references, even if other columns -- in isolation or together -- are unique and candidate primary keys.

Using Identity or sequence in data warehouse

A sequence has more guarantees than an identity column. In particular, each call to a sequence is guaranteed to produce the next value for the sequence.

However, an identity column can have gaps and other inconsistencies. This is all documented here.

Because of the additional guarantees on sequences, I suspect that they are slower. In particular, I suspect that the database cannot preallocate values in batch. That means that in a multi-threaded environments, sequences would impose serialization on transactions, slowing things down.

In general, I see identity used for identifying columns in tables. And although there is probably a performance comparison, I haven't seen one. But I suspect that sequences are a wee bit slower in some circumstances.

Sequences vs Identity columns in Oracle

Either way you're going to be using Sequences. Identity columns in 12c use the same mechanism behind the scenes.

The benefit - the db is managing these objects, and not you.

Sample Image

The feature was introduced to aid in the migration of systems from places like SQL Server and Sybase ASE where these were popular. Much simpler to migrate as is than create a sequence/trigger pair to maintain going forward.

And that benefit extends to regular Oracle customers as well. The flexibility of the IDENTITY clause includes everything you can set when manually defining a sequence.

@GeneratedValue(strategy= IDENTITY ) vs. @GeneratedValue(strategy= SEQUENCE )

Quoting Java Persistence/Identity and Sequencing:

Identity sequencing uses special IDENTITY columns in the database to allow the database to automatically assign an id to the object when its row is inserted. Identity columns are supported in many databases, such as MySQL, DB2, SQL Server, Sybase and Postgres. Oracle does not support IDENTITY columns but they can be simulated through using sequence objects and triggers.

In plain English: you mark at most one ID column in your table as IDENTITY. The database engine will put next available value for you automatically.

And:

Sequence objects use special database objects to generate ids. Sequence objects are only supported in some databases, such as Oracle, DB2, and Postgres. Usually, a SEQUENCE object has a name, an INCREMENT, and other database object settings. Each time the <sequence>.NEXTVAL is selected the sequence is incremented by the INCREMENT.

Sequences are more flexible and slightly more complex. You define an extra object in your database next to tables, triggers, etc. called sequences. Sequences are basically named counter you can use anywhere inside queries.

What is the limit of GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY in PostgreSQL?

Yes, it is dependent on column's data type and could be validated using COLUMNS metadata:

CREATE TABLE t1(id SMALLINT GENERATED ALWAYS AS IDENTITY);
CREATE TABLE t2(id INT GENERATED ALWAYS AS IDENTITY);
CREATE TABLE t3(id BIGINT GENERATED ALWAYS AS IDENTITY);

SELECT table_name, column_name, data_type,
is_identity, identity_minimum, identity_maximum, *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME IN('t1','t2','t3');

db<>fiddle demo

Output:

Sample Image



Related Topics



Leave a reply



Submit