PostgreSQL: Generate Sequence with No Gap

PostgreSQL gapless sequences

Sequences have gaps to permit concurrent inserts. Attempting to avoid gaps or to re-use deleted IDs creates horrible performance problems. See the PostgreSQL wiki FAQ.

PostgreSQL SEQUENCEs are used to allocate IDs. These only ever increase, and they're exempt from the usual transaction rollback rules to permit multiple transactions to grab new IDs at the same time. This means that if a transaction rolls back, those IDs are "thrown away"; there's no list of "free" IDs kept, just the current ID counter. Sequences are also usually incremented if the database shuts down uncleanly.
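
A quick illustration of that rollback behavior (the sequence name and values here are hypothetical):

CREATE SEQUENCE demo_seq;

BEGIN;
SELECT nextval('demo_seq');  -- returns 1
ROLLBACK;

-- The rolled-back value is not handed back to the sequence:
SELECT nextval('demo_seq');  -- returns 2; ID 1 is now a gap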

Synthetic keys (IDs) are meaningless anyway. Their order is not significant, their only property of significance is uniqueness. You can't meaningfully measure how "far apart" two IDs are, nor can you meaningfully say if one is greater or less than another. All you can do is say "equal" or "not equal". Anything else is unsafe. You shouldn't care about gaps.

If you need a gapless sequence that re-uses deleted IDs, you can have one, you just have to give up a huge amount of performance for it - in particular, you cannot have any concurrency on INSERTs at all, because you have to scan the table for the lowest free ID, locking the table for write so no other transaction can claim the same ID. Try searching for "postgresql gapless sequence".

The simplest approach is to use a counter table and a function that gets the next ID. Here's a generalized version that uses a counter table to generate consecutive gapless IDs; it doesn't re-use IDs, though.

CREATE TABLE thetable_id_counter ( last_id integer not null );
INSERT INTO thetable_id_counter VALUES (0);

CREATE OR REPLACE FUNCTION get_next_id(countertable regclass, countercolumn text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    -- The UPDATE takes a row lock on the counter row, serializing callers:
    -- nobody else can increment it until this transaction commits or rolls back.
    EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;

COMMENT ON FUNCTION get_next_id(regclass, text) IS 'Increment and return value from integer column $2 in table $1';

Usage:

INSERT INTO dummy(id, blah) 
VALUES ( get_next_id('thetable_id_counter','last_id'), 42 );

Note that when one open transaction has obtained an ID, all other transactions that try to call get_next_id will block until the first transaction commits or rolls back. This is unavoidable; for gapless IDs it is by design.
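
For illustration, a hypothetical two-session timeline using the function above:

-- Session 1:
BEGIN;
SELECT get_next_id('thetable_id_counter', 'last_id');  -- returns 1, holds the row lock

-- Session 2 (blocks here until session 1 commits or rolls back):
SELECT get_next_id('thetable_id_counter', 'last_id');

-- Session 1:
COMMIT;  -- session 2 now unblocks and gets 2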

If you want to store multiple counters for different purposes in a table, just add a parameter to the above function, add a column to the counter table, and add a WHERE clause to the UPDATE that matches the parameter to the added column. That way you can have multiple independently-locked counter rows. Do not just add extra columns for new counters.
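
A sketch of that variant, assuming a counter_name discriminator column (the table and function names here are illustrative, not from the original):

-- One independently-locked row per counter.
CREATE TABLE multi_id_counter (
    counter_name text PRIMARY KEY,
    last_id integer NOT NULL DEFAULT 0
);
INSERT INTO multi_id_counter (counter_name) VALUES ('invoices'), ('orders');

CREATE OR REPLACE FUNCTION get_next_id_named(countername text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    -- Only the matching counter row is locked, so callers using
    -- different counters do not block each other.
    UPDATE multi_id_counter
    SET last_id = last_id + 1
    WHERE counter_name = countername
    RETURNING last_id INTO next_value;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'unknown counter %', countername;
    END IF;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;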

This function does not re-use deleted IDs, it just avoids introducing gaps.

To re-use IDs I advise ... not re-using IDs.

If you really must, you can do so by adding an ON INSERT OR UPDATE OR DELETE trigger on the table of interest that adds deleted IDs to a free-list side table and removes them from the free-list table when they're INSERTed. Treat an UPDATE as a DELETE followed by an INSERT. Now modify the ID generation function above so that it does a SELECT free_id INTO next_value FROM free_ids LIMIT 1 FOR UPDATE and, if a row is found, DELETEs that row; if not found, it gets a new ID from the counter table as normal. Here's an untested extension of the prior function to support re-use:

CREATE OR REPLACE FUNCTION get_next_id_reuse(countertable regclass, countercolumn text, freelisttable regclass, freelistcolumn text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    -- Try to claim a free ID first; FOR UPDATE locks the row so no
    -- concurrent caller can claim the same one.
    EXECUTE format('SELECT %I FROM %s LIMIT 1 FOR UPDATE', freelistcolumn, freelisttable) INTO next_value;
    IF next_value IS NOT NULL THEN
        -- Claimed: remove it from the free-list.
        EXECUTE format('DELETE FROM %s WHERE %I = %L', freelisttable, freelistcolumn, next_value);
    ELSE
        -- No free IDs: fall back to the gapless counter.
        EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
    END IF;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;
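
The free-list maintenance trigger described above might look like this; like the function, it is untested, and the table, column, and trigger names are assumptions:

-- Hypothetical side table of reusable IDs for a table thetable(id integer, ...).
CREATE TABLE free_ids ( free_id integer PRIMARY KEY );

CREATE OR REPLACE FUNCTION track_free_ids() RETURNS trigger AS $$
BEGIN
    -- An UPDATE counts as a DELETE of the old ID plus an INSERT of the new one.
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        INSERT INTO free_ids (free_id) VALUES (OLD.id);
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        DELETE FROM free_ids WHERE free_id = NEW.id;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER thetable_track_free_ids
AFTER INSERT OR UPDATE OR DELETE ON thetable
FOR EACH ROW EXECUTE FUNCTION track_free_ids();  -- EXECUTE PROCEDURE before Postgres 11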

PostgreSQL Sequence: How to always increment when called (no matter what the result of a select statement is)?

Use a simple function (I don't know whether a WITH clause is allowed in views) to first increment the sequence explicitly, then perform the select.

--drop function test_metrics_func()
create or replace function test_metrics_func() returns
table(
    processing_job_id integer,
    name text,
    seq_id bigint
)
as $$
/*
explicitly increment the sequence, and keep the updated value cached
*/
with seq_inc as (select nextval('next_job_id') as seq)
--use the cached seq instead of calling nextval() again here
SELECT t.processing_job_id, t.name, s.seq AS seq_id
FROM test_table t, seq_inc s
WHERE t.processing_job_id = s.seq
$$ language sql;

Note: since seq_inc is evaluated first, the initial value of the sequence should be set to 29999, not 30000, in your case.

Testing

--resetting the seq to 29999 (important), or alpha will be ignored
select setval('public.next_job_id', 29999, true);
--calls
select * from test_metrics_func() ---> 30000 "alpha" 30000
select * from test_metrics_func() ---> 30001 "bravo" 30001
select * from test_metrics_func() ---> nothing
select * from test_metrics_func() ---> 30003 "belta" 30003
select * from test_metrics_func() ---> 30004 "echo" 30004
select * from test_metrics_func() ---> nothing

NOTE: I am not sure whether this function is safe under concurrent calls, or how PostgreSQL will deal with that.

Generate gap-free numbers with a database trigger


Introduction

I believe the most crucial point here is:

  • there should be no gaps between invoice numbers

In this case you cannot use a sequence or an auto-increment field (as others propose in the comments). Auto-increment fields use a sequence under the hood, and the nextval(regclass) function increments the sequence's counter no matter whether the transaction succeeds or fails (as you point out yourself).


Update:

What I mean is that you shouldn't use sequences at all; in particular, the solution you propose doesn't eliminate the possibility of gaps. Your trigger gets a new sequence value, but the INSERT could still fail.


Sequences work this way because they are mainly meant for generating PRIMARY KEY and OID values, where uniqueness and a non-blocking mechanism are the ultimate goals and gaps between values are really no big deal.

In your case, however, the priorities may be different; there are a couple of things to consider.

Simple solution

First possible solution to your problem could be returning the new number as the maximum of the currently existing ones, incremented by one. It can be done in your trigger:

NEW.invoice_number = COALESCE(
    (SELECT foo.invoice_number
     FROM invoices foo
     WHERE foo._year = NEW._year
     ORDER BY foo.invoice_number DESC NULLS LAST
     LIMIT 1
    ), 0) + 1; /*query 1*/

This query could use your composite UNIQUE INDEX if it was created with the "proper" syntax and column order, which means the "year" column in first place, e.g.:

CREATE UNIQUE INDEX invoice_number_unique
ON invoices (_year, invoice_number DESC NULLS LAST);

In PostgreSQL, UNIQUE CONSTRAINTs are implemented simply as UNIQUE INDEXes, so most of the time there is no difference which command you use. However, using the particular syntax presented above makes it possible to define an order on the index. It's a really nice trick which makes /*query 1*/ quicker than a simple SELECT max(invoice_number) FROM invoices WHERE _year = NEW._year as the invoice table gets bigger.

This is a simple solution but it has one big drawback: a race condition when two transactions try to insert invoices at the same time. Both could acquire the same max value, and the UNIQUE CONSTRAINT will prevent the second one from committing. Despite that, it could be sufficient in a small system with a special insert policy.
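
For illustration, a hypothetical timeline of that race:

-- Session 1: BEGIN; INSERT INTO invoices ...  -- trigger computes max + 1 = 42
-- Session 2: BEGIN; INSERT INTO invoices ...  -- also computes 42; this INSERT now
--                                             -- blocks on session 1's pending row
-- Session 1: COMMIT;
-- Session 2's blocked INSERT then fails:
--   ERROR: duplicate key value violates unique constraint "invoice_number_unique"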

Better solution

You may create a table

CREATE TABLE invoice_numbers(
_year INTEGER NOT NULL PRIMARY KEY,
next_number_within_year INTEGER
);

to store the next possible number for each year. Then, in an AFTER INSERT trigger, you could:

  1. Lock invoice_numbers so that no other transaction can even read the number: LOCK TABLE invoice_numbers IN ACCESS EXCLUSIVE MODE;
  2. Get the new invoice number: new_invoice_number = (SELECT foo.next_number_within_year FROM invoice_numbers foo WHERE foo._year = NEW._year);
  3. Update the number value of the newly added invoice row
  4. Increment the counter: UPDATE invoice_numbers SET next_number_within_year = next_number_within_year + 1 WHERE _year = NEW._year;

Because the table lock is held by the transaction until it commits, this should probably be the last trigger fired (triggers on the same event fire in alphabetical order by name).
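
A sketch of such a trigger, assuming a table invoices(invoice_id serial PRIMARY KEY, _year integer, invoice_number integer) - the layout and all names here are assumptions:

CREATE OR REPLACE FUNCTION assign_invoice_number() RETURNS trigger AS $$
DECLARE
    new_number integer;
BEGIN
    -- 1. Serialize access: no other transaction may even read the counters now.
    LOCK TABLE invoice_numbers IN ACCESS EXCLUSIVE MODE;

    -- 2. Get the next number for this year, creating the counter row if needed.
    SELECT next_number_within_year INTO new_number
    FROM invoice_numbers WHERE _year = NEW._year;
    IF NOT FOUND THEN
        new_number := 1;
        INSERT INTO invoice_numbers (_year, next_number_within_year)
        VALUES (NEW._year, 1);
    END IF;

    -- 3. Write the number onto the freshly inserted invoice row.
    UPDATE invoices SET invoice_number = new_number
    WHERE invoice_id = NEW.invoice_id;

    -- 4. Increment the counter for the next invoice of this year.
    UPDATE invoice_numbers
    SET next_number_within_year = next_number_within_year + 1
    WHERE _year = NEW._year;

    RETURN NULL;  -- ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

-- The "zzz" prefix makes this fire after other AFTER INSERT triggers,
-- since same-event triggers fire in alphabetical order by name.
CREATE TRIGGER zzz_assign_invoice_number
AFTER INSERT ON invoices
FOR EACH ROW EXECUTE FUNCTION assign_invoice_number();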


Update:

Instead of locking the whole table with the LOCK command, consider the row-per-counter approach described by Craig Ringer above, which locks only a single counter row.


The drawback in this case is the drop in INSERT performance: only one transaction at a time can perform an insert.

Sequential increment skipping numbers

serial columns, or IDENTITY in Postgres 10 or later, draw numbers from a SEQUENCE and gaps are to be expected. Their job is to make concurrent write access possible with unique numbers - not necessarily gap-less numbers.

If you don't actually have concurrent write access, there are simple ways to achieve (mostly) gap-less numbers. Like:

INSERT INTO tbl (info) 
SELECT 'xxx'
WHERE NOT EXISTS (SELECT FROM tbl WHERE info = 'xxx');

That doesn't burn a serial ID from the SEQUENCE, because a duplicate insert is skipped. (The INSERT might still fail for any other reason - and burn a serial number.) You could reset the SEQUENCE in such a case:

  • How to reset postgres' primary key sequence when it falls out of sync?
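
One common way to do that reset, assuming a serial column tbl.id:

-- Set the sequence to continue right after the current maximum id.
-- The third parameter false means the next nextval() returns this exact value.
SELECT setval(pg_get_serial_sequence('tbl', 'id'), COALESCE(max(id) + 1, 1), false)
FROM tbl;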

While inserting multiple rows in a single statement, you also have to rule out duplicates within the inserted set. Example code:

  • Return data from subselect used in INSERT in a Common Table Expression
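
A minimal sketch of ruling out duplicates within the inserted set itself (values are illustrative):

INSERT INTO tbl (info)
SELECT DISTINCT v.info
FROM (VALUES ('xxx'), ('yyy'), ('xxx')) v(info)  -- duplicate inside the set
WHERE NOT EXISTS (SELECT FROM tbl t WHERE t.info = v.info);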

But if you do have concurrent writes, none of the above works reliably, on principle. You'd better learn to accept gaps in the IDs. You can always run a query with row_number() OVER (ORDER BY id) to generate gap-less numbers after the fact, as shown in the example after this list. However, the numbers are still arbitrary to a degree. Smaller numbers were not necessarily committed earlier; there are exceptions under concurrent write load. Related:

  • Primary Key Value Not Incrementing Correctly
  • Serial numbers per group of rows for compound key
  • Auto increment table column
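
The after-the-fact numbering mentioned above, applied to the tbl table from the earlier example:

-- Gap-less numbering computed at query time; nothing is stored.
SELECT row_number() OVER (ORDER BY id) AS gapless_number, id, info
FROM tbl;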

Or consider a UUID instead (data type uuid) and avoid the inherent problem of duplicates with random values in a huge key space. Not at all serial, though:

  • Generating a UUID in Postgres for Insert statement?
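
For instance (gen_random_uuid() is built in since Postgres 13; in older versions it comes from the pgcrypto extension):

CREATE TABLE tbl_uuid (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    info text
);

INSERT INTO tbl_uuid (info) VALUES ('xxx');  -- id is generated automatically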

