Serial Numbers Per Group of Rows For Compound Key

Don't. It has been tried many times and it's a pain.

Use a plain serial or IDENTITY column:

  • Auto increment table column

CREATE TABLE address_history (
address_history_id serial PRIMARY KEY
, person_id int NOT NULL REFERENCES people(id)
, created_at timestamp NOT NULL DEFAULT current_timestamp
, previous_address text
);
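
If you are on Postgres 10 or later, an IDENTITY column does the same job. A minimal sketch of the equivalent table definition:

CREATE TABLE address_history (
  address_history_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, person_id  int NOT NULL REFERENCES people(id)
, created_at timestamp NOT NULL DEFAULT current_timestamp
, previous_address text
);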

Use the window function row_number() to get gap-less serial numbers per person_id. You could create a VIEW as a drop-in replacement for your table in queries, to have those numbers ready:

CREATE VIEW address_history_nr AS
SELECT *, row_number() OVER (PARTITION BY person_id
                             ORDER BY address_history_id) AS adr_nr
FROM   address_history;

See:

  • Gap-less sequence where multiple transactions with multiple tables are involved

Or you might want to ORDER BY something else. Maybe created_at? Better: created_at, address_history_id to break possible ties; see the sketch below. Related answer:

  • Column with alternate serials
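
A sketch of the same view with that sort order; CREATE OR REPLACE works here because the output columns stay the same:

CREATE OR REPLACE VIEW address_history_nr AS
SELECT *, row_number() OVER (PARTITION BY person_id
                             ORDER BY created_at, address_history_id) AS adr_nr
FROM   address_history;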

Also, the data type you are looking for is timestamp or timestamptz, not datetime in Postgres:

  • Ignoring time zones altogether in Rails and PostgreSQL

And you only need to store previous_address (or more details), neither address nor original_address; both would be redundant in a sane data model.

UPDATE to assign serial numbers per group

UPDATE users u
SET columntoupdate = g.increment
FROM (
SELECT u2.id
, row_number() OVER (PARTITION BY u2.group_id ORDER BY u2.name) AS increment
FROM users u2
) g
WHERE u.id = g.id
-- AND u.columntoupdate IS DISTINCT FROM g.increment -- ①
;

No need to involve the table group at all.

You need to PARTITION BY group_id to get a serial number per group.

And join on the PK column.

① Add this WHERE clause to suppress empty updates (for repeated use). See:

  • How do I (or can I) SELECT DISTINCT on multiple columns?

Aside:

You are aware that this data structure is not easily sustainable? Names change, users are added and deleted, gap-less numbers per group are expensive to maintain - and typically unnecessary. See:

  • Serial numbers per group of rows for compound key

Custom SERIAL / autoincrement per group of values

Concept

There are several ways to approach this. The first one that comes to my mind:

Assign a value to the category_id column inside a trigger executed for each row, overwriting any input value from the INSERT statement.

Action

For a simple test, I'm creating an article table holding categories and their ids, which should be unique for each category. I have omitted constraint creation; that's not relevant to the point.

create table article ( id serial, category varchar, category_id int );

Inserting some values for two distinct categories, using the generate_series() function to have an auto-increment already in place.

insert into article(category, category_id)
select 'stackoverflow', i from generate_series(1,1) i
union all
select 'stackexchange', i from generate_series(1,3) i;

Creating a trigger function that selects MAX(category_id) for the category we're inserting, increments it by 1, and overwrites the input value right before the actual INSERT (a BEFORE INSERT trigger takes care of that).

CREATE OR REPLACE FUNCTION category_increment()
RETURNS trigger
LANGUAGE plpgsql
AS
$$
DECLARE
  v_category_inc int := 0;
BEGIN
  SELECT MAX(category_id) + 1 INTO v_category_inc FROM article WHERE category = NEW.category;
  IF v_category_inc IS NULL THEN
    NEW.category_id := 1;
  ELSE
    NEW.category_id := v_category_inc;
  END IF;
  RETURN NEW;
END;
$$;

Using the function as a trigger.

CREATE TRIGGER trg_category_increment
BEFORE INSERT ON article
FOR EACH ROW EXECUTE PROCEDURE category_increment();

Inserting some more values (now that the trigger is in place) for already existing categories and a non-existing one.

INSERT INTO article(category) VALUES 
('stackoverflow'),
('stackexchange'),
('nonexisting');

Query used to select data:

select category, category_id from article order by 1, 2;

Result for initial inserts:

category       category_id
stackexchange  1
stackexchange  2
stackexchange  3
stackoverflow  1

Result after final inserts:

category       category_id
nonexisting    1
stackexchange  1
stackexchange  2
stackexchange  3
stackexchange  4
stackoverflow  1
stackoverflow  2

Sequential increment skipping numbers

serial columns, or IDENTITY in Postgres 10 or later, draw numbers from a SEQUENCE and gaps are to be expected. Their job is to make concurrent write access possible with unique numbers - not necessarily gap-less numbers.

If you don't actually have concurrent write access, there are simple ways to achieve (mostly) gap-less numbers. Like:

INSERT INTO tbl (info) 
SELECT 'xxx'
WHERE NOT EXISTS (SELECT FROM tbl WHERE info = 'xxx');

That doesn't burn a serial ID from the SEQUENCE, because a duplicate insert is skipped. The INSERT might still fail for any other reason and burn a serial number; you could reset the SEQUENCE in such a case:

  • How to reset postgres' primary key sequence when it falls out of sync?
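
A minimal sketch of such a reset, assuming a table tbl with serial column tbl_id (names are illustrative):

-- set the sequence so the next nextval() returns max(tbl_id) + 1
SELECT setval(pg_get_serial_sequence('tbl', 'tbl_id')
            , COALESCE(max(tbl_id) + 1, 1)
            , false)
FROM   tbl;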

While inserting multiple rows in a single statement, you also have to rule out duplicates within the inserted set; a sketch follows below. Example code:

  • Return data from subselect used in INSERT in a Common Table Expression
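
A sketch for the multi-row case, folding duplicates within the inserted set with DISTINCT (table and values are illustrative):

INSERT INTO tbl (info)
SELECT DISTINCT v.info                    -- rule out dupes within the set
FROM  (VALUES ('xxx'), ('yyy'), ('xxx')) v(info)
WHERE  NOT EXISTS (SELECT FROM tbl WHERE tbl.info = v.info);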

But if you do have concurrent writes, none of the above works reliably, on principle. You'd better learn to accept gaps in the IDs. You can always run a query with row_number() OVER (ORDER BY id) to generate gap-less numbers after the fact, as sketched below. However, those numbers are still arbitrary to a degree: smaller numbers were not necessarily committed earlier. There are exceptions under concurrent write load. Related:

  • Primary Key Value Not Incrementing Correctly
  • Serial numbers per group of rows for compound key
  • Auto increment table column
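
The after-the-fact numbering can be as simple as this sketch (assuming a table tbl with ID column id):

SELECT id, row_number() OVER (ORDER BY id) AS gapless_nr
FROM   tbl;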

Or consider a UUID instead (data type uuid) and avoid the inherent problem of duplicates with random values in a huge key space. Not at all serial, though:

  • Generating a UUID in Postgres for Insert statement?
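
A minimal sketch with a uuid primary key; gen_random_uuid() is built in with Postgres 13 or later (earlier versions can use the pgcrypto extension):

CREATE TABLE tbl_uuid (
  id   uuid PRIMARY KEY DEFAULT gen_random_uuid()
, info text
);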

How to use a temp sequence within a PostgreSQL function

Answer to question

The reason is that SQL functions (LANGUAGE sql) are parsed and planned as one. All objects used must exist before the function runs.

You can switch to PL/pgSQL (LANGUAGE plpgsql), which plans each statement on demand. There you can create objects and use them in a later command.

See:

  • Why can PL/pgSQL functions have side effect, while SQL functions can't?
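
To illustrate the failure mode, a minimal sketch of an SQL function that breaks this rule; the whole body is parsed before any of it runs, so the temp table does not exist yet for the INSERT:

CREATE FUNCTION breaks() RETURNS void
LANGUAGE sql AS
$$
CREATE TEMP TABLE t_tmp (id int);
INSERT INTO t_tmp VALUES (1);  -- error: relation "t_tmp" does not exist
$$;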

Since you are not returning anything, consider a PROCEDURE. (FUNCTION works, too.)

CREATE OR REPLACE PROCEDURE reindex_ids(IN bigint)
LANGUAGE plpgsql AS
$proc$
BEGIN
   IF EXISTS (SELECT FROM pg_catalog.pg_class
              WHERE  relname = 'id_seq_temp'
              AND    relnamespace = pg_my_temp_schema()
              AND    relkind = 'S') THEN
      ALTER SEQUENCE id_seq_temp RESTART;
   ELSE
      CREATE TEMP SEQUENCE id_seq_temp;
   END IF;

   UPDATE things SET id = id + 2000 WHERE group_id = $1;
   UPDATE things SET id = nextval('id_seq_temp') WHERE group_id = $1;
END
$proc$;

Call:

CALL reindex_ids(123);

This creates your temp sequence if it does not exist already.

If the sequence exists, it is reset. (Remember that temporary objects live for the duration of a session.)
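
If you want to get rid of the sequence before the session ends, a sketch:

DROP SEQUENCE IF EXISTS id_seq_temp;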

In the unlikely event that some other object occupies the name, an exception is raised.

Alternative solutions

Solution 1

This usually works:

UPDATE things t
SET    id = t1.new_id
FROM  (
   SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
   FROM   things
   WHERE  group_id = $1  -- your input here
   ) t1
WHERE  t.pk_id = t1.pk_id;

And it updates each row only once (unlike the two UPDATEs in the procedure above), so about half the cost.

Replace pk_id with your PRIMARY KEY column, or any UNIQUE NOT NULL (combination of) column(s).

The trick is that the UPDATE typically processes rows according to the sort order of the subquery in the FROM clause. Updating in ascending order should never hit a duplicate key violation.

And the ORDER BY clause of the window function row_number() imposes that sort order on the resulting set. That's an undocumented implementation detail, so you might want to add an explicit ORDER BY to the subquery, as sketched below. But since the behavior of UPDATE is undocumented anyway, it still depends on an implementation detail.
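
The variant with an explicit ORDER BY (same statement, one added line):

UPDATE things t
SET    id = t1.new_id
FROM  (
   SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
   FROM   things
   WHERE  group_id = $1
   ORDER  BY id
   ) t1
WHERE  t.pk_id = t1.pk_id;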

You can wrap that into a plain SQL function.
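
A sketch of such a wrapper, assuming the bigint input from above (the function name and parameter name are illustrative):

CREATE OR REPLACE FUNCTION renumber_ids(_group_id bigint)
RETURNS void
LANGUAGE sql AS
$func$
UPDATE things t
SET    id = t1.new_id
FROM  (
   SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
   FROM   things
   WHERE  group_id = _group_id
   ORDER  BY id
   ) t1
WHERE  t.pk_id = t1.pk_id;
$func$;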

Solution 2

Consider not doing what you are doing at all. Gaps in sequential numbers are typically expected and not a problem. Just live with it. See:

  • Serial numbers per group of rows for compound key

