Serial Numbers Per Group of Rows For Compound Key

Don't. It has been tried many times and it's a pain.

Use a plain serial or IDENTITY column:

  • Auto increment table column

CREATE TABLE address_history (
address_history_id serial PRIMARY KEY
, person_id int NOT NULL REFERENCES people(id)
, created_at timestamp NOT NULL DEFAULT current_timestamp
, previous_address text
);
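
If you are on Postgres 10 or later, an IDENTITY column does the same job. A minimal sketch of the equivalent table definition:

CREATE TABLE address_history (
  address_history_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, person_id  int NOT NULL REFERENCES people(id)
, created_at timestamp NOT NULL DEFAULT current_timestamp
, previous_address text
);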

Use the window function row_number() to get gap-less serial numbers per person_id. You could create a VIEW as a drop-in replacement for your table in queries, to have those numbers ready:

CREATE VIEW address_history_nr AS
SELECT *, row_number() OVER (PARTITION BY person_id
                             ORDER BY address_history_id) AS adr_nr
FROM   address_history;

See:

  • Gap-less sequence where multiple transactions with multiple tables are involved

Or you might want to ORDER BY something else. Maybe created_at? Better: created_at, address_history_id to break possible ties; see the sketch below. Related answer:

  • Column with alternate serials
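
A sketch of the same view with that sort order; CREATE OR REPLACE works here because the output columns stay the same:

CREATE OR REPLACE VIEW address_history_nr AS
SELECT *, row_number() OVER (PARTITION BY person_id
                             ORDER BY created_at, address_history_id) AS adr_nr
FROM   address_history;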

Also, the data type you are looking for is timestamp or timestamptz, not datetime in Postgres:

  • Ignoring time zones altogether in Rails and PostgreSQL

And you only need to store previous_address (or more details), neither address nor original_address; both would be redundant in a sane data model.

UPDATE to assign serial numbers per group

UPDATE users u
SET columntoupdate = g.increment
FROM (
SELECT u2.id
, row_number() OVER (PARTITION BY u2.group_id ORDER BY u2.name) AS increment
FROM users u2
) g
WHERE u.id = g.id
-- AND u.columntoupdate IS DISTINCT FROM g.increment -- ①
;

No need to involve the table group at all.

You need to PARTITION BY group_id to get a serial number per group.

And join on the PK column.

① Add this WHERE clause to suppress empty updates (for repeated use). See:

  • How do I (or can I) SELECT DISTINCT on multiple columns?

Aside:

You are aware that this data structure is not easily sustainable? Names change, users are added and deleted, gap-less numbers per group are expensive to maintain - and typically unnecessary. See:

  • Serial numbers per group of rows for compound key

Custom SERIAL / autoincrement per group of values

Concept

There are several ways to approach this. The first one that comes to my mind:

Assign a value to the category_id column inside a trigger executed for each row, overwriting any input value from the INSERT statement.

Action

For a simple test, I'm creating an article table holding categories and their ids, which should be unique for each category. I have omitted constraint creation; that's not relevant to the point.

create table article ( id serial, category varchar, category_id int );

Inserting some values for two distinct categories, using the generate_series() function to have an auto-increment already in place.

insert into article(category, category_id)
select 'stackoverflow', i from generate_series(1,1) i
union all
select 'stackexchange', i from generate_series(1,3) i;

Creating a trigger function that selects MAX(category_id) for the category we're inserting, increments it by 1, and overwrites the input value right before the actual INSERT (a BEFORE INSERT trigger takes care of that).

CREATE OR REPLACE FUNCTION category_increment()
RETURNS trigger
LANGUAGE plpgsql
AS
$$
DECLARE
  v_category_inc int := 0;
BEGIN
  SELECT MAX(category_id) + 1 INTO v_category_inc FROM article WHERE category = NEW.category;
  IF v_category_inc IS NULL THEN
    NEW.category_id := 1;
  ELSE
    NEW.category_id := v_category_inc;
  END IF;
  RETURN NEW;
END;
$$;

Using the function as a trigger.

CREATE TRIGGER trg_category_increment
BEFORE INSERT ON article
FOR EACH ROW EXECUTE PROCEDURE category_increment();

Inserting some more values (now that the trigger is in place) for already existing categories and a non-existing one.

INSERT INTO article(category) VALUES 
('stackoverflow'),
('stackexchange'),
('nonexisting');

Query used to select data:

select category, category_id from article order by 1, 2;

Result for initial inserts:

category       category_id
stackexchange  1
stackexchange  2
stackexchange  3
stackoverflow  1

Result after final inserts:

category       category_id
nonexisting    1
stackexchange  1
stackexchange  2
stackexchange  3
stackexchange  4
stackoverflow  1
stackoverflow  2

Sequential increment skipping numbers

serial columns, or IDENTITY in Postgres 10 or later, draw numbers from a SEQUENCE and gaps are to be expected. Their job is to make concurrent write access possible with unique numbers - not necessarily gap-less numbers.

If you don't actually have concurrent write access, there are simple ways to achieve (mostly) gap-less numbers. Like:

INSERT INTO tbl (info) 
SELECT 'xxx'
WHERE NOT EXISTS (SELECT FROM tbl WHERE info = 'xxx');

That doesn't burn a serial ID from the SEQUENCE, because a duplicate insert is skipped. The INSERT might still fail for any other reason and burn a serial number; you could reset the SEQUENCE in such a case:

  • How to reset postgres' primary key sequence when it falls out of sync?
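
A minimal sketch of such a reset, assuming a table tbl with serial column tbl_id (names are illustrative):

-- set the sequence so the next nextval() returns max(tbl_id) + 1
SELECT setval(pg_get_serial_sequence('tbl', 'tbl_id')
            , COALESCE(max(tbl_id) + 1, 1)
            , false)
FROM   tbl;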

While inserting multiple rows in a single statement, you also have to rule out duplicates within the inserted set; a sketch follows below. Example code:

  • Return data from subselect used in INSERT in a Common Table Expression
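
A sketch for the multi-row case, folding duplicates within the inserted set with DISTINCT (table and values are illustrative):

INSERT INTO tbl (info)
SELECT DISTINCT v.info                    -- rule out dupes within the set
FROM  (VALUES ('xxx'), ('yyy'), ('xxx')) v(info)
WHERE  NOT EXISTS (SELECT FROM tbl WHERE tbl.info = v.info);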

But if you do have concurrent writes, none of the above works reliably, on principle. You'd better learn to accept gaps in the IDs. You can always run a query with row_number() OVER (ORDER BY id) to generate gap-less numbers after the fact, as sketched below. However, those numbers are still arbitrary to a degree: smaller numbers were not necessarily committed earlier. There are exceptions under concurrent write load. Related:

  • Primary Key Value Not Incrementing Correctly
  • Serial numbers per group of rows for compound key
  • Auto increment table column
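
The after-the-fact numbering can be as simple as this sketch (assuming a table tbl with ID column id):

SELECT id, row_number() OVER (ORDER BY id) AS gapless_nr
FROM   tbl;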

Or consider a UUID instead (data type uuid) and avoid the inherent problem of duplicates with random values in a huge key space. Not at all serial, though:

  • Generating a UUID in Postgres for Insert statement?
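
A minimal sketch with a uuid primary key; gen_random_uuid() is built in with Postgres 13 or later (earlier versions can use the pgcrypto extension):

CREATE TABLE tbl_uuid (
  id   uuid PRIMARY KEY DEFAULT gen_random_uuid()
, info text
);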

How to use a temp sequence within a PostgreSQL function

Answer to question

The reason is that SQL functions (LANGUAGE sql) are parsed and planned as one. All objects used must exist before the function runs.

You can switch to PL/pgSQL (LANGUAGE plpgsql), which plans each statement on demand. There you can create objects and use them in a later command.

See:

  • Why can PL/pgSQL functions have side effect, while SQL functions can't?
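
To illustrate the failure mode, a minimal sketch of an SQL function that breaks this rule; the whole body is parsed before any of it runs, so the temp table does not exist yet for the INSERT:

CREATE FUNCTION breaks() RETURNS void
LANGUAGE sql AS
$$
CREATE TEMP TABLE t_tmp (id int);
INSERT INTO t_tmp VALUES (1);  -- error: relation "t_tmp" does not exist
$$;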

Since you are not returning anything, consider a PROCEDURE. (FUNCTION works, too.)

CREATE OR REPLACE PROCEDURE reindex_ids(IN bigint)
LANGUAGE plpgsql AS
$proc$
BEGIN
   IF EXISTS (SELECT FROM pg_catalog.pg_class
              WHERE  relname = 'id_seq_temp'
              AND    relnamespace = pg_my_temp_schema()
              AND    relkind = 'S') THEN
      ALTER SEQUENCE id_seq_temp RESTART;
   ELSE
      CREATE TEMP SEQUENCE id_seq_temp;
   END IF;

   UPDATE things SET id = id + 2000 WHERE group_id = $1;
   UPDATE things SET id = nextval('id_seq_temp') WHERE group_id = $1;
END
$proc$;

Call:

CALL reindex_ids(123);

This creates your temp sequence if it does not exist already.

If the sequence exists, it is reset. (Remember that temporary objects live for the duration of a session.)
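
If you want to get rid of the sequence before the session ends, a sketch:

DROP SEQUENCE IF EXISTS id_seq_temp;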

In the unlikely event that some other object occupies the name, an exception is raised.

Alternative solutions

Solution 1

This usually works:

UPDATE things t
SET    id = t1.new_id
FROM  (
   SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
   FROM   things
   WHERE  group_id = $1  -- your input here
   ) t1
WHERE  t.pk_id = t1.pk_id;

And it updates each row only once (unlike the two UPDATEs in the procedure above), so about half the cost.

Replace pk_id with your PRIMARY KEY column, or any UNIQUE NOT NULL (combination of) column(s).

The trick is that the UPDATE typically processes rows according to the sort order of the subquery in the FROM clause. Updating in ascending order should never hit a duplicate key violation.

And the ORDER BY clause of the window function row_number() imposes that sort order on the resulting set. That's an undocumented implementation detail, so you might want to add an explicit ORDER BY to the subquery, as sketched below. But since the behavior of UPDATE is undocumented anyway, it still depends on an implementation detail.
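
The variant with an explicit ORDER BY (same statement, one added line):

UPDATE things t
SET    id = t1.new_id
FROM  (
   SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
   FROM   things
   WHERE  group_id = $1
   ORDER  BY id
   ) t1
WHERE  t.pk_id = t1.pk_id;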

You can wrap that into a plain SQL function.
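
A sketch of such a wrapper, assuming the bigint input from above (the function name and parameter name are illustrative):

CREATE OR REPLACE FUNCTION renumber_ids(_group_id bigint)
RETURNS void
LANGUAGE sql AS
$func$
UPDATE things t
SET    id = t1.new_id
FROM  (
   SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
   FROM   things
   WHERE  group_id = _group_id
   ORDER  BY id
   ) t1
WHERE  t.pk_id = t1.pk_id;
$func$;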

Solution 2

Consider not doing what you are doing at all. Gaps in sequential numbers are typically expected and not a problem. Just live with it. See:

  • Serial numbers per group of rows for compound key

