Serial numbers per group of rows for compound key
Don't. It has been tried many times and it's a pain. Use a plain serial or IDENTITY column:
- Auto increment table column
CREATE TABLE address_history (
address_history_id serial PRIMARY KEY
, person_id int NOT NULL REFERENCES people(id)
, created_at timestamp NOT NULL DEFAULT current_timestamp
, previous_address text
);
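Since Postgres 10, the same table can be spelled with a standard-conforming IDENTITY column instead of serial:

```sql
CREATE TABLE address_history (
  address_history_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, person_id  int NOT NULL REFERENCES people(id)
, created_at timestamp NOT NULL DEFAULT current_timestamp
, previous_address text
);
```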
Use the window function row_number() to get serial numbers without gaps per person_id. You could create a VIEW as a drop-in replacement for your table in queries to have those numbers ready:
CREATE VIEW address_history_nr AS
SELECT *, row_number() OVER (PARTITION BY person_id
ORDER BY address_history_id) AS adr_nr
FROM address_history;
See:
- Gap-less sequence where multiple transactions with multiple tables are involved
Or you might want to ORDER BY something else. Maybe created_at? Better created_at, address_history_id to break possible ties. Related answer:
- Column with alternate serials
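With the suggested tie-breaker, the view might look like this instead:

```sql
CREATE VIEW address_history_nr AS
SELECT *, row_number() OVER (PARTITION BY person_id
                             ORDER BY created_at, address_history_id) AS adr_nr
FROM   address_history;
```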
Also, the data type you are looking for is timestamp or timestamptz, not datetime, which is not a type in Postgres:
- Ignoring time zones altogether in Rails and PostgreSQL
And you only need to store previous_address (or more details), not original_address, nor address. Both would be redundant in a sane data model.
UPDATE to assign serial numbers per group
UPDATE users u
SET columntoupdate = g.increment
FROM (
SELECT u2.id
, row_number() OVER (PARTITION BY u2.group_id ORDER BY u2.name) AS increment
FROM users u2
) g
WHERE u.id = g.id
-- AND u.columntoupdate IS DISTINCT FROM g.increment -- ①
;
db<>fiddle here
No need to involve the table group at all.
You need to PARTITION BY group_id for serial numbers per group. And join on the PK column.
① Add this WHERE clause to suppress empty updates (for repeated use). See:
- How do I (or can I) SELECT DISTINCT on multiple columns?
Aside:
You are aware that this data structure is not easily sustainable? Names change, users are added and deleted, gap-less numbers per group are expensive to maintain - and typically unnecessary. See:
- Serial numbers per group of rows for compound key
Custom SERIAL / autoincrement per group of values
Concept
There are several ways to approach this. The first one that comes to my mind:
Assign a value for the category_id column inside a trigger executed for each row, by overwriting the input value from the INSERT statement.
Action
Here's the SQL Fiddle to see the code in action.
For a simple test, I'm creating an article table holding categories and their ids, which should be unique for each category. I have omitted constraint creation - that's not relevant to the point.
create table article ( id serial, category varchar, category_id int )
Inserting some values for two distinct categories, using the generate_series() function to have an auto-increment already in place.
insert into article(category, category_id)
select 'stackoverflow', i from generate_series(1,1) i
union all
select 'stackexchange', i from generate_series(1,3) i
Creating a trigger function that selects MAX(category_id) + 1 for the category of the row being inserted, and overwrites the value right before the actual INSERT to the table (a BEFORE INSERT trigger takes care of that).
CREATE OR REPLACE FUNCTION category_increment()
RETURNS trigger
LANGUAGE plpgsql
AS
$$
DECLARE
v_category_inc int := 0;
BEGIN
SELECT MAX(category_id) + 1 INTO v_category_inc FROM article WHERE category = NEW.category;
IF v_category_inc is null THEN
NEW.category_id := 1;
ELSE
NEW.category_id := v_category_inc;
END IF;
RETURN NEW;
END;
$$
Using the function as a trigger.
CREATE TRIGGER trg_category_increment
BEFORE INSERT ON article
FOR EACH ROW EXECUTE PROCEDURE category_increment()
Inserting some more values (after the trigger is in place) for already existing categories and a non-existing one.
INSERT INTO article(category) VALUES
('stackoverflow'),
('stackexchange'),
('nonexisting');
Query used to select data:
select category, category_id from article order by 1, 2
Result for initial inserts:
category category_id
stackexchange 1
stackexchange 2
stackexchange 3
stackoverflow 1
Result after final inserts:
category category_id
nonexisting 1
stackexchange 1
stackexchange 2
stackexchange 3
stackexchange 4
stackoverflow 1
stackoverflow 2
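Note that the MAX()-based trigger above is open to race conditions: two concurrent inserts for the same category can read the same MAX() and end up with duplicate numbers. One possible way to serialize inserts per category is a transaction-level advisory lock; this is an untested sketch, not part of the original answer:

```sql
CREATE OR REPLACE FUNCTION category_increment()
RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
   -- serialize concurrent inserts for the same category
   PERFORM pg_advisory_xact_lock(hashtext(NEW.category));

   SELECT COALESCE(MAX(category_id), 0) + 1
   INTO   NEW.category_id
   FROM   article
   WHERE  category = NEW.category;

   RETURN NEW;
END
$$;
```

The lock is released automatically at the end of the transaction, so inserts for different categories still proceed in parallel.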
Sequential increment skipping numbers
serial columns, or IDENTITY in Postgres 10 or later, draw numbers from a SEQUENCE and gaps are to be expected. Their job is to make concurrent write access possible with unique numbers - not necessarily gap-less numbers.
If you don't actually have concurrent write access, there are simple ways to achieve (mostly) gap-less numbers. Like:
INSERT INTO tbl (info)
SELECT 'xxx'
WHERE NOT EXISTS (SELECT FROM tbl WHERE info = 'xxx');
That doesn't burn a serial ID from the SEQUENCE because a duplicate insert is skipped. (The INSERT might still fail for any other reason - and burn a serial number.) You could reset the SEQUENCE in such a case:
- How to reset postgres' primary key sequence when it falls out of sync?
While inserting multiple rows in a single statement, you also have to rule out duplicates within the inserted set. Example code:
- Return data from subselect used in INSERT in a Common Table Expression
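A sketch of such a multi-row insert (the table tbl and the sample values are assumptions):

```sql
INSERT INTO tbl (info)
SELECT DISTINCT v.info              -- rule out duplicates within the inserted set
FROM  (VALUES ('xxx'), ('yyy'), ('xxx')) v(info)
WHERE  NOT EXISTS (SELECT FROM tbl t WHERE t.info = v.info);
```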
But if you do have concurrent writes, none of the above works reliably, on principle. You better learn to accept gaps in the IDs. You can always have a query with row_number() OVER (ORDER BY id) to generate gap-less numbers after the fact. However, the numbers are still arbitrary to a degree. Smaller numbers were not necessarily committed earlier. There are exceptions under concurrent write load. Related:
- Primary Key Value Not Incrementing Correctly
- Serial numbers per group of rows for compound key
- Auto increment table column
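Such an after-the-fact numbering can be as simple as (table name assumed):

```sql
SELECT id, row_number() OVER (ORDER BY id) AS gapless_nr
FROM   tbl;
```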
Or consider a UUID instead (data type uuid) and avoid the inherent problem of duplicates with random values in a huge key space. Not at all serial, though:
- Generating a UUID in Postgres for Insert statement?
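A minimal sketch with a UUID key (gen_random_uuid() is built in since Postgres 13; older versions need the pgcrypto extension; table and column names are assumptions):

```sql
CREATE TABLE tbl_uuid (
  tbl_uuid_id uuid PRIMARY KEY DEFAULT gen_random_uuid()
, info        text
);
```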
How to use a temp sequence within a PostgreSQL function
Answer to question
The reason is that SQL functions (LANGUAGE sql) are parsed and planned as one. All objects used must exist before the function runs.
You can switch to PL/pgSQL (LANGUAGE plpgsql), which plans each statement on demand. There you can create objects and use them in the next command.
See:
- Why can PL/pgSQL functions have side effect, while SQL functions can't?
Since you are not returning anything, consider a PROCEDURE. (A FUNCTION works, too.)
CREATE OR REPLACE PROCEDURE reindex_ids(IN bigint)
LANGUAGE plpgsql AS
$proc$
BEGIN
IF EXISTS ( SELECT FROM pg_catalog.pg_class
WHERE relname = 'id_seq_temp'
AND relnamespace = pg_my_temp_schema()
AND relkind = 'S') THEN
ALTER SEQUENCE id_seq_temp RESTART;
ELSE
CREATE TEMP SEQUENCE id_seq_temp;
END IF;
UPDATE things SET id = id + 2000 WHERE group_id = $1;
UPDATE things SET id = nextval('id_seq_temp') WHERE group_id = $1;
END
$proc$;
Call:
CALL reindex_ids(123);
This creates your temp sequence if it does not exist already.
If the sequence exists, it is reset. (Remember that temporary objects live for the duration of a session.)
In the unlikely event that some other object occupies the name, an exception is raised.
Alternative solutions
Solution 1
This usually works:
UPDATE things t
SET id = t1.new_id
FROM (
SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
FROM things
WHERE group_id = $1 -- your input here
) t1
WHERE t.pk_id = t1.pk_id;
And it updates each row only once, so roughly half the cost of the two-step UPDATE in the procedure above.
Replace pk_id with your PRIMARY KEY column, or any UNIQUE NOT NULL (combination of) column(s).
The trick is that the UPDATE typically processes rows according to the sort order of the subquery in the FROM clause. Updating in ascending order should never hit a duplicate key violation. And the ORDER BY clause of the window function row_number() imposes that sort order on the resulting set. That's an undocumented implementation detail, so you might want to add an explicit ORDER BY to the subquery. But since the behavior of UPDATE is undocumented anyway, it still depends on an implementation detail.
You can wrap that into a plain SQL function.
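Such a wrapper could look like this (the function name is an assumption):

```sql
CREATE OR REPLACE FUNCTION renumber_ids(_group_id bigint)
RETURNS void
LANGUAGE sql AS
$func$
UPDATE things t
SET    id = t1.new_id
FROM  (
   SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
   FROM   things
   WHERE  group_id = _group_id
   ) t1
WHERE  t.pk_id = t1.pk_id;
$func$;
```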
Solution 2
Consider not doing what you are doing at all. Gaps in sequential numbers are typically expected and not a problem. Just live with it. See:
- Serial numbers per group of rows for compound key