Postgres Unique Multi-Column Index for Join Table

As Primary Key

Do this if the unique combination is the primary key:

create table tbl(
a_id int not null,
b_id int not null,
constraint tbl_pkey primary key(a_id,b_id)
);

Not Primary Key

Do this if the unique combination is not the primary key:

create table tbl(

-- other primary key here, e.g.:
-- id serial primary key,

a_id int not null,
b_id int not null,
constraint tbl_unique unique(a_id,b_id)
);

Existing Table

If you have an existing table, do this instead:

alter table tbl
add constraint tbl_unique unique(a_id, b_id);

That ALTER TABLE displays this message:

NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit index "tbl_unique" for table "tbl"

Query returned successfully with no result in 22 ms.

Drop

If you want to drop that constraint (for example, because the unique combination should cover 3 fields instead):

ALTER TABLE tbl DROP CONSTRAINT tbl_unique;
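
If the goal is to widen the rule to three columns, the drop and the re-add can be wrapped in one transaction so the table is never visible without a constraint. A minimal sketch, assuming a hypothetical third column c_id that is not part of the original table:

```sql
-- c_id is a hypothetical column added only for this illustration.
BEGIN;

ALTER TABLE tbl ADD COLUMN c_id int;

-- Remove the two-column rule ...
ALTER TABLE tbl DROP CONSTRAINT tbl_unique;

-- ... and replace it with a three-column one under the same name.
ALTER TABLE tbl ADD CONSTRAINT tbl_unique UNIQUE (a_id, b_id, c_id);

COMMIT;
```

Each ALTER TABLE locks the table while it runs, so on a large table this is probably best done in a quiet period.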

Index & Constraint & Nulls

Regarding the index, from the Postgres documentation:

PostgreSQL automatically creates a unique index when a unique
constraint or primary key is defined for a table

Source: http://www.postgresql.org/docs/9.1/static/indexes-unique.html
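
To see the implicit index for yourself, the pg_indexes system view can be queried. A small sketch, assuming the table is named tbl as in the examples above:

```sql
-- Lists every index on tbl, including the one created by the constraint.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'tbl';
```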


If uniqueness depends on some rule, use CREATE UNIQUE INDEX instead. For example:

Given this:

CREATE TABLE tbl
(
a_id integer NOT NULL,
b_id integer NULL
);

alter table tbl
add constraint tbl_unique unique(a_id, b_id);

That unique constraint catches these duplicates; this insert will be rejected by the database:

insert into tbl values
(1,1),
(1,1);

Yet that UNIQUE constraint cannot catch duplicate nulls. A null stands for an unknown value, and two unknowns are never considered equal to each other; you can think of a null as a wildcard. That is why multiple nulls are allowed under a unique constraint. This will be accepted by the database:

insert into tbl values
(1,1),
(1,null), -- think of this null as wildcard, some real value can be assigned later.
(1,null); -- and so is this. that's why both of these nulls are allowed

Think of a UNIQUE constraint as letting the real value be filled in later, hence the acceptance of the null values above.

If you want only one wildcard (null b_id) per a_id, you need to add a partial UNIQUE INDEX alongside the unique constraint. A UNIQUE constraint cannot carry an expression or a WHERE predicate; a plain or unique index can. This will be your complete DDL for rejecting multiple nulls:

CREATE TABLE tbl
(
a_id integer NOT NULL,
b_id integer NULL
);
alter table tbl
add constraint tbl_unique unique(a_id, b_id);

create unique index tbl_unique_a_id on tbl(a_id) where b_id is null;

This will be rejected by your database now:

insert into tbl values
(1,1),
(1,null),
(1,null);

This will be allowed:

insert into tbl values
(1,1),
(1,null);
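
On PostgreSQL 15 or later, the same one-null-per-a_id rule can probably be expressed in the constraint itself with NULLS NOT DISTINCT, without the extra partial index. A sketch under that version assumption; tbl15 is a hypothetical table name chosen so it does not clash with tbl:

```sql
-- Requires PostgreSQL 15 or later; tbl15 is a hypothetical name.
CREATE TABLE tbl15
(
    a_id integer NOT NULL,
    b_id integer NULL,
    CONSTRAINT tbl15_unique UNIQUE NULLS NOT DISTINCT (a_id, b_id)
);

-- Accepted: one null b_id for a_id = 1.
INSERT INTO tbl15 VALUES (1, 1), (1, NULL);

-- Expected to be rejected: a second null b_id for the same a_id.
INSERT INTO tbl15 VALUES (1, NULL);
```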

Related to http://www.ienablemuch.com/2010/12/postgresql-said-sql-server2008-said-non.html

Unique index with a lot of columns

Postgres 14

... just came out with a built-in hash function for records, which is substantially cheaper than my custom function. Especially for many columns! See:

  • Generate hash id for records in DB

That makes the expression index much more attractive than a generated column plus index. So just:

CREATE UNIQUE INDEX tbl_row_uni ON tbl (hash_record_extended(tbl.*,0));

This normally works, too:

CREATE UNIQUE INDEX tbl_row_uni ON tbl (hash_record_extended(tbl,0));

But the first variant is safer. In the second variant tbl would resolve to the column if a column of the same name should exist.
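
A quick way to try the expression index is a small throwaway table. This is only a sketch: wide_tbl and its three text columns are hypothetical stand-ins for a table with many columns.

```sql
-- Requires PostgreSQL 14 or later. wide_tbl and its columns are hypothetical.
CREATE TABLE wide_tbl
(
    col1 text,
    col2 text,
    col3 text
);

CREATE UNIQUE INDEX wide_tbl_row_uni
    ON wide_tbl (hash_record_extended(wide_tbl.*, 0));

INSERT INTO wide_tbl VALUES ('a', 'b', 'c');   -- accepted
INSERT INTO wide_tbl VALUES ('a', 'b', 'x');   -- accepted, differs in col3
INSERT INTO wide_tbl VALUES ('a', 'b', 'c');   -- expected: unique violation
```

The index stores a 64-bit hash rather than the row itself, so two different rows could in principle collide, although that should be extremely rare.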

Postgres 13 (original answer)

I provided a solution for that problem exactly on dba.SE recently:

  • Why doesn't my UNIQUE constraint trigger?

It's pretty close to your third idea:

Basically, a very efficient server-side generated hash placed as the 31st column with a UNIQUE constraint.

CREATE OR REPLACE FUNCTION public.f_tbl_bighash(col1 text, col2 text, ... , col30 text)
RETURNS bigint
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT hashtextextended(textin(record_out(($1,$2, ... ,$30))), 0)';

ALTER TABLE tbl
ADD COLUMN tbl_bighash bigint NOT NULL GENERATED ALWAYS AS (public.f_tbl_bighash(col1, col2, ... , col30)) STORED -- append column in last position
, ADD CONSTRAINT tbl_bighash_uni UNIQUE (tbl_bighash);

The beauty of it: it works efficiently without changing anything else. (Except, possibly, where you use SELECT * or INSERT INTO without target list or similar.)

And it works for NULL values, too (treating them as equal).

Careful if any column types have non-immutable text representation. (Like timestamptz.) The solution is tested with all text columns.

If the table schema changes, drop the UNIQUE constraint first, recreate the function and recreate the generated column - ideally with a single ALTER TABLE statement, so you don't rewrite the table twice.

Alternatively, use a UNIQUE expression index based on public.f_tbl_bighash(). Same effect. Upside: no additional table column. Downside: a bit more expensive, computationally.

Optimizing indexes for query on large table with multiple joins

As for the query itself, the only thing you can do is skip the users table. From EXPLAIN you can see that it only does an Index Only Scan without actually touching the table. So, technically, your query could look like this:

SELECT images.* FROM images
INNER JOIN locations ON locations.id = images.location_id
INNER JOIN user_groups ON images.creator_id = user_groups.user_id
WHERE images.deleted_at IS NULL
AND user_groups.group_id = 7
AND images.creator_type = 'User'
AND images.status = 2
AND locations.active = TRUE
ORDER BY date_uploaded DESC
OFFSET 0 LIMIT 50

The rest is about indexes. locations seems to have very little data, so optimization here will gain you nothing. user_groups on the other hand could benefit from an index ON (user_id) WHERE group_id = 7 or ON (group_id, user_id). This should remove some extra filtering on table content.

-- Option 1
CREATE INDEX ix_usergroups_userid_groupid7
ON user_groups (user_id)
WHERE group_id = 7;

-- Option 2
CREATE INDEX ix_usergroups_groupid_userid
ON user_groups (group_id, user_id);

Of course, the biggest thing here is images. Currently, the planner would do an index scan on creator_date_uploaded_Where_pub_not_del, which I suspect does not fully match the requirements. Here, multiple options come to mind depending on your usage pattern, from one where the search parameters are rather common:

-- Option 1
CREATE INDEX ix_images_creatorid_typeuser_status2_notdel
ON images (creator_id)
WHERE creator_type = 'User' AND status = 2 AND deleted_at IS NULL;

to one with completely dynamic parameters:

-- Option 2
CREATE INDEX ix_images_status_creatortype_creatorid_notdel
ON images (status, creator_type, creator_id)
WHERE deleted_at IS NULL;

The first index is preferable as it is smaller (values are filtered out rather than indexed).

To summarize, unless you are limited by memory (or other factors), I would add indexes on user_groups and images. The correct choice of indexes must be confirmed empirically, as multiple options are usually available and the situation depends on the statistical distribution of the data.
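
One way to confirm the choice empirically is to compare the plan before and after adding a candidate index. A sketch, assuming the images, locations, and user_groups tables from the question exist with the columns used in the query above:

```sql
-- ANALYZE actually runs the query; BUFFERS adds page-access counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT images.* FROM images
INNER JOIN locations ON locations.id = images.location_id
INNER JOIN user_groups ON images.creator_id = user_groups.user_id
WHERE images.deleted_at IS NULL
AND user_groups.group_id = 7
AND images.creator_type = 'User'
AND images.status = 2
AND locations.active = TRUE
ORDER BY date_uploaded DESC
OFFSET 0 LIMIT 50;
```

If the new index does not show up in the plan, or the timing does not improve, it is probably not worth its maintenance cost.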

Unique constraint for 2 columns that works both ways

You can create a unique index that always indexes the same order of values:

create unique index 
on friends (least(requestor, requestee), greatest(requestor, requestee));
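
To illustrate the effect, here is a small self-contained sketch. The table definition and the index name friends_pair_uni are assumptions; only the two column names come from the statement above.

```sql
-- Hypothetical minimal table; only the two indexed columns are assumed.
CREATE TABLE friends
(
    requestor integer NOT NULL,
    requestee integer NOT NULL
);

CREATE UNIQUE INDEX friends_pair_uni
    ON friends (least(requestor, requestee), greatest(requestor, requestee));

INSERT INTO friends VALUES (1, 2);   -- accepted
INSERT INTO friends VALUES (2, 1);   -- expected: unique violation, same pair reversed
INSERT INTO friends VALUES (1, 3);   -- accepted, different pair
```

Both (1, 2) and (2, 1) are indexed as the pair (1, 2), which is why the second insert conflicts with the first.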

Can a Unique constraint on multiple Columns add indexes separately on those columns

If you have to enforce the unique combination of both columns, you have to create the unique index on both of them.

Postgres will use that index as well if your where clause only has a condition on the first column of the index (the usual "it depends" on index usage still applies here).

Postgres is able to use a column that is not the leading column of an index for a where condition; however, that is less efficient than using a leading column.

I would put first the column that is used more often as the only where condition. The order of the columns does not matter for the uniqueness.

If the usage of (only) the second column is as frequent as using the (only) first column, then adding an additional index with only the second column could make sense, e.g.:

CREATE TABLE IF NOT EXISTS videolikes (
itemid SERIAL PRIMARY KEY,
videoid integer NOT NULL,
userid integer NOT NULL,
CONSTRAINT liked_video_user UNIQUE(videoid,userid)
);

create index on videolikes (userid);

The unique index would then be used for conditions on only videoid and (equality) conditions using both columns. The second index would be used for conditions on only the userid.
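
Which index is actually chosen can be checked with EXPLAIN. A sketch, assuming the videolikes table and indexes defined above; the literal values are arbitrary:

```sql
-- Condition on the leading column only: a candidate for the unique index.
EXPLAIN SELECT * FROM videolikes WHERE videoid = 42;

-- Condition on the second column only: a candidate for the extra index.
EXPLAIN SELECT * FROM videolikes WHERE userid = 7;

-- Equality on both columns: a candidate for the unique index.
EXPLAIN SELECT * FROM videolikes WHERE videoid = 42 AND userid = 7;
```

On a very small table the planner may prefer a sequential scan regardless, so this is best tried with realistic data volumes.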


Unrelated, but:

The itemid primary key is pretty much useless with the above setup. You needlessly increase the size of the table and add another index that needs to be maintained. You can simply leave it out and declare videoid, userid as the primary key:

CREATE TABLE IF NOT EXISTS videolikes (
videoid integer NOT NULL,
userid integer NOT NULL,
CONSTRAINT pk_videolikes primary key (videoid,userid)
);

create index on videolikes (userid);

How do I specify unique constraint for multiple columns in MySQL?

To add a unique constraint, you need to use two components:

ALTER TABLE - to change the table schema, and

ADD UNIQUE - to add the unique constraint.

You can then define your new unique key with the format `name`(`column1`, `column2`, ...)

So for your particular issue, you could use this command:

ALTER TABLE `votes` ADD UNIQUE `unique_index`(`user`, `email`, `address`);

How to create a unique index containing multiple fields where one is a foreign key

This is completely undocumented so it took some playing around until I stumbled upon the correct syntax. You can actually use sub-properties in index definitions:

@Index("player_id_UNIQUE", ["player.id", "period", "year"], { unique: true })

That way, player.id is automatically mapped to player_id in the resulting SQL:

CREATE UNIQUE INDEX "player_id_UNIQUE" ON "user_earning" ("player_id", "period", "year")

Does PostgreSQL implement multi-table indexes?

As of PostgreSQL 12 (the current version at the time of writing), an index can be based on a table or a materialized view only.

https://www.postgresql.org/docs/current/sql-createindex.html

CREATE INDEX constructs an index on the specified column(s) of the
specified relation, which can be a table or a materialized view.

The CREATE INDEX syntax requires a table, and only one table can be specified:

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON
[ ONLY ] table_name [ USING method ]

table_name:

The name (possibly schema-qualified) of the table to be indexed.

A materialized view is an option, but the data in a materialized view is stale until you refresh it.

https://www.postgresql.org/docs/12/sql-creatematerializedview.html

CREATE MATERIALIZED VIEW defines a materialized view of a query. The
query is executed and used to populate the view at the time the
command is issued (unless WITH NO DATA is used) and may be refreshed
later using REFRESH MATERIALIZED VIEW.

You may be able to balance this out by automating REFRESH MATERIALIZED VIEW so that stale data becomes less likely, for example after large data imports and at regular intervals otherwise. But if your data is large enough to require indexing, the refresh and re-indexing process will not be fast enough to run after every CRUD statement in an OLTP scenario.
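
The materialized-view workaround might look roughly like this. It is only a sketch: the customers and orders tables, the view, and the index names are all hypothetical.

```sql
-- customers, orders, and customer_orders_mv are hypothetical names.
CREATE TABLE customers (id integer PRIMARY KEY, name text NOT NULL);
CREATE TABLE orders (
    id integer PRIMARY KEY,
    customer_id integer NOT NULL REFERENCES customers (id),
    total numeric NOT NULL
);

-- The view materializes the join so that one index can span columns
-- that originate in both tables.
CREATE MATERIALIZED VIEW customer_orders_mv AS
SELECT o.id AS order_id, c.name AS customer_name, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id;

-- A unique index is required for REFRESH ... CONCURRENTLY.
CREATE UNIQUE INDEX customer_orders_mv_order_id ON customer_orders_mv (order_id);
CREATE INDEX customer_orders_mv_name_total ON customer_orders_mv (customer_name, total);

-- Re-run the query; CONCURRENTLY avoids blocking readers during the refresh.
REFRESH MATERIALIZED VIEW CONCURRENTLY customer_orders_mv;
```

The index on (customer_name, total) covers columns from both base tables, but it only reflects the data as of the last refresh.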

In conclusion, what you are looking for does not exist in PostgreSQL as of v 12.


