Postgres and Indexes on Foreign Keys and Primary Keys

PostgreSQL automatically creates indexes on primary keys and unique constraints, but not on the referencing side of foreign key relationships.

When Postgres creates an implicit index it emits a NOTICE-level message that you can see in psql and/or the system logs, so you can tell when it happens. Automatically created indexes are also visible in a table's \d output.
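
For example, with a hypothetical customers table (the names are illustrative), the implicit primary key index shows up in psql's \d output:

CREATE TABLE customers (
    id bigint PRIMARY KEY
);

-- in psql, \d customers then lists:
-- Indexes:
--     "customers_pkey" PRIMARY KEY, btree (id)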

The documentation on unique indexes says:

PostgreSQL automatically creates an index for each unique constraint and primary key constraint to enforce uniqueness. Thus, it is not necessary to create an index explicitly for primary key columns.

and the documentation on constraints says:

Since a DELETE of a row from the referenced table or an UPDATE of a referenced column will require a scan of the referencing table for rows matching the old value, it is often a good idea to index the referencing columns. Because this is not always needed, and there are many choices available on how to index, declaration of a foreign key constraint does not automatically create an index on the referencing columns.

Therefore you have to create indexes on foreign keys yourself if you want them.

Note that if your primary key is made up of foreign key columns, e.g. two FKs forming the PK of an M-to-N junction table, you will have an index on the PK and probably don't need to create any extra indexes.

While it's usually a good idea to create an index on (or including) your referencing-side foreign key columns, it isn't required. Each index you add slows DML operations down slightly, so you pay a performance cost on every INSERT, UPDATE or DELETE. If the index is rarely used it may not be worth having.
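
Continuing the illustrative example above, the referencing side gets no automatic index, so you create it yourself; a minimal sketch:

CREATE TABLE orders (
    id          bigint PRIMARY KEY,                        -- indexed automatically
    customer_id bigint NOT NULL REFERENCES customers (id)  -- not indexed automatically
);

CREATE INDEX orders_customer_id_idx ON orders (customer_id);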

Does a postgres foreign key imply an index?

PostgreSQL does not automatically create an index on the columns on which a foreign key is defined. If you need such an index, you will have to create it yourself.

It is usually a good idea to have such an index, so that modifications on the parent table that affect the referenced columns are efficient.

Indexing Foreign Keys in PostgreSQL

tl;dr: You need to add an index on item_id. The "black magic" of Postgres indexing is covered in Chapter 11. Indexes.

You have a composite index on (topic_id, item_id), and column order matters. Postgres can use this index for queries on topic_id, for queries on both topic_id and item_id, but not (or only less efficiently) for queries on item_id alone.

From 11.3. Multicolumn Indexes...

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.

-- indexed
select *
from topics_items
where topic_id = ?

-- also indexed
select *
from topics_items
where topic_id = ?
and item_id = ?

-- probably not indexed
select *
from topics_items
where item_id = ?

This is because a composite index like (topic_id, item_id) stores the topic ID first, then the item IDs that share that topic ID. To look up an item ID efficiently in this index, Postgres must first narrow the search by topic ID.
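
The tl;dr fix is therefore a separate index on item_id alone; a minimal sketch, assuming the topics_items table from the question:

-- lets "where item_id = ?" use an index scan instead of relying on the
-- composite (topic_id, item_id) index
CREATE INDEX topics_items_item_id_idx ON topics_items (item_id);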


Postgres can reverse an index if it thinks it's worth the effort. If there is a small number of possible topic IDs and a large number of possible item IDs, it will search for the item ID within each topic ID.

For example, let's say you have 10 possible topic IDs, 1,000 possible item IDs, and your index is (topic_id, item_id). This is like having 10 clearly labelled topic ID buckets, each with 1,000 clearly labelled item ID buckets inside. To get to the item ID buckets, Postgres must look inside each topic ID bucket. To use this index for where item_id = 23, Postgres must search each of the 10 topic ID buckets for all the buckets with item ID 23.

But if you have 1,000 possible topic IDs and 10 possible item IDs, Postgres would have to search 1,000 topic ID buckets. Most likely it would do a full table scan instead. In this case you'd want to reverse your index and make it (item_id, topic_id).

This depends heavily on having good table statistics, which means making sure autovacuum is working properly.

So you can get away with a single index for two columns if one column has far fewer distinct values than the other.


Postgres can also use multiple indexes if it thinks that will make the query run faster. For example, if you have an index on topic_id and an index on item_id, it can use both indexes and combine the results: where topic_id = 23 or item_id = 42 could use the topic_id index to search for topic ID 23 and the item_id index to search for item ID 42, then combine the results.

This is generally slower than having a composite (topic_id, item_id) index. It can also be slower than using a single index, so don't be surprised if Postgres decides not to use multiple indexes.
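
A minimal sketch of that multiple-index case, using the same hypothetical table:

CREATE INDEX topics_items_topic_id_idx ON topics_items (topic_id);
-- plus the topics_items_item_id_idx index from above

-- Postgres may scan both single-column indexes and combine the results (a BitmapOr plan)
select *
from topics_items
where topic_id = 23 or item_id = 42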


In general, for b-tree indexes, when you have two columns you have three possible combinations.

  • a + b
  • a
  • b

And you need two indexes.

  • (a, b) -- a and a + b
  • (b) -- b

(a, b) covers both searches for a and a + b. (b) covers searching for b.
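
A minimal sketch with a hypothetical two-column table t:

CREATE TABLE t (a int, b int);

CREATE INDEX t_a_b_idx ON t (a, b);  -- covers searches on a and on a + b
CREATE INDEX t_b_idx ON t (b);       -- covers searches on b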

When you have three columns, you have seven possible combinations.

  • a + b + c
  • a + b
  • a + c
  • a
  • b + c
  • b
  • c

But you only need three indexes.

  • (a, b, c) -- a, a + b, a + b + c
  • (b, c) -- b, b + c
  • (c, a) -- c, c + a

However, you probably want to avoid an index on three columns, because it's often slower. What you actually want is this.

  • (a, b)
  • (b, c)
  • (c, a)

Multicolumn indexes should be used sparingly. In most situations, an index on a single column is sufficient and saves space and time. Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized.

Reading from an index is not free; it adds work on top of reading the table. You want your indexes to reduce the number of rows which must be read, but you don't want Postgres to have to do any more index scanning than necessary.

Constraints on columns to the right... are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they'd still have to be scanned through.

PostgreSQL and primary key, foreign key indexing

Behind the scenes, a primary key is a special kind of unique index. The quote above about referencing columns means that it might be a good idea to also create an index on the columns where the primary key is used as a foreign key.

Do I need an index for a composite primary key where each column is a foreign key?


As I understand it, the composite PK creates a unique index on user_id + friend_id internally

Yes.

and this index is used for the user_id foreign key.

Not exactly, no.

There are two cases here:

  • an INSERT or UPDATE on the user_friend table. It will have to check the table's primary key of course (and will use the user_friend_pkey index for that). For the foreign key constraints, this will only have to check whether the user_id and friend_id values exist in the user table, and will use the index created by user's primary key for those.
  • an UPDATE or DELETE on the user table. It will have to check whether the old value was referenced in the user_friend table, and act according to the CASCADE or RESTRICT strategy. To find this row in user_friend, it will use the indices on user_id and on friend_id if they exist.

But do I need to create an index for friend_id, because the index on the composite PK will not work for friend_id?

Yes, looking up by friend_id is slow without the extra index. You only need this if you want to find users that have x as their friend, or when updates/deletes on user must not be slow.
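
A minimal sketch of that extra index, assuming the user_friend table from the question:

-- the composite primary key already provides an index on (user_id, friend_id);
-- this additional index makes lookups and foreign key checks on friend_id alone fast
CREATE INDEX user_friend_friend_id_idx ON user_friend (friend_id);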

Postgres: complex foreign key constraints

I figured out that a function executed by a CHECK constraint solves this issue.
The function is_storeitem does the validation. I believe this approach can be used for even more complex validations.

create or replace function is_storeitem(pItemcode nchar(40), pStoreId nchar(20)) returns boolean as $$
    select exists (
        select 1
        from public.storeitem si, public.item i, public.store s
        where si.itemid = i.itemid
          and i.itemcode = pItemcode
          and s.storeid = pStoreId
          and s.storeid = si.storeid
    );
$$ language sql;


create table SZData
(
    StoreID  NCHAR(20)  not null,
    ItemCode NCHAR(100) not null,
    TextData NCHAR(20)  not null,
    constraint PK_SIDATA primary key (ItemCode, StoreID),
    foreign key (StoreID) references Store(StoreID),
    foreign key (ItemCode) references Item(ItemCode),
    CONSTRAINT ck_szdata_itemcode CHECK (is_storeitem(ItemCode, StoreID))
);

This works perfectly with Postgres 9.6 or later.
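
A hedged usage sketch (the codes are made up): every INSERT or UPDATE on SZData runs the CHECK constraint, which calls is_storeitem and rejects the row unless the item is actually sold in that store.

INSERT INTO SZData (StoreID, ItemCode, TextData)
VALUES ('STORE01', 'ITEM0001', 'example');
-- raises a check constraint violation if is_storeitem('ITEM0001', 'STORE01') returns false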

PostgreSQL: How to index all foreign keys?

EDIT: so, I wrote the query below and then thought... "hang on, PostgreSQL requires that foreign key targets have unique indexes." So I guess I misunderstood what you meant? You can use the query below to check that the source columns of your foreign keys have indexes by substituting "conrelid" for "confrelid" and "conkey" for "confkey" (yeah, yeah, no aliases in the query...)

Well, I guess it should be possible to go through the system catalogues... As usual, the best guide to the system catalogues is to use psql and do "\set ECHO_HIDDEN 1" and then see what SQL it generates for interesting "\d" commands. Here's the SQL used to find the foreign keys for a table ("\d tablename") :

-- $1 is the table OID, e.g. 'tablename'::regclass
SELECT conname, conrelid::pg_catalog.regclass,
pg_catalog.pg_get_constraintdef(c.oid, true) as condef
FROM pg_catalog.pg_constraint c
WHERE c.confrelid = $1 AND c.contype = 'f' ORDER BY 1;

Seems that pg_constraint has columns conkey and confkey that look like they could be the column numbers that the key is defined across. Probably confkey is the column numbers in the foreign table since it's only non-null for foreign keys. Also, took me a while to realise this is the SQL to show foreign keys referencing the given table. Which is what we want anyway.

So something like this query shows the data beginning to take shape:

select confrelid, conname, column_index, attname
from pg_attribute
join (select confrelid::regclass, conname, unnest(confkey) as column_index
      from pg_constraint
      where confrelid = 'ticket_status'::regclass) fkey
  on fkey.confrelid = pg_attribute.attrelid
 and fkey.column_index = pg_attribute.attnum

I'm going to be using 8.4 features like unnest... you might be able to get along without.

I ended up with:

select pg_index.indexrelid::regclass,
       'create index ' || relname || '_' || array_to_string(column_name_list, '_') || '_idx on '
           || confrelid || ' (' || array_to_string(column_name_list, ',') || ')'
from (select distinct
             confrelid,
             array_agg(attname) column_name_list,
             array_agg(attnum) as column_list
      from pg_attribute
      join (select confrelid::regclass,
                   conname,
                   unnest(confkey) as column_index
            from (select distinct confrelid, conname, confkey
                  from pg_constraint
                  join pg_class on pg_class.oid = pg_constraint.confrelid
                  join pg_namespace on pg_namespace.oid = pg_class.relnamespace
                  where nspname !~ '^pg_' and nspname <> 'information_schema'
                 ) fkey
           ) fkey
        on fkey.confrelid = pg_attribute.attrelid
       and fkey.column_index = pg_attribute.attnum
      group by confrelid, conname
     ) candidate_index
join pg_class on pg_class.oid = candidate_index.confrelid
left join pg_index on pg_index.indrelid = confrelid
                  and indkey::text = array_to_string(column_list, ' ')

OK, this monstrosity prints out the candidate index commands and tries to match them up with existing indices. So you can simply add "where indexrelid is null" on the end to get the commands to create indices that don't seem to exist.

This query doesn't deal with multi-column foreign keys very well; but imho if you're using those, you deserve trouble.

LATER EDIT: here's the query with the edits proposed at the top applied. So this shows the commands to create indices that don't exist, on columns that are the source of a foreign key (not its target).

select pg_index.indexrelid::regclass,
       'create index ' || relname || '_' || array_to_string(column_name_list, '_') || '_idx on '
           || conrelid || ' (' || array_to_string(column_name_list, ',') || ')'
from (select distinct
             conrelid,
             array_agg(attname) column_name_list,
             array_agg(attnum) as column_list
      from pg_attribute
      join (select conrelid::regclass,
                   conname,
                   unnest(conkey) as column_index
            from (select distinct conrelid, conname, conkey
                  from pg_constraint
                  join pg_class on pg_class.oid = pg_constraint.conrelid
                  join pg_namespace on pg_namespace.oid = pg_class.relnamespace
                  where nspname !~ '^pg_' and nspname <> 'information_schema'
                 ) fkey
           ) fkey
        on fkey.conrelid = pg_attribute.attrelid
       and fkey.column_index = pg_attribute.attnum
      group by conrelid, conname
     ) candidate_index
join pg_class on pg_class.oid = candidate_index.conrelid
left join pg_index on pg_index.indrelid = conrelid
                  and indkey::text = array_to_string(column_list, ' ')
where indexrelid is null

My experience is that this isn't really all that useful. It suggests creating indices for things like reference codes that really don't need to be indexed.

Multiple foreign keys as primary key in Postgres, should I do it?

This is a perfectly valid design choice.

You have a many-to-many relationship between books and users, which is represented by the reviews table. Having a compound primary key based on two foreign keys lets you enforce referential integrity (a given tuple may only appear once), and at the same time provide a primary key for the table.

Another option would be to have a surrogate primary key for the bridge table. This could make things easier if you need to reference the reviews from another table, but you would still need a unique constraint on both foreign key columns for integrity, so this would actually result in extra space being used.
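
For comparison, a sketch of that surrogate-key alternative (illustrative only):

CREATE TABLE reviews(
    review_id SERIAL PRIMARY KEY,
    user_id   INT NOT NULL REFERENCES users(user_id),
    book_id   INT NOT NULL REFERENCES books(book_id),
    score     INT NOT NULL CHECK (score > 0 and score < 11),
    UNIQUE (book_id, user_id)  -- still needed for integrity
);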

When it comes to your code, it has a few issues:

  • the primary key keyword goes after the datatype

  • the check constraint is incorrectly formed

  • missing or additional commas here and there

Consider:

CREATE TABLE users(
    user_id   SERIAL PRIMARY KEY,
    user_name VARCHAR NOT NULL UNIQUE,
    pass_hash VARCHAR NOT NULL,
    email     VARCHAR NOT NULL UNIQUE
);

CREATE TABLE books(
    book_id      BIGINT PRIMARY KEY,
    author       VARCHAR NOT NULL,
    title        VARCHAR NOT NULL,
    year         INT NOT NULL CHECK (year > 1484),
    review_count INT DEFAULT 0 NOT NULL,
    avrg_score   FLOAT
);

CREATE TABLE reviews(
    user_id INT REFERENCES users(user_id) NOT NULL,
    book_id INT REFERENCES books(book_id) NOT NULL,
    score   INT NOT NULL CHECK (score > 0 and score < 11),
    PRIMARY KEY (book_id, user_id)
);

