How to Create an Index for Elements of an Array in PostgreSQL

Can PostgreSQL index array columns?

Yes, you can index an array column, but you have to use the array operators and the GIN index type.

Example:

CREATE TABLE "Test"("Column1" int[]);
INSERT INTO "Test" VALUES ('{10, 15, 20}');
INSERT INTO "Test" VALUES ('{10, 20, 30}');

CREATE INDEX idx_test ON "Test" USING GIN ("Column1");

-- To enforce index usage because we have only 2 records for this test...
SET enable_seqscan TO off;

EXPLAIN ANALYZE
SELECT * FROM "Test" WHERE "Column1" @> ARRAY[20];

Result:

Bitmap Heap Scan on "Test"  (cost=4.26..8.27 rows=1 width=32) (actual time=0.014..0.015 rows=2 loops=1)
  Recheck Cond: ("Column1" @> '{20}'::integer[])
  ->  Bitmap Index Scan on idx_test  (cost=0.00..4.26 rows=1 width=0) (actual time=0.009..0.009 rows=2 loops=1)
        Index Cond: ("Column1" @> '{20}'::integer[])
Total runtime: 0.062 ms

Note

It appears that in many cases the gin__int_ops operator class (supplied by the intarray extension) is required:

CREATE INDEX <index_name> ON <table_name> USING GIN (<column> gin__int_ops);

I have not yet seen a case where the index was used with the && and @> operators without the gin__int_ops operator class.

How to create an index for elements of an array in PostgreSQL?

You can create GIN indexes on any 1-dimensional array with standard Postgres.

Details in the manual here (last chapter).

While operating with integer arrays (plain int4, not int2 or int8 and no NULL values) the additional supplied module intarray provides a lot more operators and typically superior performance. Install it (once per database) with:

CREATE EXTENSION intarray;

You can create GIN or GiST indexes on integer arrays. There are examples in the manual.

CREATE EXTENSION requires PostgreSQL 9.1 or later. For older versions you need to run the supplied script.
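
As a rough sketch of what such indexes can look like: the table tbl and its column tags below are hypothetical names chosen for illustration, while gin__int_ops and gist__int_ops are operator classes shipped with intarray. It assumes PostgreSQL 9.1 or later and that the column holds no NULL elements.

```sql
-- Install the module once per database (no-op if already present).
CREATE EXTENSION IF NOT EXISTS intarray;

-- Hypothetical table for illustration only.
CREATE TABLE tbl (
    id   serial PRIMARY KEY,
    tags int[] NOT NULL
);

-- GIN index using the intarray operator class.
CREATE INDEX tbl_tags_gin_idx ON tbl USING GIN (tags gin__int_ops);

-- GiST alternative; normally you would pick one of the two, not both.
CREATE INDEX tbl_tags_gist_idx ON tbl USING GIST (tags gist__int_ops);

-- Queries that can use either index:
SELECT * FROM tbl WHERE tags @> ARRAY[10, 20];  -- contains both 10 and 20
SELECT * FROM tbl WHERE tags && ARRAY[10, 30];  -- overlaps: contains 10 or 30
```

Whether the planner actually picks the index depends on table size and statistics, so check with EXPLAIN on realistic data.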

How to use PostgreSQL GIN index with ARRAY keyword

You forgot to add an extra pair of parentheses that is necessary for syntactical reasons:

CREATE INDEX idx_gin ON mytab USING gin ((ARRAY[scalar_column]));

The index does not make a lot of sense. If you need to search for membership in a given array, use a regular B-tree index with = ANY.
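
A minimal sketch of that B-tree alternative, assuming mytab has an integer column scalar_column (the column type and the index name are assumptions, not taken from the question):

```sql
-- Hypothetical definition; the real mytab may look different.
CREATE TABLE mytab (
    id            serial PRIMARY KEY,
    scalar_column int
);

-- Plain B-tree index (the default index type).
CREATE INDEX mytab_scalar_column_idx ON mytab (scalar_column);

-- Membership test: the indexed column is on the LEFT of = ANY,
-- the array of candidate values on the right.
SELECT *
FROM   mytab
WHERE  scalar_column = ANY ('{10, 20, 30}'::int[]);
```

This form behaves like scalar_column IN (10, 20, 30) and can use the B-tree index.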

Indexing array of strings column type in PostgreSQL

A gin index can be used:

CREATE TABLE users (
    name   VARCHAR(100),
    groups text[]
);

CREATE INDEX idx_users ON users USING GIN(groups);

-- disable sequential scan in this test:
SET enable_seqscan TO off;

EXPLAIN ANALYZE
SELECT name FROM users WHERE groups @> (ARRAY['Engineering']);

Result:

"Bitmap Heap Scan on users  (cost=4.26..8.27 rows=1 width=218) (actual time=0.021..0.021 rows=0 loops=1)"
" Recheck Cond: (groups @> '{Engineering}'::text[])"
" -> Bitmap Index Scan on idx_users (cost=0.00..4.26 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)"
" Index Cond: (groups @> '{Engineering}'::text[])"
"Total runtime: 0.074 ms"

Using aggregate functions on the elements of an array is another problem. The function unnest() might help.
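
For instance, counting members per group could look roughly like this. It is a sketch against the users table from above and assumes PostgreSQL 9.3 or later, where a set-returning function in the FROM list can refer to columns of preceding tables:

```sql
-- Expand each groups array into one row per element, then aggregate.
SELECT g        AS group_name,
       count(*) AS members
FROM   users, unnest(groups) AS g
GROUP  BY g
ORDER  BY members DESC;
```

Note that a query like this reads every row; the GIN index on groups does not help with it.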

Why don't you normalize your data? That will fix all problems, including many problems you haven't encountered yet.
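
One possible normalized layout, as a sketch only. All table and column names here (accounts, teams, account_teams) are hypothetical and chosen to avoid clashing with the users table above:

```sql
CREATE TABLE accounts (
    account_id serial PRIMARY KEY,
    name       varchar(100) NOT NULL
);

CREATE TABLE teams (
    team_id serial PRIMARY KEY,
    name    text NOT NULL UNIQUE
);

-- One row per (account, team) membership instead of a text[] column.
CREATE TABLE account_teams (
    account_id int NOT NULL REFERENCES accounts ON DELETE CASCADE,
    team_id    int NOT NULL REFERENCES teams    ON DELETE CASCADE,
    PRIMARY KEY (account_id, team_id)
);

-- The primary key covers lookups by account; this covers lookups by team.
CREATE INDEX account_teams_team_id_idx ON account_teams (team_id);

-- "Who is in Engineering?" becomes a join served by plain B-tree indexes.
SELECT a.name
FROM   accounts a
JOIN   account_teams m USING (account_id)
JOIN   teams t         USING (team_id)
WHERE  t.name = 'Engineering';
```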

Postgres index type for array column for ANY queries

Here is a detailed explanation why the ANY construct with the indexed column to the right cannot tap into a GIN index (or any index, for that matter):

  • Can PostgreSQL index array columns?

But array operators can. See:

  • Check if value exists in Postgres array
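
To illustrate the difference with the "Test" table and GIN index from the first example (a sketch; the plans you actually get depend on your data and settings):

```sql
-- Indexed column on the RIGHT side of ANY: the GIN index cannot be used.
SELECT * FROM "Test" WHERE 20 = ANY ("Column1");

-- Same question phrased with the array "contains" operator:
-- this form can use the GIN index on "Column1".
SELECT * FROM "Test" WHERE "Column1" @> ARRAY[20];
```

Both queries return the same rows here; the two forms only differ in how they treat NULL, which does not matter for this data.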

To force a test with a small table, you can disable (massively discourage, really) sequential scans in your current session with:

SET enable_seqscan = OFF;

See:

  • Postgres query optimization (forcing an index scan)

Adding element with index to an array in PostgreSQL

PostgreSQL does not have an EXTEND method like Oracle does. PostgreSQL, however, can extend 1-dimensional arrays automatically by assigning array elements beyond the end of the current array length.

In your example, this becomes very simple:

CREATE FUNCTION some_function () RETURNS something AS $$
DECLARE
    invoice_list text[];
    amount_list  float8[];
BEGIN
    -- Do something
    ...

    FOR counter IN 1 .. 10 LOOP
        -- get current values for cfp_cur.invoice_no and inv_amount
        invoice_list[counter] := cfp_cur.invoice_no;
        amount_list[counter] := inv_amount;
    END LOOP;

    -- Do something with the arrays
    ...
    RETURN;
END;
$$ LANGUAGE plpgsql;
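
Since the function above contains placeholders, here is a small self-contained sketch of the same technique that can be pasted into psql as-is. The invoice values are made up for illustration:

```sql
DO $$
DECLARE
    invoice_list text[];   -- starts out as NULL
BEGIN
    FOR counter IN 1 .. 3 LOOP
        -- Assigning to the next subscript grows the array by one element.
        invoice_list[counter] := 'INV-' || counter;
    END LOOP;

    -- Assigning past the end also works; the gap (element 4) is filled with NULL.
    invoice_list[5] := 'INV-5';

    RAISE NOTICE 'list: %, length: %',
                 invoice_list, array_length(invoice_list, 1);
END
$$;
```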

Django, Create GIN index for child element in JSON Array field

Please don't use a JSONField [Django-doc] for well-structured data: if the structure is clear, like here where we have a list of objects where each object has a name and a product, it makes more sense to work with extra models, like:

from django.db import models


class MyModel(models.Model):
    # …
    pass


class Product(models.Model):
    # …
    pass


class Entry(models.Model):
    my_model = models.ForeignKey(MyModel, on_delete=models.CASCADE)
    name = models.CharField(max_length=255)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)

This will automatically add indexes on the ForeignKeys, and will also make querying simpler and usually more efficient.

While databases like PostgreSQL have indeed put effort into making JSON columns easier to query, aggregate, etc., it is usually still better to perform database normalization [wiki], especially since it offers more means for referential integrity, and many aggregates are simpler on linear data.

If for example later a product is removed, it will require a lot of work to inspect the JSON blobs to remove that product. This is however a scenario that both Django and PostgreSQL databases cover with ON DELETE triggers and which will likely be more effective and safe when using the Django toolchain for this.


