Using Indexes in JSON Array in Postgresql

Index for finding an element in a JSON array

jsonb in Postgres 9.4+

The binary JSON data type jsonb largely improves index options. You can now have a GIN index on a jsonb array directly:

CREATE TABLE tracks (id serial, artists jsonb);  -- !
CREATE INDEX tracks_artists_gin_idx ON tracks USING gin (artists);

No need for a function to convert the array. This would support a query:

SELECT * FROM tracks WHERE artists @> '[{"name": "The Dirty Heads"}]';

@> being the jsonb "contains" operator, which can use the GIN index. (Not for json, only jsonb!)

Or you use the more specialized, non-default GIN operator class jsonb_path_ops for the index:

CREATE INDEX tracks_artists_gin_idx ON tracks
USING gin (artists jsonb_path_ops); -- !

Same query.

Currently jsonb_path_ops only supports the @> operator. But it's typically much smaller and faster. There are more index options, details in the manual.


If the column artists only holds names as displayed in the example, it would be more efficient to store just the values as JSON text primitives and the redundant key can be the column name.

Note the difference between JSON objects and primitive types:

  • Using indexes in json array in PostgreSQL
CREATE TABLE tracks (id serial, artistnames jsonb);
INSERT INTO tracks VALUES (2, '["The Dirty Heads", "Louis Richards"]');

CREATE INDEX tracks_artistnames_gin_idx ON tracks USING gin (artistnames);

Query:

SELECT * FROM tracks WHERE artistnames ? 'The Dirty Heads';

? does not work for object values, just keys and array elements.

Or:

CREATE INDEX tracks_artistnames_gin_idx ON tracks
USING gin (artistnames jsonb_path_ops);

Query:

SELECT * FROM tracks WHERE artistnames @> '"The Dirty Heads"'::jsonb;

More efficient if names are highly duplicative.

json in Postgres 9.3+

This should work with an IMMUTABLE function:

CREATE OR REPLACE FUNCTION json2arr(_j json, _key text)
RETURNS text[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY(SELECT elem->>_key FROM json_array_elements(_j) elem)';

Create this functional index:

CREATE INDEX tracks_artists_gin_idx ON tracks
USING gin (json2arr(artists, 'name'));

And use a query like this. The expression in the WHERE clause has to match the one in the index:

SELECT * FROM tracks
WHERE '{"The Dirty Heads"}'::text[] <@ (json2arr(artists, 'name'));

Updated with feedback in comments. We need to use array operators to support the GIN index.

The "is contained by" operator <@ in this case.

Notes on function volatility

You can declare your function IMMUTABLE even if json_array_elements() isn't wasn't.

Most JSON functions used to be only STABLE, not IMMUTABLE. There was a discussion on the hackers list to change that. Most are IMMUTABLE now. Check with:

SELECT p.proname, p.provolatile
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'pg_catalog'
AND p.proname ~~* '%json%';

Functional indexes only work with IMMUTABLE functions.

Add and use index for jsonb with nested arrays

You already have a very good index to support your query.

Make use of it with the jsonb "contains" operator" @>:

SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200", "subfields":[{"name": "a"}]}]';

db<>fiddle here

Carefully match the structure of the JSON object in the table. Then rows are selected cheaply using the index.

You can then extract whatever parts you need from qualifying rows.

Detailed instructions:

  • Index for finding an element in a JSON array

If one of the filters is very selective on its own, it might be faster to split the two conditions like in your original. Either way, both variants should be fast:

SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200"}]'
AND marc->'dynamicFields' @> '[{"subfields":[{"name": "a"}]}]';

Using indexes in json array in PostgreSQL

This specific jsonb example from the original answer was missing the array layer [] around the non-primitive object for the containment query. It has since been fixed.

The behavior documented for PostgreSQL 9.4.x jsonb Containment and Existence states:

The general principle is that the contained object must match the containing object as to structure and data contents

...

As a special exception to the general principle that the structures must match, an array may contain a primitive value

The special exception allows us to do the following:

CREATE TABLE tracks (id serial, artistnames jsonb);
CREATE INDEX tracks_artistnames_gin_idx ON tracks USING gin (artistnames);
INSERT INTO tracks (id, artists) VALUES (1, '["blink-182"]');
INSERT INTO tracks (id, artists) VALUES (2, '["The Dirty Heads", "Louis Richards"]');

We can query for containment using the general principle:

SELECT * FROM tracks WHERE artistnames @> '["The Dirty Heads"]';
id | artistnames
----+---------------------------------------
2 | ["The Dirty Heads", "Louis Richards"]
(1 row)

We can also query for containment using the special exception since the array contains primitive types:

SELECT * FROM tracks WHERE artistnames @> '"The Dirty Heads"';
id | artistnames
----+---------------------------------------
2 | ["The Dirty Heads", "Louis Richards"]
(1 row)

There are 4 primitive types that allow containment and existence queries on arrays to work:

  1. string
  2. number
  3. boolean
  4. null

Since the example you mentioned in your question is dealing with objects nested inside an array, we don't qualify for the special exception mentioned above:

CREATE TABLE tracks (id serial, artists jsonb);
CREATE INDEX tracks_artists_gin_idx ON tracks USING gin (artists);
INSERT INTO tracks (id, artists) VALUES (1, '[{"name": "blink-182"}]');
INSERT INTO tracks (id, artists) VALUES (2, '[{"name": "The Dirty Heads"}, {"name": "Louis Richards"}]');

We can query for containment using the general principle:

SELECT * FROM tracks WHERE artists @> '[{"name": "The Dirty Heads"}]';
id | artists
----+-----------------------------------------------------------
2 | [{"name": "The Dirty Heads"}, {"name": "Louis Richards"}]
(1 row)

Objects are not considered a primitive type, so the following query for containment does not qualify for the special exception and therefore does not work:

SELECT * FROM tracks WHERE artists @> '{"name": "The Dirty Heads"}';
id | artists
----+---------
(0 rows)

Indexing a jsonb array in Postgres

There is a general rule that you have to use exactly the same expression both in an index and a query to use the index. With this index:

CREATE INDEX idx_aip_tags 
ON t_aip
USING gin ((json_aip#>'{properties,pdi,contextInformation,tags}'));

the query will use the index

EXPLAIN ANALYZE 
SELECT count(*)
FROM t_aip
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' ?| array['SOME_TAG']

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=149.97..149.98 rows=1 width=0) (actual time=27.783..27.783 rows=1 loops=1)
-> Bitmap Heap Scan on t_aip (cost=20.31..149.87 rows=40 width=0) (actual time=1.504..25.726 rows=20000 loops=1)
Recheck Cond: ((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]) ?| '{SOME_TAG}'::text[])
Heap Blocks: exact=345
-> Bitmap Index Scan on idx_aip_tags (cost=0.00..20.30 rows=40 width=0) (actual time=1.455..1.455 rows=20000 loops=1)
Index Cond: ((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]) ?| '{SOME_TAG}'::text[])

Note that the GIN index supports also @> operator:

SELECT count(*)  
FROM t_aip
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> '["SOME_TAG"]'

but be careful when searching for more than one tag:

SELECT count(*)  
FROM t_aip
-- this gives objects containing both tags:
-- WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> '["SOME_TAG", "ANOTHER_TAG"]'
-- and this gives objects with any of two tags:
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> ANY(ARRAY['["SOME_TAG"]', '["ANOTHER_TAG"]']::jsonb[])

Index JSON Array in Postgres DB

Please check JSONB Indexing section in Postgres documentation.
For your case index config may be the following:

CREATE INDEX idx_gin_wishlist ON your_table USING gin ((jsonb_column -> 'wish_list'));

It will store copies of every key and value inside wish_list, but you should be careful with a query which hits the index. You should use @> operator:

SELECT jsonb_column->'wish_list'
FROM your_table WHERE jsonb_column->'wish_list' @> '[{"present_link": "www.amazon.com", "present_name": "Counting Crows"}]';

Strongly suggested to check existing nswers:

  • How to query for array elements inside JSON type
  • Index for finding an element in a JSON array


Related Topics



Leave a reply



Submit