What's the Proper Index for Querying Structures in Arrays in Postgres JSONb

What's the proper index for querying structures in arrays in Postgres jsonb?

First of all, you cannot access JSON array values like that. For a given json value:

[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
{"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
{"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]

A valid test against the first array element would be:

WHERE e->0->>'event_slug' = 'test_1'

But you probably don't want to limit your search to the first element of the array. With the jsonb data type in Postgres 9.4 you have additional operators and index support. To index elements of an array you need a GIN index.

The built-in operator classes for GIN indexes do not support "greater than" or "less than" operators > >= < <=. This is true for jsonb as well, where you can choose between two operator classes. The manual:

Name             Indexed Data Type  Indexable Operators
...
jsonb_ops jsonb ? ?& ?| @>
jsonb_path_ops jsonb @>

(jsonb_ops being the default.) You can cover the equality test, but neither of those operators covers your requirement for >= comparison. You would need a btree index.

Basic solution

To support the equality check with an index:

CREATE INDEX locations_events_gin_idx ON locations
USING gin (events jsonb_path_ops);

SELECT * FROM locations WHERE events @> '[{"event_slug":"test_1"}]';

This might be good enough if the filter is selective enough.

Assuming end_time >= start_time, so we don't need two checks. Checking only end_time is cheaper and equivalent:

SELECT l.*
FROM locations l
, jsonb_array_elements(l.events) e
WHERE l.events @> '[{"event_slug":"test_1"}]'
AND (e->>'end_time')::timestamp >= '2014-10-30 14:04:06 -0400'::timestamptz;

Utilizing an implicit JOIN LATERAL. Details (last chapter):

  • PostgreSQL unnest() with element number

Careful with the different data types! What you have in the JSON value looks like timestamp [without time zone], while your predicates use timestamp with time zone literals. The timestamp value is interpreted according to the current time zone setting, while the given timestamptz literals must be cast to timestamptz explicitly or the time zone would be ignored! Above query should work as desired. Detailed explanation:

  • Ignoring time zones altogether in Rails and PostgreSQL

More explanation for jsonb_array_elements():

  • PostgreSQL joining using JSONB

Advanced solution

If the above is not good enough, I would consider a MATERIALIZED VIEW that stores relevant attributes in normalized form. This allows plain btree indexes.

The code assumes that your JSON values have a consistent format as displayed in the question.

Setup:

CREATE TYPE event_type AS (
, event_slug text
, start_time timestamp
, end_time timestamp
);

CREATE MATERIALIZED VIEW loc_event AS
SELECT l.location_id, e.event_slug, e.end_time -- start_time not needed
FROM locations l, jsonb_populate_recordset(null::event_type, l.events) e;

Related answer for jsonb_populate_recordset():

  • How to convert PostgreSQL 9.4's jsonb type to float
CREATE INDEX loc_event_idx ON loc_event (event_slug, end_time, location_id);

Also including location_id to allow index-only scans. (See manual page and Postgres Wiki.)

Query:

SELECT *
FROM loc_event
WHERE event_slug = 'test_1'
AND end_time >= '2014-10-30 14:04:06 -0400'::timestamptz;

Or, if you need full rows from the underlying locations table:

SELECT l.*
FROM (
SELECT DISTINCT location_id
FROM loc_event
WHERE event_slug = 'test_1'
AND end_time >= '2014-10-30 14:04:06 -0400'::timestamptz
) le
JOIN locations l USING (location_id);

Index for finding an element in a JSON array


jsonb in Postgres 9.4+

The binary JSON data type jsonb largely improves index options. You can now have a GIN index on a jsonb array directly:

CREATE TABLE tracks (id serial, artists jsonb);  -- !
CREATE INDEX tracks_artists_gin_idx ON tracks USING gin (artists);

No need for a function to convert the array. This would support a query:

SELECT * FROM tracks WHERE artists @> '[{"name": "The Dirty Heads"}]';

@> being the jsonb "contains" operator, which can use the GIN index. (Not for json, only jsonb!)

Or you use the more specialized, non-default GIN operator class jsonb_path_ops for the index:

CREATE INDEX tracks_artists_gin_idx ON tracks
USING gin (artists jsonb_path_ops); -- !

Same query.

Currently jsonb_path_ops only supports the @> operator. But it's typically much smaller and faster. There are more index options, details in the manual.


If the column artists only holds names as displayed in the example, it would be more efficient to store just the values as JSON text primitives and the redundant key can be the column name.

Note the difference between JSON objects and primitive types:

  • Using indexes in json array in PostgreSQL
CREATE TABLE tracks (id serial, artistnames jsonb);
INSERT INTO tracks VALUES (2, '["The Dirty Heads", "Louis Richards"]');

CREATE INDEX tracks_artistnames_gin_idx ON tracks USING gin (artistnames);

Query:

SELECT * FROM tracks WHERE artistnames ? 'The Dirty Heads';

? does not work for object values, just keys and array elements.

Or:

CREATE INDEX tracks_artistnames_gin_idx ON tracks
USING gin (artistnames jsonb_path_ops);

Query:

SELECT * FROM tracks WHERE artistnames @> '"The Dirty Heads"'::jsonb;

More efficient if names are highly duplicative.

json in Postgres 9.3+

This should work with an IMMUTABLE function:

CREATE OR REPLACE FUNCTION json2arr(_j json, _key text)
RETURNS text[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY(SELECT elem->>_key FROM json_array_elements(_j) elem)';

Create this functional index:

CREATE INDEX tracks_artists_gin_idx ON tracks
USING gin (json2arr(artists, 'name'));

And use a query like this. The expression in the WHERE clause has to match the one in the index:

SELECT * FROM tracks
WHERE '{"The Dirty Heads"}'::text[] <@ (json2arr(artists, 'name'));

Updated with feedback in comments. We need to use array operators to support the GIN index.

The "is contained by" operator <@ in this case.

Notes on function volatility

You can declare your function IMMUTABLE even if json_array_elements() isn't wasn't.

Most JSON functions used to be only STABLE, not IMMUTABLE. There was a discussion on the hackers list to change that. Most are IMMUTABLE now. Check with:

SELECT p.proname, p.provolatile
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'pg_catalog'
AND p.proname ~~* '%json%';

Functional indexes only work with IMMUTABLE functions.

Add and use index for jsonb with nested arrays

You already have a very good index to support your query.

Make use of it with the jsonb "contains" operator" @>:

SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200", "subfields":[{"name": "a"}]}]';

db<>fiddle here

Carefully match the structure of the JSON object in the table. Then rows are selected cheaply using the index.

You can then extract whatever parts you need from qualifying rows.

Detailed instructions:

  • Index for finding an element in a JSON array

If one of the filters is very selective on its own, it might be faster to split the two conditions like in your original. Either way, both variants should be fast:

SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200"}]'
AND marc->'dynamicFields' @> '[{"subfields":[{"name": "a"}]}]';

postgresql jsonb array query using gin index

As soon as you unnest the JSON, you can't use an index anymore. I think you are looking for:

select jsonb_array_elements(p.cities) item
from place p
where p.cities @> '[{"name": "paris"}]'
or p.cities @> '[{"name": "dakar"}]'

Or more explicitly:

select e.item
from place p
cross join jsonb_array_elements(p.cities) as e(item)
where p.cities @> '[{"name": "paris"}]'
or p.cities @> '[{"name": "dakar"}]'

The GIN index will only be used if you a lot of rows in the table (not elements in the array!) and your WHERE condition reduces the many rows to just some.

If you store everything in one giant JSON value in a single row, no index will improve your query. You should normalize your data model instead - which seems quite easy to do as you don't seem to have any dynamic structure and all elements share the same keys (=column).

Index JSON Array in Postgres DB

Please check JSONB Indexing section in Postgres documentation.
For your case index config may be the following:

CREATE INDEX idx_gin_wishlist ON your_table USING gin ((jsonb_column -> 'wish_list'));

It will store copies of every key and value inside wish_list, but you should be careful with a query which hits the index. You should use @> operator:

SELECT jsonb_column->'wish_list'
FROM your_table WHERE jsonb_column->'wish_list' @> '[{"present_link": "www.amazon.com", "present_name": "Counting Crows"}]';

Strongly suggested to check existing nswers:

  • How to query for array elements inside JSON type
  • Index for finding an element in a JSON array

Using indexes in json array in PostgreSQL

This specific jsonb example from the original answer was missing the array layer [] around the non-primitive object for the containment query. It has since been fixed.

The behavior documented for PostgreSQL 9.4.x jsonb Containment and Existence states:

The general principle is that the contained object must match the containing object as to structure and data contents

...

As a special exception to the general principle that the structures must match, an array may contain a primitive value

The special exception allows us to do the following:

CREATE TABLE tracks (id serial, artistnames jsonb);
CREATE INDEX tracks_artistnames_gin_idx ON tracks USING gin (artistnames);
INSERT INTO tracks (id, artists) VALUES (1, '["blink-182"]');
INSERT INTO tracks (id, artists) VALUES (2, '["The Dirty Heads", "Louis Richards"]');

We can query for containment using the general principle:

SELECT * FROM tracks WHERE artistnames @> '["The Dirty Heads"]';
id | artistnames
----+---------------------------------------
2 | ["The Dirty Heads", "Louis Richards"]
(1 row)

We can also query for containment using the special exception since the array contains primitive types:

SELECT * FROM tracks WHERE artistnames @> '"The Dirty Heads"';
id | artistnames
----+---------------------------------------
2 | ["The Dirty Heads", "Louis Richards"]
(1 row)

There are 4 primitive types that allow containment and existence queries on arrays to work:

  1. string
  2. number
  3. boolean
  4. null

Since the example you mentioned in your question is dealing with objects nested inside an array, we don't qualify for the special exception mentioned above:

CREATE TABLE tracks (id serial, artists jsonb);
CREATE INDEX tracks_artists_gin_idx ON tracks USING gin (artists);
INSERT INTO tracks (id, artists) VALUES (1, '[{"name": "blink-182"}]');
INSERT INTO tracks (id, artists) VALUES (2, '[{"name": "The Dirty Heads"}, {"name": "Louis Richards"}]');

We can query for containment using the general principle:

SELECT * FROM tracks WHERE artists @> '[{"name": "The Dirty Heads"}]';
id | artists
----+-----------------------------------------------------------
2 | [{"name": "The Dirty Heads"}, {"name": "Louis Richards"}]
(1 row)

Objects are not considered a primitive type, so the following query for containment does not qualify for the special exception and therefore does not work:

SELECT * FROM tracks WHERE artists @> '{"name": "The Dirty Heads"}';
id | artists
----+---------
(0 rows)

Create Postgres JSONB Index on Array Sub-Object

If you don't need the lower() in there, the query can be simple and efficient:

SELECT *
FROM mytable
WHERE myjsonb -> 'myArray' @> '[{"subItem": {"email": "foo@foo.com"}}]'

Supported by a jsonb_path_ops index:

CREATE INDEX mytable_myjsonb_gin_idx ON mytable
USING gin ((myjsonb -> 'myArray') jsonb_path_ops);

But the match is case-sensitive.

Case-insensitive!

If you need the search to match disregarding case, things get more complex.

You could use this query, similar to your original:

SELECT *
FROM t
WHERE EXISTS (
SELECT 1
FROM jsonb_array_elements(myjsonb -> 'myArray') arr
WHERE lower(arr #>>'{subItem, email}') = 'foo@foo.com'
);

But I can't think of a good way to use an index for this.

Instead, I would use an expression index based on a function extracting an array of lower-case emails:

Function:

CREATE OR REPLACE FUNCTION f_jsonb_arr_lower(_j jsonb, VARIADIC _path text[])
RETURNS jsonb LANGUAGE sql IMMUTABLE AS
'SELECT jsonb_agg(lower(elem #>> _path)) FROM jsonb_array_elements(_j) elem';

Index:

CREATE INDEX mytable_email_arr_idx ON mytable
USING gin (f_jsonb_arr_lower(myjsonb -> 'myArray', 'subItem', 'email') jsonb_path_ops);

Query:

SELECT *
FROM mytable
WHERE f_jsonb_arr_lower(myjsonb -> 'myArray', 'subItem', 'email') @> '"foo@foo.com"';

While this works with an untyped string literal or with actual jsonb values, it stops working if you pass text or varchar (like in a prepared statement). Postgres does not know how to cast because the input is ambiguous. You need an explicit cast in this case:

... @> '"foo@foo.com"'::text::jsonb;

Or pass a simple string without enclosing double quotes and do the conversion to jsonb in Postgres:

... @> to_jsonb('foo@foo.com'::text);

Related, with more explanation:

  • Query for array elements inside JSON type
  • Index for finding an element in a JSON array

PostgreSQL View on a JSONB field object _array_ (with indexing)

A GIN index on the jsonb column will support several JSON operators. One of them is the @> operator which also works with JSON arrays.

The following index:

create index on product_list using gin (product_ids);

A query that can potentially make use of the above index would look like this:

select *
from product_list
where product_ids @> '[{"productId": 42}]'::jsonb;

There is no way your suggested view can make use of an index as the JSONB column is not part of the view and that would be necessary to be able to push down the condition on the JSON column.

The only way you can make use of an index in the view is to include the JSON column in it:

create view normalized_list
as
SELECT pl.id AS list_id,
t.product_id,
pl.product_ids
FROM product_list pl
CROSS JOIN jsonb_array_elements(pl.product_ids) ->> 'id' AS t(product_id)
;

A query like this:

select id, product_Id
from normalized_list
where product_ids @> '[{"id":42}]'::jsonb;

would make use of the GIN index.


Note that if you only want to store IDs in a denormalized way, a native integer array (product_ids int[]) would be more efficient and would make the table substantially smaller



Related Topics



Leave a reply



Submit