JSONB Query with Nested Objects in an Array

Postgres - query json with nested array and objects inside array

WITH data(content) AS (
   VALUES ('{
      "id": 1,
      "external_order_id": {
         "id": "2"
      },
      "customer": {
         "external_customer_id": {
            "id": "3"
         }
      },
      "line_items": [
         {
            "sku": "SKU-1",
            "properties": [
               {"name": "colour", "value": "red"},
               {"name": "size", "value": "large"}
            ],
            "external_product_id": {
               "id": "4"
            },
            "external_variant_id": {
               "id": "5"
            }
         },
         {
            "sku": "SKU-2",
            "properties": [
               {"name": "colour", "value": "black"},
               {"name": "size", "value": "small"}
            ],
            "external_product_id": {
               "id": "8"
            },
            "external_variant_id": {
               "id": "9"
            }
         }
      ]
   }'::jsonb)
   )
SELECT ord.*
     , ext.id AS external_order_id
     , cus.id AS external_customer_id
     , line_items.sku
     , line_items.external_product_id->>'id' AS external_product_id
     , line_items.external_variant_id->>'id' AS external_variant_id
     , props.*
FROM   data
     , jsonb_to_record(content) AS ord(id int)
     , LATERAL jsonb_to_record(content->'external_order_id') AS ext(id text)
     , LATERAL jsonb_to_record(content #> '{customer, external_customer_id}') AS cus(id text)
CROSS  JOIN LATERAL jsonb_to_recordset(content->'line_items')
            AS line_items(sku text, properties jsonb, external_product_id jsonb, external_variant_id jsonb)
CROSS  JOIN LATERAL jsonb_to_recordset(line_items.properties) AS props(name text, value text);
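Against the sample document above, this should return one row per line item property, along these lines:

 id | external_order_id | external_customer_id |  sku  | external_product_id | external_variant_id |  name  | value
----+-------------------+----------------------+-------+---------------------+---------------------+--------+-------
  1 | 2                 | 3                    | SKU-1 | 4                   | 5                   | colour | red
  1 | 2                 | 3                    | SKU-1 | 4                   | 5                   | size   | large
  1 | 2                 | 3                    | SKU-2 | 8                   | 9                   | colour | black
  1 | 2                 | 3                    | SKU-2 | 8                   | 9                   | size   | small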

jsonb query with nested objects in an array

You are facing two non-trivial tasks at once.

  • Process jsonb with a complex nested structure.
  • Run the equivalent of a relational division query on the document type.

First, register a row type for jsonb_populate_recordset(). You can either create a type permanently with CREATE TYPE, or create a temp table for ad-hoc use (dropped automatically at the end of the session):

CREATE TEMP TABLE foo(id int);  -- just "id", we don't need "name"
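
The permanent alternative with CREATE TYPE would be a one-liner (equivalent sketch):

CREATE TYPE foo AS (id int);  -- registered row type instead of the temp table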

We only need the id, so don't include the name. Per documentation:

JSON fields that do not appear in the target row type will be omitted from the output

Query with index support

If you need it fast, create a GIN index on the jsonb column. The more specialized operator class jsonb_path_ops is even faster than the default jsonb_ops:

CREATE INDEX teams_json_gin_idx ON teams USING GIN (json jsonb_path_ops);

The index can be used by the jsonb "contains" operator @>:

SELECT t.json->>'id' AS team_id
, ARRAY (SELECT * FROM jsonb_populate_recordset(null::foo, t.json#>'{members,players}')) AS players
FROM teams t
WHERE json @> '{"members":{"players":[{"id":3},{"id":4},{"id":7}]}}';

The SQL/JSON path language in Postgres 12+ can use the same index:

SELECT t.json->>'id' AS team_id
, ARRAY (SELECT * FROM jsonb_populate_recordset(null::foo, t.json#>'{members,players}')) AS players
FROM teams t
WHERE json @? '$.members ? (@.players.id == 3) ? (@.players.id == 4) ? (@.players.id == 7)';

db<>fiddle here

See:

  • Find rows containing a key in a JSONB array of records
  • Update all values for given key nested in JSON array of objects

Simple query

Without index support - unless you create a tailored expression index, see below.

SELECT t.json->>'id' AS team_id, p.players
FROM   teams t
JOIN   LATERAL (
   SELECT ARRAY (
      SELECT * FROM jsonb_populate_recordset(null::foo, t.json#>'{members,players}')
      )
   ) AS p(players) ON p.players @> '{3,4,7}';

db<>fiddle here

Old sqlfiddle

How?

  • Extract the JSON array with the player records:

    t.json#>'{members,players}'

  • From these, unnest rows with just the id:

    jsonb_populate_recordset(null::foo, t.json#>'{members,players}')

  • ... and immediately aggregate those into a Postgres array, so we keep one row per row in the base table:

    SELECT ARRAY ( ... )

  • All of this happens in a lateral join:

    JOIN LATERAL (SELECT ... ) AS p(players) ...

  • Immediately filter the resulting arrays in the join condition to keep only the ones we are looking for - with the "contains" array operator @>:

    ... ON p.players @> '{3,4,7}'

If you run this query a lot on a big table, you could create a fake IMMUTABLE function that extracts the array like above, and create a functional GIN index based on that function to make this super fast - see the sketch below.

"Fake" because the function depends on the underlying row type, i.e. on a catalog lookup, and would change if that changes. (So make sure it does not change.) Similar to this one:

  • Index for finding an element in a JSON array
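
A minimal sketch of that approach, assuming the teams table from above and assuming foo was created as a permanent type with CREATE TYPE (a temp table's row type would vanish with the session); the function and index names are made up:

CREATE OR REPLACE FUNCTION f_player_ids(_js jsonb)  -- hypothetical name
  RETURNS int[]
  LANGUAGE sql IMMUTABLE AS  -- "fake" IMMUTABLE: depends on the row type foo
$func$
SELECT ARRAY (
   SELECT * FROM jsonb_populate_recordset(null::foo, _js #> '{members,players}')
   )
$func$;

-- expression index, default GIN operator class for integer arrays
CREATE INDEX teams_player_ids_gin_idx ON teams USING GIN (f_player_ids(json));

-- the query has to use the same expression to be able to use the index
SELECT t.json->>'id' AS team_id
FROM   teams t
WHERE  f_player_ids(t.json) @> '{3,4,7}';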

Aside:

Don't use type names like json as column names (even if that's allowed); it invites tricky syntax errors and confusing error messages.

Add and use index for jsonb with nested arrays

You already have a very good index to support your query.
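
The index definition itself is not reproduced in this excerpt; an expression GIN index along these lines (the name is made up) is the kind of index the query below can use:

CREATE INDEX my_table_dynamicfields_gin_idx  -- hypothetical name
ON my_table USING GIN ((marc->'dynamicFields') jsonb_path_ops);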

Make use of it with the jsonb "contains" operator @>:

SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200", "subfields":[{"name": "a"}]}]';

db<>fiddle here

Carefully match the structure of the JSON object in the table. Then rows are selected cheaply using the index.

You can then extract whatever parts you need from qualifying rows.
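
For instance, a sketch along these lines (assuming the same my_table / marc layout; nothing beyond that is taken from the question) pulls the subfields of the matching field out of qualifying rows:

SELECT f.field->'subfields' AS subfields
FROM   my_table t
     , jsonb_array_elements(t.marc->'dynamicFields') AS f(field)  -- unnest the array
WHERE  t.marc->'dynamicFields' @> '[{"name": "200", "subfields":[{"name": "a"}]}]'  -- can use the index
AND    f.field->>'name' = '200';  -- re-check each unnested element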

Detailed instructions:

  • Index for finding an element in a JSON array

If one of the filters is very selective on its own, it might be faster to split the two conditions like in your original. Either way, both variants should be fast:

SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200"}]'
AND marc->'dynamicFields' @> '[{"subfields":[{"name": "a"}]}]';

SQL query to get a nested JSON array

Unfortunately, SQL Server does not support JSON_AGG or JSON_OBJECT_AGG, which would have helped here. But we can hack it together with STRING_AGG and STRING_ESCAPE:

WITH ByFirstName AS
(
    SELECT
        p.LastName,
        p.FirstName,
        json = STRING_AGG(j.json, ',')
    FROM Person p
    CROSS APPLY (
        SELECT
            p.Age,
            p.Weight,
            p.Sallery,
            p.Married
        FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
    ) AS j(json)
    GROUP BY
        p.LastName,
        p.FirstName
),
ByLastName AS
(
    SELECT
        p.LastName,
        json = STRING_AGG(CONCAT(
            '"',
            STRING_ESCAPE(p.FirstName, 'json'),
            '":[',
            p.json,
            ']'
        ), ',')
    FROM ByFirstName p
    GROUP BY
        p.LastName
)
SELECT '[{' +
    STRING_AGG(CONCAT(
        '"',
        STRING_ESCAPE(p.LastName, 'json'),
        '":{',
        p.json,
        '}'
    ), ',') + '}]'
FROM ByLastName p;

db<>fiddle

This gets you:

[
  {
    "Brown": {
      "Angela": [
        {
          "Age": 12,
          "Weight": 37,
          "Sallery": 0,
          "Married": false
        }
      ],
      "Chris": [
        {
          "Age": 48,
          "Weight": 77,
          "Sallery": 159000,
          "Married": true
        }
      ],
      "Stepahnie": [
        {
          "Age": 39,
          "Weight": 67,
          "Sallery": 95000,
          "Married": true
        }
      ]
    },
    "Smith": {
      "Maria": [
        {
          "Age": 53,
          "Weight": 57,
          "Sallery": 45000,
          "Married": true
        }
      ],
      "Stan": [
        {
          "Age": 58,
          "Weight": 87,
          "Sallery": 59000,
          "Married": true
        }
      ]
    }
  }
]

jsonb LIKE query on nested objects in an array

Your solution can be simplified some more:

SELECT r.res->>'name' AS feature_name, d.name AS detail_name
FROM restaurants r
, jsonb_populate_recordset(null::foo, r.res #> '{payload, details}') d
WHERE d.name LIKE '%oh%';

Or simpler yet, with jsonb_array_elements(), since you don't actually need the row type (foo) at all in this example:

SELECT r.res->>'name' AS feature_name, d->>'name' AS detail_name
FROM restaurants r
, jsonb_array_elements(r.res #> '{payload, details}') d
WHERE d->>'name' LIKE '%oh%';

db<>fiddle here

But that's not what you asked exactly:

I want to return all the tuples that have this substring.

You are returning all JSON array elements (0-n per base table row), where one particular key ('{payload,details,*,name}') matches (case-sensitively).
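
If whole base-table rows are what's wanted instead, a sketch with EXISTS (assuming the same restaurants.res layout) returns each qualifying row once:

SELECT r.*
FROM   restaurants r
WHERE  EXISTS (
   SELECT 1
   FROM   jsonb_array_elements(r.res #> '{payload, details}') d
   WHERE  d->>'name' LIKE '%oh%'
   );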

And your original question had a nested JSON array on top of this. You removed the outer array for this solution - I did the same.

Depending on your actual requirements, the text search capabilities added for JSON documents in Postgres 10 might be useful.
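
A minimal sketch of that option, assuming the same restaurants table (the search term is made up); note that full text search matches whole words or prefixes, not arbitrary substrings like LIKE '%oh%':

SELECT r.res->>'name' AS feature_name
FROM   restaurants r
WHERE  to_tsvector('simple', r.res #> '{payload, details}')  -- Postgres 10+: to_tsvector() over jsonb
       @@ to_tsquery('simple', 'joh:*');                     -- prefix match; 'joh:*' is a made-up term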


