Postgres - query JSON with nested array and objects inside array
WITH data(content) AS ( VALUES
('{
"id": 1,
"external_order_id": {
"id": "2"
},
"customer": {
"external_customer_id": {
"id": "3"
}
},
"line_items": [
{
"sku": "SKU-1",
"properties": [
{
"name": "colour",
"value": "red"
},
{
"name": "size",
"value": "large"
}
],
"external_product_id": {
"id": "4"
},
"external_variant_id": {
"id": "5"
}
},
{
"sku": "SKU-2",
"properties": [
{
"name": "colour",
"value": "black"
},
{
"name": "size",
"value": "small"
}
],
"external_product_id": {
"id": "8"
},
"external_variant_id": {
"id": "9"
}
}
]
}'::jsonb)
)
select ord.*
,ext.id as external_order_id
,cus.id as external_customer_id
,line_items.sku
,line_items.external_product_id->>'id' as external_product_id
,line_items.external_variant_id->>'id' as external_variant_id
,props.*
FROM data,
jsonb_to_record(content) as ord(id int),
LATERAL jsonb_to_record(content->'external_order_id') as ext(id text),
LATERAL jsonb_to_record(content#>'{customer, external_customer_id}') as cus(id text)
CROSS JOIN LATERAL jsonb_to_recordset(content->'line_items') line_items(sku text, properties jsonb, external_product_id jsonb, external_variant_id jsonb)
cross join LATERAL jsonb_to_recordset(line_items.properties) props(name text, value text)
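For the sample document above, each line item is cross-joined with its own properties, so the query returns one row per (line item, property) pair — four rows here. A sketch of the expected result shape:

```sql
-- id | external_order_id | external_customer_id | sku   | external_product_id | external_variant_id | name   | value
-- ---+-------------------+----------------------+-------+---------------------+---------------------+--------+-------
--  1 | 2                 | 3                    | SKU-1 | 4                   | 5                   | colour | red
--  1 | 2                 | 3                    | SKU-1 | 4                   | 5                   | size   | large
--  1 | 2                 | 3                    | SKU-2 | 8                   | 9                   | colour | black
--  1 | 2                 | 3                    | SKU-2 | 8                   | 9                   | size   | small
```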
jsonb query with nested objects in an array
You are facing two non-trivial tasks at once.
- Process jsonb with a complex nested structure.
- Run the equivalent of a relational division query on the document type.
First, register a row type for jsonb_populate_recordset(). You can either create a type permanently with CREATE TYPE, or create a temp table for ad-hoc use (dropped automatically at the end of the session):
CREATE TEMP TABLE foo(id int); -- just "id", we don't need "name"
We only need the id, so don't include the name. Per documentation:
JSON fields that do not appear in the target row type will be omitted from the output
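A minimal, self-contained demonstration of that behavior (literal input, no table needed):

```sql
CREATE TEMP TABLE foo(id int);

SELECT *
FROM jsonb_populate_recordset(null::foo,
       '[{"id": 3, "name": "a"}, {"id": 7, "name": "b"}]'::jsonb);
-- The "name" keys are silently dropped; only "id" comes back:
--  id
-- ----
--   3
--   7
```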
Query with index support
If you need it fast, create a GIN index on the jsonb column. The more specialized operator class jsonb_path_ops is even faster than the default jsonb_ops:
CREATE INDEX teams_json_gin_idx ON teams USING GIN (json jsonb_path_ops);
It can be used by the "contains" operator @>:
SELECT t.json->>'id' AS team_id
, ARRAY (SELECT * FROM jsonb_populate_recordset(null::foo, t.json#>'{members,players}')) AS players
FROM teams t
WHERE json @> '{"members":{"players":[{"id":3},{"id":4},{"id":7}]}}';
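The filter works because jsonb array containment requires each element of the right-hand array to be contained in some element of the left-hand array. A self-contained illustration with literal values:

```sql
SELECT '{"members":{"players":[{"id":3},{"id":4},{"id":7},{"id":9}]}}'::jsonb
       @> '{"members":{"players":[{"id":3},{"id":7}]}}' AS contains_all,  -- true
       '{"members":{"players":[{"id":3}]}}'::jsonb
       @> '{"members":{"players":[{"id":3},{"id":7}]}}' AS missing_7;     -- false
```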
SQL/JSON path language in Postgres 12+ can use the same index:
SELECT t.json->>'id' AS team_id
, ARRAY (SELECT * FROM jsonb_populate_recordset(null::foo, t.json#>'{members,players}')) AS players
FROM teams t
WHERE json @? '$.members ? (@.players.id == 3) ? (@.players.id == 4) ? (@.players.id == 7)';
db<>fiddle here
See:
- Find rows containing a key in a JSONB array of records
- Update all values for given key nested in JSON array of objects
Simple query
Without index support - unless you create a tailored expression index, see below.
SELECT t.json->>'id' AS team_id, p.players
FROM teams t
JOIN LATERAL (
SELECT ARRAY (
SELECT * FROM jsonb_populate_recordset(null::foo, t.json#>'{members,players}')
)
) AS p(players) ON p.players @> '{3,4,7}';
db<>fiddle here
Old sqlfiddle
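The join condition below relies on the Postgres array "contains" operator, which compares plain arrays element-wise (distinct from jsonb containment used above). For example:

```sql
SELECT ARRAY[3,4,7,9] @> '{3,4,7}'::int[] AS contains_all,  -- true
       ARRAY[3,4]     @> '{3,4,7}'::int[] AS missing_7;     -- false
```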
How?
Extract the JSON array with player records:
t.json#>'{members,players}'
From these, I unnest rows with just the id:
jsonb_populate_recordset(null::foo, t.json#>'{members,players}')
... and immediately aggregate those into a Postgres array, so we keep one row per row in the base table:
SELECT ARRAY ( ... )
All of this happens in a lateral join:
JOIN LATERAL (SELECT ... ) AS p(players) ...
Finally, filter the resulting arrays in the join condition to keep only the ones we are looking for, with the "contains" array operator @>:
... ON p.players @> '{3,4,7}'
If you run this query a lot on a big table, you could create a fake IMMUTABLE function that extracts the array like above, and create a functional GIN index based on this function to make this super fast.
"Fake" because the function depends on the underlying row type, i.e. on a catalog lookup, and would change if that changes. (So make sure it does not change.) Similar to this one:
- Index for finding an element in a JSON array
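A sketch of what such a setup could look like. All names here are hypothetical; declaring the function IMMUTABLE is the "fake" part discussed above, since it hides the dependency on the JSON layout:

```sql
-- Hypothetical helper: extract the player ids as int[].
-- IMMUTABLE so it can back an index; only safe while the JSON layout is stable.
CREATE FUNCTION f_player_ids(_json jsonb)
  RETURNS int[]
  LANGUAGE sql IMMUTABLE AS
$$
SELECT ARRAY(
   SELECT (elem->>'id')::int
   FROM   jsonb_array_elements(_json #> '{members,players}') elem)
$$;

-- GIN supports plain arrays out of the box (default opclass array_ops):
CREATE INDEX teams_player_ids_gin_idx ON teams USING gin (f_player_ids(json));

-- A query written against the same expression can then use the index:
SELECT * FROM teams t WHERE f_player_ids(t.json) @> '{3,4,7}';
```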
Aside: don't use type names like json as column names (even if that's allowed); it invites tricky syntax errors and confusing error messages.
Add and use index for jsonb with nested arrays
You already have a very good index to support your query.
Make use of it with the jsonb "contains" operator @>:
SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200", "subfields":[{"name": "a"}]}]';
db<>fiddle here
Carefully match the structure of the JSON object in the table. Then rows are selected cheaply using the index.
You can then extract whatever parts you need from qualifying rows.
Detailed instructions:
- Index for finding an element in a JSON array
If one of the filters is very selective on its own, it might be faster to split the two conditions like in your original. Either way, both variants should be fast:
SELECT *
FROM my_table
WHERE marc->'dynamicFields' @> '[{"name": "200"}]'
AND marc->'dynamicFields' @> '[{"subfields":[{"name": "a"}]}]';
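Containment here applies recursively, per array element at every level. A literal check against the nested shape (sample values assumed):

```sql
SELECT '[{"name": "200", "ind1": " ",
          "subfields": [{"name": "a", "value": "x"},
                        {"name": "b", "value": "y"}]}]'::jsonb
       @> '[{"name": "200", "subfields": [{"name": "a"}]}]' AS qualifies;  -- true
```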
SQL-Query to get nested JSON Array
Unfortunately, SQL Server supports neither JSON_AGG nor JSON_OBJECT_AGG, which would have helped here. But we can hack it with STRING_AGG and STRING_ESCAPE:
WITH ByFirstName AS
(
SELECT
p.LastName,
p.FirstName,
json = STRING_AGG(j.json, ',')
FROM Person p
CROSS APPLY (
SELECT
p.Age,
p.Weight,
p.Sallery,
p.Married
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
) AS j(json)
GROUP BY
p.LastName,
p.FirstName
),
ByLastName AS
(
SELECT
p.LastName,
json = STRING_AGG(CONCAT(
'"',
STRING_ESCAPE(p.FirstName, 'json'),
'":[',
p.json,
']'
), ',')
FROM ByFirstName p
GROUP BY
p.LastName
)
SELECT '[{' +
STRING_AGG(CONCAT(
'"',
STRING_ESCAPE(p.LastName, 'json'),
'":{',
p.json,
'}'
), ',') + '}]'
FROM ByLastName p
db<>fiddle
This gets you:
[
{
"Brown": {
"Angela": [
{
"Age": 12,
"Weight": 37,
"Sallery": 0,
"Married": false
}
],
"Chris": [
{
"Age": 48,
"Weight": 77,
"Sallery": 159000,
"Married": true
}
],
"Stepahnie": [
{
"Age": 39,
"Weight": 67,
"Sallery": 95000,
"Married": true
}
]
},
"Smith": {
"Maria": [
{
"Age": 53,
"Weight": 57,
"Sallery": 45000,
"Married": true
}
],
"Stan": [
{
"Age": 58,
"Weight": 87,
"Sallery": 59000,
"Married": true
}
]
}
}
]
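The STRING_ESCAPE(..., 'json') calls are what keep the hand-concatenated document valid when names contain quotes or backslashes. A standalone illustration (T-SQL, literal input):

```sql
SELECT STRING_ESCAPE(N'O"Brian \ Co.', 'json') AS escaped;
-- O\"Brian \\ Co.
```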
jsonb LIKE query on nested objects in an array
Your solution can be simplified some more:
SELECT r.res->>'name' AS feature_name, d.name AS detail_name
FROM restaurants r
, jsonb_populate_recordset(null::foo, r.res #> '{payload, details}') d
WHERE d.name LIKE '%oh%';
Or simpler yet, with jsonb_array_elements(), since you don't actually need the row type (foo) at all in this example:
SELECT r.res->>'name' AS feature_name, d->>'name' AS detail_name
FROM restaurants r
, jsonb_array_elements(r.res #> '{payload, details}') d
WHERE d->>'name' LIKE '%oh%';
db<>fiddle here
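For a case-insensitive variant of the same filter, ILIKE could be swapped in (a minor variation, not part of the original answer):

```sql
SELECT r.res->>'name' AS feature_name, d->>'name' AS detail_name
FROM restaurants r
   , jsonb_array_elements(r.res #> '{payload, details}') d
WHERE d->>'name' ILIKE '%OH%';  -- matches 'oh', 'Oh', 'OH', ...
```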
But that's not what you asked exactly:
I want to return all the tuples that have this substring.
You are returning all JSON array elements (0-n per base table row) where one particular key ('{payload,details,*,name}') matches (case-sensitively).
And your original question had a nested JSON array on top of this. You removed the outer array for this solution; I did the same.
Depending on your actual requirements the new text search capability of Postgres 10 might be useful.
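A sketch of that approach, assuming word-level matching is acceptable for the use case (to_tsvector() accepts jsonb directly since Postgres 10; note this searches lexemes, not arbitrary substrings like LIKE '%oh%'):

```sql
SELECT r.res->>'name' AS feature_name
FROM restaurants r
WHERE to_tsvector('english', r.res #> '{payload, details}')
      @@ to_tsquery('english', 'john');
```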