How to Query Using Fields Inside the New PostgreSQL JSON Datatype

How do I query using fields inside the new PostgreSQL JSON datatype?

Postgres 9.2

I quote Andrew Dunstan on the pgsql-hackers list:

At some stage there will possibly be some json-processing (as opposed
to json-producing) functions, but not in 9.2.

That didn't prevent him from providing an example implementation in PLV8 that should solve the problem. (That link is dead now; see modern PLV8 instead.)

Postgres 9.3

Offers an arsenal of new functions and operators to add "json-processing".

  • The manual on new JSON functionality.
  • The Postgres Wiki on new features in pg 9.3.

The answer to the original question in Postgres 9.3:

SELECT *
FROM   json_array_elements(
  '[{"name": "Toby", "occupation": "Software Engineer"},
    {"name": "Zaphod", "occupation": "Galactic President"}]'
  ) AS elem
WHERE  elem->>'name' = 'Toby';
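
The same pattern works against a table column; a minimal sketch assuming a hypothetical table people with a json column profile holding such an array:

SELECT p.*
FROM   people p, json_array_elements(p.profile) elem
WHERE  elem->>'name' = 'Toby';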

Advanced example:

  • Query combinations with nested array of records in JSON datatype

For bigger tables you may want to add an expression index to increase performance:

  • Index for finding an element in a JSON array

Postgres 9.4

Adds jsonb (b for "binary", values are stored as native Postgres types) and yet more functionality for both types. In addition to expression indexes mentioned above, jsonb also supports GIN, btree and hash indexes, GIN being the most potent of these.

  • The manual on json and jsonb data types and functions.
  • The Postgres Wiki on JSONB in pg 9.4
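
To illustrate, a minimal sketch of a GIN index backing jsonb containment queries (the table events and column payload are made up for the example):

CREATE TABLE events (id serial PRIMARY KEY, payload jsonb);
CREATE INDEX events_payload_gin_idx ON events USING gin (payload);

-- the "contains" operator @> can use the GIN index:
SELECT * FROM events WHERE payload @> '{"name": "Toby"}';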

The manual goes as far as suggesting:

In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.

Bold emphasis mine.

Performance benefits from general improvements to GIN indexes.

Postgres 9.5

Completes the set of jsonb functions and operators, and adds more functions to manipulate jsonb in place and for display.

  • Major good news in the release notes of Postgres 9.5.

How do I modify fields inside the new PostgreSQL JSON datatype?

Update: With PostgreSQL 9.5, there is some jsonb manipulation functionality within PostgreSQL itself (but none for json; casts are required to manipulate json values).

Merging 2 (or more) JSON objects (or concatenating arrays):

SELECT jsonb '{"a":1}' || jsonb '{"b":2}', -- will yield jsonb '{"a":1,"b":2}'
jsonb '["a",1]' || jsonb '["b",2]' -- will yield jsonb '["a",1,"b",2]'

So, setting a simple key can be done using:

SELECT jsonb '{"a":1}' || jsonb_build_object('<key>', '<value>')

Where <key> should be a string, and <value> can be of any type to_jsonb() accepts.
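
For example:

SELECT jsonb '{"a":1}' || jsonb_build_object('b', 2)
-- will yield jsonb '{"a":1,"b":2}'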

For setting a value deep in a JSON hierarchy, the jsonb_set() function can be used:

SELECT jsonb_set('{"a":[null,{"b":[]}]}', '{a,1,b,0}', jsonb '{"c":3}')
-- will yield jsonb '{"a":[null,{"b":[{"c":3}]}]}'

Full parameter list of jsonb_set():

jsonb_set(target         jsonb,
          path           text[],
          new_value      jsonb,
          create_missing boolean default true)

path can contain JSON array indexes too, and negative integers count from the end of JSON arrays. However, a non-existing but positive JSON array index will append the element to the end of the array:

SELECT jsonb_set('{"a":[null,{"b":[1,2]}]}', '{a,1,b,1000}', jsonb '3', true)
-- will yield jsonb '{"a":[null,{"b":[1,2,3]}]}'

For inserting into a JSON array (while preserving all of the original values), the jsonb_insert() function can be used (in 9.6+; it is the only function in this section that requires 9.6):

SELECT jsonb_insert('{"a":[null,{"b":[1]}]}', '{a,1,b,0}', jsonb '2')
-- will yield jsonb '{"a":[null,{"b":[2,1]}]}', and
SELECT jsonb_insert('{"a":[null,{"b":[1]}]}', '{a,1,b,0}', jsonb '2', true)
-- will yield jsonb '{"a":[null,{"b":[1,2]}]}'

Full parameter list of jsonb_insert():

jsonb_insert(target       jsonb,
             path         text[],
             new_value    jsonb,
             insert_after boolean default false)

Again, negative integers that appear in path count from the end of JSON arrays.

So, for example, appending to the end of a JSON array can be done with:

SELECT jsonb_insert('{"a":[null,{"b":[1,2]}]}', '{a,1,b,-1}', jsonb '3', true)
-- will yield jsonb '{"a":[null,{"b":[1,2,3]}]}', and

However, this function works slightly differently from jsonb_set() when the path in target is a JSON object's key. In that case, it will only add a new key-value pair to the JSON object if the key is not yet used. If it is used, it will raise an error:

SELECT jsonb_insert('{"a":[null,{"b":[1]}]}', '{a,1,c}', jsonb '[2]')
-- will yield jsonb '{"a":[null,{"b":[1],"c":[2]}]}', but
SELECT jsonb_insert('{"a":[null,{"b":[1]}]}', '{a,1,b}', jsonb '[2]')
-- will raise SQLSTATE 22023 (invalid_parameter_value): cannot replace existing key

Deleting a key (or an index) from a JSON object (or, from an array) can be done with the - operator:

SELECT jsonb '{"a":1,"b":2}' - 'a', -- will yield jsonb '{"b":2}'
jsonb '["a",1,"b",2]' - 1 -- will yield jsonb '["a","b",2]'

Deleting from deep within a JSON hierarchy can be done with the #- operator:

SELECT '{"a":[null,{"b":[3.14]}]}' #- '{a,1,b,0}'
-- will yield jsonb '{"a":[null,{"b":[]}]}'

For 9.4, you can use a modified version of the original answer (below), but instead of aggregating a JSON string, you can aggregate into a json object directly with json_object_agg().
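
An untested sketch of that 9.4 variant, reusing the signature of the original function below:

CREATE OR REPLACE FUNCTION "json_object_set_key"(
  "json"          json,
  "key_to_set"    TEXT,
  "value_to_set"  anyelement
)
  RETURNS json
  LANGUAGE sql
  IMMUTABLE
  STRICT
AS $function$
-- json_object_agg() builds the json object directly, no string concatenation needed
SELECT json_object_agg("key", "value")
  FROM (SELECT *
          FROM json_each("json")
         WHERE "key" <> "key_to_set"
         UNION ALL
        SELECT "key_to_set", to_json("value_to_set")) AS "fields"
$function$;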

Original answer: it is possible in pure SQL, too (without plpython or plv8), but it needs 9.3+ and will not work with 9.2:

CREATE OR REPLACE FUNCTION "json_object_set_key"(
"json" json,
"key_to_set" TEXT,
"value_to_set" anyelement
)
RETURNS json
LANGUAGE sql
IMMUTABLE
STRICT
AS $function$
SELECT concat('{', string_agg(to_json("key") || ':' || "value", ','), '}')::json
FROM (SELECT *
FROM json_each("json")
WHERE "key" <> "key_to_set"
UNION ALL
SELECT "key_to_set", to_json("value_to_set")) AS "fields"
$function$;

Edit:

A version that sets multiple keys and values:

CREATE OR REPLACE FUNCTION "json_object_set_keys"(
"json" json,
"keys_to_set" TEXT[],
"values_to_set" anyarray
)
RETURNS json
LANGUAGE sql
IMMUTABLE
STRICT
AS $function$
SELECT concat('{', string_agg(to_json("key") || ':' || "value", ','), '}')::json
FROM (SELECT *
FROM json_each("json")
WHERE "key" <> ALL ("keys_to_set")
UNION ALL
SELECT DISTINCT ON ("keys_to_set"["index"])
"keys_to_set"["index"],
CASE
WHEN "values_to_set"["index"] IS NULL THEN 'null'::json
ELSE to_json("values_to_set"["index"])
END
FROM generate_subscripts("keys_to_set", 1) AS "keys"("index")
JOIN generate_subscripts("values_to_set", 1) AS "values"("index")
USING ("index")) AS "fields"
$function$;

Edit 2: As @ErwinBrandstetter noted, the functions above work like a so-called UPSERT (they update a field if it exists, or insert it if it does not). Here is a variant that only UPDATEs:

CREATE OR REPLACE FUNCTION "json_object_update_key"(
"json" json,
"key_to_set" TEXT,
"value_to_set" anyelement
)
RETURNS json
LANGUAGE sql
IMMUTABLE
STRICT
AS $function$
SELECT CASE
WHEN ("json" -> "key_to_set") IS NULL THEN "json"
ELSE (SELECT concat('{', string_agg(to_json("key") || ':' || "value", ','), '}')
FROM (SELECT *
FROM json_each("json")
WHERE "key" <> "key_to_set"
UNION ALL
SELECT "key_to_set", to_json("value_to_set")) AS "fields")::json
END
$function$;

Edit 3: Here is a recursive variant that can set (UPSERT) a leaf value (it uses the first function from this answer), located at a key path (where keys can only refer to inner objects; inner arrays are not supported):

CREATE OR REPLACE FUNCTION "json_object_set_path"(
"json" json,
"key_path" TEXT[],
"value_to_set" anyelement
)
RETURNS json
LANGUAGE sql
IMMUTABLE
STRICT
AS $function$
SELECT CASE COALESCE(array_length("key_path", 1), 0)
WHEN 0 THEN to_json("value_to_set")
WHEN 1 THEN "json_object_set_key"("json", "key_path"[l], "value_to_set")
ELSE "json_object_set_key"(
"json",
"key_path"[l],
"json_object_set_path"(
COALESCE(NULLIF(("json" -> "key_path"[l])::text, 'null'), '{}')::json,
"key_path"[l+1:u],
"value_to_set"
)
)
END
FROM array_lower("key_path", 1) l,
array_upper("key_path", 1) u
$function$;

Updated: Added a function for replacing an existing json field's key with another given key. Can come in handy for updating data types in migrations or other scenarios like amending data structures.

CREATE OR REPLACE FUNCTION json_object_replace_key(
  json_value   json,
  existing_key text,
  desired_key  text)
RETURNS json AS
$BODY$
SELECT COALESCE(
  (
    SELECT ('{' || string_agg(to_json(key) || ':' || value, ',') || '}')
    FROM (
      SELECT *
      FROM json_each(json_value)
      WHERE key <> existing_key
      UNION ALL
      SELECT desired_key, json_value -> existing_key
    ) AS "fields"
    -- WHERE value IS NOT NULL (actually not required, as string_agg "discards" entries whose value is NULL)
  ),
  '{}'
)::json
$BODY$
LANGUAGE sql IMMUTABLE STRICT
COST 100;

Update: functions are compacted now.

How to query a JSON field so the keys are the column names and the values are the column fields

As the actual value is nested inside another key, the easiest way is to use the #>> operator:

select fields #>> '{Monthly_household_income, text}' as Monthly_household_income,
       fields #>> '{Survey_consent, text}' as Survey_consent,
       fields #>> '{Are_you_working, text}' as Are_you_working,
       ....
from the_table;

If you don't want to type that all the time, create a view that does this.
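
A sketch of such a view (the view name v_survey is made up; extend the column list as needed):

CREATE VIEW v_survey AS
SELECT fields #>> '{Monthly_household_income, text}' AS Monthly_household_income,
       fields #>> '{Survey_consent, text}' AS Survey_consent,
       fields #>> '{Are_you_working, text}' AS Are_you_working
FROM the_table;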

Query for array elements inside JSON type

jsonb in Postgres 9.4+

You can use the same query as below, just with jsonb_array_elements().

But rather use the jsonb "contains" operator @> in combination with a matching GIN index on the expression data->'objects':

CREATE INDEX reports_data_gin_idx ON reports
USING gin ((data->'objects') jsonb_path_ops);

SELECT * FROM reports WHERE data->'objects' @> '[{"src":"foo.png"}]';

Since the key objects holds a JSON array, we need to match the structure in the search term and wrap the array element into square brackets, too. Drop the array brackets when searching a plain record.
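
For example, if the key objects held a plain record instead of an array (hypothetical variant of the data):

SELECT * FROM reports WHERE data->'objects' @> '{"src":"foo.png"}';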

More explanation and options:

  • Index for finding an element in a JSON array

json in Postgres 9.3+

Unnest the JSON array with the function json_array_elements() in a lateral join in the FROM clause and test for its elements:

SELECT data::text, obj
FROM reports r, json_array_elements(r.data#>'{objects}') obj
WHERE obj->>'src' = 'foo.png';

Or, equivalent for just a single level of nesting:

SELECT *
FROM reports r, json_array_elements(r.data->'objects') obj
WHERE obj->>'src' = 'foo.png';

The ->>, -> and #> operators are explained in the manual.

Both queries use an implicit JOIN LATERAL.
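
For example, the last query spelled out with an explicit LATERAL join:

SELECT r.*
FROM   reports r
CROSS  JOIN LATERAL json_array_elements(r.data->'objects') obj
WHERE  obj->>'src' = 'foo.png';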

Closely related:

  • Query for element of array in JSON column

Get data type of JSON field in Postgres

I ended up getting access to PLv8 in my environment, which made this easy:

CREATE FUNCTION value_type(fields JSON) RETURNS TEXT AS $$
  // return the JavaScript type of the "value" property
  return typeof fields.value;
$$ LANGUAGE plv8;

As mentioned in the comments, 9.4 ships a native function for this: json_typeof().
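
For example:

SELECT json_typeof('{"a": 1}'::json -> 'a');
-- returns 'number'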

Query on json / jsonb column super slow. Can I use an index?

Like a_horse already advised (and you mentioned yourself), the proper fix is to extract those attributes to separate columns, normalizing your design to some extent.

Can an index help?

Sadly, no (as of Postgres 14).

It could work in theory. Since your values are big, an expression index with just some small attributes can be picked up by Postgres in an index-only scan, even when retrieving all rows (where it otherwise would ignore indexes).

The manual:

However, PostgreSQL's planner is currently not very smart about such cases. It considers a query to be potentially executable by index-only scan only when all columns needed by the query are available from the index.

So you would have to include value itself in the index, even if just as INCLUDE column - totally spoiling the whole idea. No go.

You probably can still do something in the short term. Two crucial quotes:

I am looking to see if I can make this faster in short term

The json blobs are a bit large

Data type

Drop the cast to json from the query. Casting every time adds pointless cost.

Compression

One major cost factor will be compression. Postgres has to "de-toast" the whole large column, just to extract some small attributes. Since Postgres 14, you can switch the compression algorithm (if support is enabled in your version!). The default is dictated by the config setting default_toast_compression, which is set to pglz by default. Currently the only available alternative is lz4. You can set that per column. Any time.

LZ4 (lz4) is considerably faster, while compressing typically a bit less. About twice as fast, but around 10 % more storage (depends!). If performance is not an issue it's best to stick to the stronger compression of the default LZ algorithm (pglz). There may be more compression algorithms to pick from in the future.

To implement:

ALTER TABLE data
ALTER COLUMN value SET COMPRESSION lz4;

Setting a new COMPRESSION for a column does not re-compress automatically. Postgres remembers the compression method and only re-compresses if it's forced to un-compress anyway. You may want to force re-compression of existing values. You can check with:

SELECT pg_column_compression(value) FROM data LIMIT 10;

Related blog post:

  • https://www.postgresql.fastware.com/blog/what-is-the-new-lz4-toast-compression-in-postgresql-14

GENERATED columns

While stuck with the broken design you might just add some (small!) generated columns to cover your query:

ALTER TABLE data
ADD COLUMN name text GENERATED ALWAYS AS (value::json ->> 'name') STORED
, ADD COLUMN mnemonic text GENERATED ALWAYS AS (value::json ->> 'mnemonic') STORED
...

And then target only those generated columns, not involving the big value at all.

SELECT name, mnemonic, ... FROM data;

That would bypass the main performance issue.

See:

  • Computed / calculated / virtual / derived columns in PostgreSQL

However, you mentioned:

It's the data that is further down in the json blob that often changes.

Every change to value forces a re-check on the generated columns, so that adds write cost.

Postgres - query json with nested array and objects inside array

WITH data(content) AS (VALUES
('{
   "id": 1,
   "external_order_id": {
     "id": "2"
   },
   "customer": {
     "external_customer_id": {
       "id": "3"
     }
   },
   "line_items": [
     {
       "sku": "SKU-1",
       "properties": [
         {"name": "colour", "value": "red"},
         {"name": "size", "value": "large"}
       ],
       "external_product_id": {
         "id": "4"
       },
       "external_variant_id": {
         "id": "5"
       }
     },
     {
       "sku": "SKU-2",
       "properties": [
         {"name": "colour", "value": "black"},
         {"name": "size", "value": "small"}
       ],
       "external_product_id": {
         "id": "8"
       },
       "external_variant_id": {
         "id": "9"
       }
     }
   ]
 }'::jsonb)
)
SELECT ord.*,
       ext.id AS external_order_id,
       cus.id AS external_customer_id,
       line_items.sku,
       line_items.external_product_id->>'id' AS external_product_id,
       line_items.external_variant_id->>'id' AS external_variant_id,
       props.*
FROM data,
     jsonb_to_record(content) AS ord(id int),
     LATERAL jsonb_to_record(content->'external_order_id') AS ext(id text),
     LATERAL jsonb_to_record(content#>'{customer, external_customer_id}') AS cus(id text)
CROSS JOIN LATERAL jsonb_to_recordset(content->'line_items')
     AS line_items(sku text, properties jsonb, external_product_id jsonb, external_variant_id jsonb)
CROSS JOIN LATERAL jsonb_to_recordset(line_items.properties) AS props(name text, value text);

How do I search for a specific string in a JSON Postgres data type column?

In Postgres 11 or earlier it is possible to recursively walk through an unknown json structure, but it would be rather complex and costly. I would propose the brute-force method, which should work well:

select *
from reports
where params::text like '%authVar%';
-- or
-- where params::text like '%"authVar"%';
-- if you are looking for the exact value

The query is very fast but may return unexpected extra rows in cases when the searched string is a part of one of the keys.

In Postgres 12+ recursive searching in jsonb is quite comfortable with the new jsonpath feature.

Find a string value containing authVar:

select *
from reports
where jsonb_path_exists(params, '$.** ? (@.type() == "string" && @ like_regex "authVar")')

The jsonpath:

$.**                     find any value at any level (recursive processing)
?                        where
@.type() == "string"     value is string
&&                       and
@ like_regex "authVar"   value contains 'authVar'

Or find the exact value:

select *
from reports
where jsonb_path_exists(params, '$.** ? (@ == "authVar")')

Read in the documentation:

  • The SQL/JSON Path Language
  • jsonpath Type

How to query an array in a Postgres json column?

Updated for a column type change from text[] to json.

If your column type is json, you can use two approaches:

  1. Convert to jsonb and use the ? operator (Postgres documentation):

select * from test where content_type::jsonb ? 'c';

  2. Use json_array_elements_text():

select distinct on (t.id) t.*
from test t
cross join json_array_elements_text(content_type) je
where je.value = 'c';

Old scenario

You can use ANY to check whether a value exists in an array or, per the Postgres documentation, use the array operator @>:

  1. With ANY:

select * from test where 'c' = any(content_type);

  2. With @>:

select * from test where content_type @> array['c'];

