Hive Explode List from Json-String

Not able to explode json string in hive

Convert string to array of JSON objects first: remove square brackets, split by comma between curly braces and explode. Then use json_tuple with lateral view to extract all values. See this demo:

with mytable as (--demo table, use your table instead
select '[{"name":"john","id":12,"location":"delhi"},{"name":"raj","id":18,"location":"mumbai"},{"name":"Rahul","id":14,"location":"hyd"}]' as json_string
)

select --t.json_string as original_string, --commented
e.pos as position_in_array,
--values from json
x.name, x.id, x.location

from mytable t
lateral view outer posexplode( split(regexp_replace(json_string,'^\\[|\\]$',''), --remove []
'(?<=\\}),(?=\\{)' --split by comma only after } and before {
) --converted to array of json strings
)e as pos, json --exploded array element with position
--extract all from e.json
lateral view json_tuple(e.json,'name', 'id', 'location') x as name, id, location

Result:

position_in_array  x.name   x.id    x.location  
0 john 12 delhi
1 raj 18 mumbai
2 Rahul 14 hyd

Hive explode each JSON element in JSON array to rows

Say your json is in a table test. You can use below query to get the desired output:

select m.* from test s
LATERAL view explode(split(regexp_replace(get_json_object(value,'$.body.metrics'),'^\\[|]$',''), ',(?!")')) m

Parse JSON Array and load into hive table

select  j.Name,j.Machine

from jsonarray t
lateral view explode(split(substr(t.json,2),'(?<=\\}),(?=\\{)')) e
lateral view json_tuple(e.col,'Name','Machine') j as Name,Machine
;

+------+----------+
| name | machine |
+------+----------+
| xxxx | Machine1 |
| yyyy | Machine2 |
| zzzz | Machine3 |
+------+----------+

hive string json list to array with specific field

Convert string to JSON array: remove [], split by comma between } and {. Then extract val1 and collect_list to get an array of val1, see comments in the code:

with mytable as(--data example with single row
select '[{"key1":"val1","key2":"val2"},{"key1":"val3","key2":"val4"},{"key1":"val5","key2":"val6"}]' as json_string
)

select collect_list( --collect array
get_json_object(json_map_string,'$.key1') --key1 extracted
) as key1_array
from
(
select split(regexp_replace(json_string,'^\\[|\\]$',''), --remove []
'(?<=\\}),(?=\\{)' --split by comma only after } and before {
) as json_array --converted to array of json strings (map)
from mytable
)s
lateral view outer explode(json_array) e as json_map_string --explode array elements
;

Result:

key1_array
["val1","val3","val5"]

Hive: Extract Data From Nested JSON and Append

Try with below query:

hive> with cte as (
select stack(2,int(21),string('{"temp":"3","list":[{"url":"aaa.com"},{"url":"bbb.com"}]}'),
int(42),string('{"temp":"2","list":[{"url":"qqq.com"},{"url":"vvv.com"}]}')) as (id,json))
select id,url from (
select id,
split(
regexp_replace(
get_json_object(json,'$.list.url'),
'(\\[|\\]|\")','')
,',')jsn from cte)t
lateral view explode(jsn)asd as url;

Output:

id      url
21 aaa.com
21 bbb.com
42 qqq.com
42 vvv.com

Functions Explanation:

Stack --is used to create sample data

get_json_object -- to extract data from json string

regexp_replace --to replace [,]," characters

split --split on , and this will return array

explode --use array column to explode

In Hive, how does this pattern recognize nested json from json arrays?

REGEXP '(?<=\\}),(?=\\{)' matching comma only between } and {, not including curly brackets

(?<=\\}) is a zero-width lookbehind, asserts that what immediately precedes the current position in the string is }

(?=\\{) is a zero-width positive lookahead assertion, means it should be { after current position

So, split function splits string to array using comma between }{, not including brackets. This results in array of these elements:

element 0 is {"a":{"c":"sss"},"w":123}

element 1 is {"b":2}

element 2 is {"r":{"c":"sss"},"w":555}]

in the mentioned answer, explode is applied to array, it produces rows with array elements.

json_tuple receives element (array already exploded), not array. Yes, it is extra ] in last element, better remove it also, json_tuple recognizes element as struct, not as array, because there is no [.

How to return array<struct> from json_tuple in hive

Unfortunately JSON_TUPLE and GET_JSON_OBJECT return strings. To convert JSON strings without using custom UDFs you can parse strings, split, explode and re-assemble structs and arrays.

Demo:

with sample_table as (
select '{
"type":"REGULAR",
"period":[
"ONCE_PER_FOUR_WEEK",
"ONCE_PER_SIX_WEEK",
"ONCE_PER_ONE_MONTH",
"ONCE_PER_TWO_MONTH",
"ONCE_PER_THREE_MONTH"
],
"count":[
"4",
"8",
"12"
],
"day":[
"SATURDAY",
"SUNDAY"
],
"content":[
{
"count":"2",
"value":5,
"unit":"PERCENT"
},
{
"count":"3",
"value":10,
"unit":"PERCENT"
}
]
}' as content
)

SELECT b.type as type,
--to convert to array<string>
--remove [" and "], split by ","
split(regexp_replace(b.period,'^\\["|"\\]',''),'","') as period,
split(regexp_replace(b.count,'^\\["|"\\]',''),'","') as count,
split(regexp_replace(b.day,'^\\["|"\\]',''),'","') as day,
--convert to struct and collect array of structs
collect_list(named_struct('count', x.count, 'value', int(x.value), 'unit', x.unit)) as content
FROM sample_table a
LATERAL VIEW JSON_TUPLE(a.content, 'type', 'period', 'count', 'day', 'content') b AS type, period, count, day, content
LATERAL VIEW explode(split(regexp_replace(b.content,'^\\[|\\]$',''), --remove []
'(?<=\\}),(?=\\{)' --split by comma only after } and before {
)) e as str_struct
LATERAL VIEW JSON_TUPLE(e.str_struct,'count','value', 'unit') x as count, value, unit
group by b.type,
b.period,
b.count,
b.day

Result:

type     period                                                                                                     count           day                     content
REGULAR ["ONCE_PER_FOUR_WEEK","ONCE_PER_SIX_WEEK","ONCE_PER_ONE_MONTH","ONCE_PER_TWO_MONTH","ONCE_PER_THREE_MONTH"] ["4","8","12"] ["SATURDAY","SUNDAY"] [{"count":"2","value":5,"unit":"PERCENT"},{"count":"3","value":10,"unit":"PERCENT"}]


Related Topics



Leave a reply



Submit