Not able to explode json string in hive
Convert string to array of JSON objects first: remove square brackets, split by comma between curly braces and explode. Then use json_tuple with lateral view to extract all values. See this demo:
with mytable as (--demo table, use your table instead
select '[{"name":"john","id":12,"location":"delhi"},{"name":"raj","id":18,"location":"mumbai"},{"name":"Rahul","id":14,"location":"hyd"}]' as json_string
)
select --t.json_string as original_string, --commented
e.pos as position_in_array,
--values from json
x.name, x.id, x.location
from mytable t
lateral view outer posexplode( split(regexp_replace(json_string,'^\\[|\\]$',''), --remove []
'(?<=\\}),(?=\\{)' --split by comma only after } and before {
) --converted to array of json strings
)e as pos, json --exploded array element with position
--extract all from e.json
lateral view json_tuple(e.json,'name', 'id', 'location') x as name, id, location
Result:
position_in_array x.name x.id x.location
0 john 12 delhi
1 raj 18 mumbai
2 Rahul 14 hyd
Hive explode each JSON element in JSON array to rows
Say your json is in a table test
. You can use below query to get the desired output:
select m.* from test s
LATERAL view explode(split(regexp_replace(get_json_object(value,'$.body.metrics'),'^\\[|]$',''), ',(?!")')) m
Parse JSON Array and load into hive table
select j.Name,j.Machine
from jsonarray t
lateral view explode(split(substr(t.json,2),'(?<=\\}),(?=\\{)')) e
lateral view json_tuple(e.col,'Name','Machine') j as Name,Machine
;
+------+----------+
| name | machine |
+------+----------+
| xxxx | Machine1 |
| yyyy | Machine2 |
| zzzz | Machine3 |
+------+----------+
hive string json list to array with specific field
Convert string to JSON array: remove [], split by comma between } and {. Then extract val1 and collect_list to get an array of val1, see comments in the code:
with mytable as(--data example with single row
select '[{"key1":"val1","key2":"val2"},{"key1":"val3","key2":"val4"},{"key1":"val5","key2":"val6"}]' as json_string
)
select collect_list( --collect array
get_json_object(json_map_string,'$.key1') --key1 extracted
) as key1_array
from
(
select split(regexp_replace(json_string,'^\\[|\\]$',''), --remove []
'(?<=\\}),(?=\\{)' --split by comma only after } and before {
) as json_array --converted to array of json strings (map)
from mytable
)s
lateral view outer explode(json_array) e as json_map_string --explode array elements
;
Result:
key1_array
["val1","val3","val5"]
Hive: Extract Data From Nested JSON and Append
Try with below query:
hive> with cte as (
select stack(2,int(21),string('{"temp":"3","list":[{"url":"aaa.com"},{"url":"bbb.com"}]}'),
int(42),string('{"temp":"2","list":[{"url":"qqq.com"},{"url":"vvv.com"}]}')) as (id,json))
select id,url from (
select id,
split(
regexp_replace(
get_json_object(json,'$.list.url'),
'(\\[|\\]|\")','')
,',')jsn from cte)t
lateral view explode(jsn)asd as url;
Output:
id url
21 aaa.com
21 bbb.com
42 qqq.com
42 vvv.com
Functions Explanation:
Stack
--is used to create sample data
get_json_object
-- to extract data from json string
regexp_replace
--to replace [,],"
characters
split
--split on ,
and this will return array
explode
--use array column to explode
In Hive, how does this pattern recognize nested json from json arrays?
REGEXP '(?<=\\}),(?=\\{)'
matching comma only between } and {, not including curly brackets
(?<=\\})
is a zero-width lookbehind, asserts that what immediately precedes the current position in the string is }
(?=\\{)
is a zero-width positive lookahead assertion, means it should be { after current position
So, split function splits string to array using comma between }{, not including brackets. This results in array of these elements:
element 0 is {"a":{"c":"sss"},"w":123}
element 1 is {"b":2}
element 2 is {"r":{"c":"sss"},"w":555}]
in the mentioned answer, explode
is applied to array, it produces rows with array elements.
json_tuple
receives element (array already exploded), not array. Yes, it is extra ]
in last element, better remove it also, json_tuple recognizes element as struct, not as array, because there is no [
.
How to return array<struct> from json_tuple in hive
Unfortunately JSON_TUPLE and GET_JSON_OBJECT return strings. To convert JSON strings without using custom UDFs you can parse strings, split, explode and re-assemble structs and arrays.
Demo:
with sample_table as (
select '{
"type":"REGULAR",
"period":[
"ONCE_PER_FOUR_WEEK",
"ONCE_PER_SIX_WEEK",
"ONCE_PER_ONE_MONTH",
"ONCE_PER_TWO_MONTH",
"ONCE_PER_THREE_MONTH"
],
"count":[
"4",
"8",
"12"
],
"day":[
"SATURDAY",
"SUNDAY"
],
"content":[
{
"count":"2",
"value":5,
"unit":"PERCENT"
},
{
"count":"3",
"value":10,
"unit":"PERCENT"
}
]
}' as content
)
SELECT b.type as type,
--to convert to array<string>
--remove [" and "], split by ","
split(regexp_replace(b.period,'^\\["|"\\]',''),'","') as period,
split(regexp_replace(b.count,'^\\["|"\\]',''),'","') as count,
split(regexp_replace(b.day,'^\\["|"\\]',''),'","') as day,
--convert to struct and collect array of structs
collect_list(named_struct('count', x.count, 'value', int(x.value), 'unit', x.unit)) as content
FROM sample_table a
LATERAL VIEW JSON_TUPLE(a.content, 'type', 'period', 'count', 'day', 'content') b AS type, period, count, day, content
LATERAL VIEW explode(split(regexp_replace(b.content,'^\\[|\\]$',''), --remove []
'(?<=\\}),(?=\\{)' --split by comma only after } and before {
)) e as str_struct
LATERAL VIEW JSON_TUPLE(e.str_struct,'count','value', 'unit') x as count, value, unit
group by b.type,
b.period,
b.count,
b.day
Result:
type period count day content
REGULAR ["ONCE_PER_FOUR_WEEK","ONCE_PER_SIX_WEEK","ONCE_PER_ONE_MONTH","ONCE_PER_TWO_MONTH","ONCE_PER_THREE_MONTH"] ["4","8","12"] ["SATURDAY","SUNDAY"] [{"count":"2","value":5,"unit":"PERCENT"},{"count":"3","value":10,"unit":"PERCENT"}]
Related Topics
Using Node.Js (Express) and MySQL to Insert for a Timestamp
Mysql Convert Date from Mm-Dd-Yyyy to Yyyy-Mm-Dd Format
Laravel Update Multiple Records At Once
Sql Select Everything After a Certain Character
Addition of Total Hours in SQL Server
Sql - Sum of Positive and Negative Numbers Using Subquery
Multiple Rows Output into Variables in MySQL
Truncate Seconds and Milliseconds in SQL
Sql Server: Check If Variable Is Null and Then Assign Statement for Where Clause
Extracting Data Between Two Delimiters in SQL Server
Formatting Numbers by Padding With Leading Zeros in SQL Server
Compare 2 Column Values in Same Table
Is There a Command to Test an SQL Query Without Executing It ( MySQL or Ansi SQL )
Combining (Concatenating) Date and Time into a Datetime
Oracle SQL Hours Difference Between Dates in Hh:Mm:Ss
Postgresql Query to Return Results as a Comma Separated List
Mysql Split Comma Separated String into Temp Table
Ssis Failed Validation and Returned Validation Status "Vs_Isbroken"