Hive select data into an array of structs
I would use this jar, it is a much better implementation of collect
(and takes complex datatypes).
Query:
add jar /path/to/jar/brickhouse-0.7.1.jar;
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select house_id
, collect(named_struct("first_name", first_name, "last_name", last_name))
from db.table
group by house_id
Output:
1 [{"first_name":"bob","last_name":"jones"}, {"first_name":"jenny","last_name":"jones"}]
2 [{"first_name":"sally","last_name":"johnson"}]
3 [{"first_name":"john","last_name":"smith"},{"first_name":"barb","last_name":"smith"}]
Hive SELECT statement to create an ARRAY of STRUCTS
The functionality that you seem to be looking for is to collect the structs into an array. Hive comes with two functions for collecting things into arrays: collect_set and collect_list. However, those functions only work to create arrays of basic types.
The jar for the brickhouse project (https://github.com/klout/brickhouse/wiki/Downloads) provides a number of features, including the ability to collect complex types.
add jar hdfs://path/to/your/jars/brickhouse-0.6.0.jar
Then you can add the collect
function using whatever name you like:
create temporary function collect_struct as 'brickhouse.udf.collect.CollectUDAF';
The following query:
select id
, collect_struct(
named_struct(
"field_id", fieldid,
"field_label", fieldlabel,
"field_type", fieldtype,
"answer_id", answer_id)) as answers
, unitname
from new_answers
group by id, unitname
;
Provides the following result:
id answers unitname
1 [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] Location1
2 [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] Location2
How to select an array of specific struct values in hive sql?
You don't need to use explode
function. You can merely call your desired field directly on your array column:
SELECT myColumn.color FROM myTable
Hive Explode the Array of Struct key: value:
Use laterral view [outer] inline
to get struct elements already etracted and use conditional aggregation to get values corresponting to some keys grouped in single row, use group_by user_id.
Demo:
with sample_table as (--This is your data example
select '11111' USER_ID,
array(named_struct('key','client_status','value','ACTIVE'),named_struct('key','name','value','Jane Doe')) DETAIL_DATA
)
SELECT max(case when e.key='name' then e.value end) as name,
max(case when e.key='client_status' then e.value end) as status
FROM sample_table
lateral view inline(DETAIL_DATA) e as key, value
group by USER_ID
Result:
name status
------------------------
Jane Doe ACTIVE
If you can guarantee the order of structs in array (one with status comes first always), you can address nested elements dirctly
SELECT detail_data[0].value as client_status,
detail_data[1].value as name
from sample_table
One more approach, if you do not know the order in array, but array is of size=2, CASE expressions without explode will give better performance:
SELECT case when DETAIL_DATA[0].key='name' then DETAIL_DATA[0].value else DETAIL_DATA[1].value end as name,
case when DETAIL_DATA[0].key='client_status' then DETAIL_DATA[0].value else DETAIL_DATA[1].value end as status
FROM sample_table
Related Topics
Mysql Query - Records Between Today and Last 30 Days
Concatenate With String But Exclude When Null
How to Split a Comma-Separated Value to Columns
How to Insert 1000 Rows At a Time
How to Specify a Password to 'Psql' Non-Interactively
Calculate Total Working Hour of Employee in SQL With Only 1 Column
Subtraction Between Two SQL Queries
Sequelize Connect Etimeout When Connecting to Remote MySQL Db
Replace Null Value With Previous Available Value in Row SQL Server Query
How to Calculate Difference of Values in a Successive Rows
Hive Explode List from Json-String
How to Display the Value of Avg Function Till Only Two Decimal Places in SQL
Convert the Students Marks into Respective Grades
Select Distinct Values from One Table and Join With Another Table
Selecting a Single Record from the Same Multiple Rows from a Single Table
Multiple Rows to One Comma-Separated Value Different Id
Loop Through Table Rows and Call Stored Procedure on Every Row