Hive Select Data into an Array of Structs

Hive select data into an array of structs

I would use this jar, it is a much better implementation of collect (and takes complex datatypes).

Query:

add jar /path/to/jar/brickhouse-0.7.1.jar;
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';

select house_id
, collect(named_struct("first_name", first_name, "last_name", last_name))
from db.table
group by house_id

Output:

1   [{"first_name":"bob","last_name":"jones"}, {"first_name":"jenny","last_name":"jones"}]
2 [{"first_name":"sally","last_name":"johnson"}]
3 [{"first_name":"john","last_name":"smith"},{"first_name":"barb","last_name":"smith"}]

Hive SELECT statement to create an ARRAY of STRUCTS

The functionality that you seem to be looking for is to collect the structs into an array. Hive comes with two functions for collecting things into arrays: collect_set and collect_list. However, those functions only work to create arrays of basic types.

The jar for the brickhouse project (https://github.com/klout/brickhouse/wiki/Downloads) provides a number of features, including the ability to collect complex types.

add jar hdfs://path/to/your/jars/brickhouse-0.6.0.jar

Then you can add the collect function using whatever name you like:

create temporary function collect_struct as 'brickhouse.udf.collect.CollectUDAF';

The following query:

select id
, collect_struct(
named_struct(
"field_id", fieldid,
"field_label", fieldlabel,
"field_type", fieldtype,
"answer_id", answer_id)) as answers
, unitname
from new_answers
group by id, unitname
;

Provides the following result:

id  answers unitname
1 [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] Location1
2 [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] Location2

How to select an array of specific struct values in hive sql?

You don't need to use explode function. You can merely call your desired field directly on your array column:

SELECT myColumn.color FROM myTable

Hive Explode the Array of Struct key: value:

Use laterral view [outer] inline to get struct elements already etracted and use conditional aggregation to get values corresponting to some keys grouped in single row, use group_by user_id.

Demo:

with sample_table as (--This is your data example
select '11111' USER_ID,
array(named_struct('key','client_status','value','ACTIVE'),named_struct('key','name','value','Jane Doe')) DETAIL_DATA
)

SELECT max(case when e.key='name' then e.value end) as name,
max(case when e.key='client_status' then e.value end) as status
FROM sample_table
lateral view inline(DETAIL_DATA) e as key, value
group by USER_ID

Result:

    name    status  
------------------------
Jane Doe ACTIVE

If you can guarantee the order of structs in array (one with status comes first always), you can address nested elements dirctly

SELECT detail_data[0].value as client_status,
detail_data[1].value as name
from sample_table

One more approach, if you do not know the order in array, but array is of size=2, CASE expressions without explode will give better performance:

SELECT case when DETAIL_DATA[0].key='name' then DETAIL_DATA[0].value else  DETAIL_DATA[1].value end as name, 
case when DETAIL_DATA[0].key='client_status' then DETAIL_DATA[0].value else DETAIL_DATA[1].value end as status
FROM sample_table


Related Topics



Leave a reply



Submit