Bigquery Group_Concat and Order By

BigQuery GROUP_CONCAT and ORDER BY

Since BigQuery doesn't support ORDER BY clause inside GROUP_CONCAT function, this functionality can be achieved by use of analytic window functions. And in BigQuery separator for GROUP_CONCAT is simply a second parameter for the function.
Below example illustrates this:

select key, first(grouped_value) concat_value from (
select
key,
group_concat(value, ':') over
(partition by key
order by value asc
rows between unbounded preceding and unbounded following)
grouped_value
from (
select key, value from
(select 1 as key, 'b' as value),
(select 1 as key, 'c' as value),
(select 1 as key, 'a' as value),
(select 2 as key, 'y' as value),
(select 2 as key, 'x' as value))) group by key

Will produce the following:

Row key concat_value     
1 1 a:b:c
2 2 x:y

NOTE on Window specification: The query uses "rows between unbounded preceding and unbounded following" window specification, to make sure that all rows within a partition participate in GROUP_CONCAT aggregation. Per SQL Standard default window specification is "rows between unbounded preceding and current row" which is good for things like running sum, but won't work correctly in this problem.

Performance note: Even though it looks wasteful to recompute aggregation function multiple times, the BigQuery optimizer does recognize that since window is not changing result will be the same, so it only computes aggregation once per partition.

Function not found: group_concat

Use STRING_AGG in standard SQL instead of the legacy GROUP_CONCAT. You can use standard SQL through the UI by unchecking "Use legacy SQL" under "Show Options" or else putting #standardSQL at the top of your query. See also Enabling Standard SQL.

How to use GROUP_CONCAT and/or NEST but limit number of elements in the result

This is not possible with legacy SQL aggregate functions, but it is possible with standard SQL and its aggregate functions STRING_AGG (corresponds to GROUP_CONCAT in legacy SQL) and ARRAY_AGG (corresponds to NEST). Both of these functions support optional LIMIT clause as documented in "Using LIMIT with aggregate functions".

For example:

select string_agg(x LIMIT 2) 
from unnest(['hello', 'world!', 'foo', 'bar', 'baz']) x

returns 'hello,world!' string, and

select array_agg(x LIMIT 2) 
from unnest(['hello', 'world!', 'foo', 'bar', 'baz']) x

returns ['hello', 'world!'] array.

Syntax to run a distinct GROUP_CONCAT in Google Bigquery

Use a subquery to group by and get distinct values. something like this:

SELECT campaign.id AS campaign_id,
GROUP_CONCAT(utm.campaign) AS utm_campaign
FROM
(Select campaign.id,utm.campaign
FROM [email_event]
WHERE (TIMESTAMP BETWEEN SEC_TO_TIMESTAMP(1412136000) AND SEC_TO_TIMESTAMP(1414814340))
GROUP EACH BY campaign.id,utm.campaign)
GROUP BY campaign_id;

another option for few aggregated fields is to do it in stages...

SELECT campaign_id ,
GROUP_CONCAT(utm_campaign) as utm_campaign,
utm_content
From
(SELECT campaign.id AS campaign_id,
utm.campaign as utm_campaign,
GROUP_CONCAT(utm.content) AS utm_content
FROM
(
SELECT *
FROM
( SELECT 507 AS campaign.id,
'remarketingemail' AS utm.campaign,
'newsletter_feb' AS utm.content),
( SELECT 507 AS campaign.id,
'remarketingemail2' AS utm.campaign,
'newsletter_feb' AS utm.content),
(SELECT 508 AS campaign.id,
'remarketingemail' AS utm.campaign,
'newsletter_jan' AS utm.content),
(SELECT 508 AS campaign.id,
'remarketingemail' AS utm.campaign,
'newsletter_feb' AS utm.content)
)
GROUP BY utm_campaign,campaign_id)
GROUP BY utm_content,campaign_id
;

Bigquery Standard Sql equivalent for GROUP_CONCAT_UNQUOTED from Legacy Sql

Below is for BigQuery Standard SQL

#standardSQL
SELECT '''
SELECT ID, AID, ''' || (
SELECT STRING_AGG(DISTINCT "MAX(IF(key = '" || key || "', value, NULL)) AS " || key)
FROM `project.dataset.table`
) || '''
FROM `project.dataset.table`
GROUP BY 1,2
ORDER BY 1,2
'''

Also yo can use EXECUTE IMMEDIATE to actually execute just built query in one shot

EXECUTE IMMEDIATE '''
SELECT ID, AID, ''' || (
SELECT STRING_AGG(DISTINCT "MAX(IF(key = '" || key || "', value, NULL)) AS " || key)
FROM `project.dataset.table`
) || '''
FROM `project.dataset.table`
GROUP BY 1,2
ORDER BY 1,2
'''

How to get distinct values on GROUP_CONCAT using Google Big Query

Here is solution which uses UNIQUE scope aggregation function to remove duplicates. Note, that in order to use it, first we need to build a REPEATED using NEST aggregation:

SELECT 
GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD,
GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD
FROM (
SELECT
category,
NEST(id) as ids,
NEST(product) as products
FROM
(SELECT "a" as category, "1" as id, "car" as product),
(SELECT "a" as category, "2" as id, "car" as product),
(SELECT "a" as category, "3" as id, "car" as product),
(SELECT "b" as category, "4" as id, "car" as product),
(SELECT "b" as category, "5" as id, "car" as product),
(SELECT "b" as category, "2" as id, "bike" as product),
(SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY
category
)


Related Topics



Leave a reply



Submit