BigQuery GROUP_CONCAT and ORDER BY
Since BigQuery doesn't support ORDER BY clause inside GROUP_CONCAT function, this functionality can be achieved by use of analytic window functions. And in BigQuery separator for GROUP_CONCAT is simply a second parameter for the function.
Below example illustrates this:
select key, first(grouped_value) concat_value from (
select
key,
group_concat(value, ':') over
(partition by key
order by value asc
rows between unbounded preceding and unbounded following)
grouped_value
from (
select key, value from
(select 1 as key, 'b' as value),
(select 1 as key, 'c' as value),
(select 1 as key, 'a' as value),
(select 2 as key, 'y' as value),
(select 2 as key, 'x' as value))) group by key
Will produce the following:
Row key concat_value
1 1 a:b:c
2 2 x:y
NOTE on Window specification: The query uses "rows between unbounded preceding and unbounded following" window specification, to make sure that all rows within a partition participate in GROUP_CONCAT aggregation. Per SQL Standard default window specification is "rows between unbounded preceding and current row" which is good for things like running sum, but won't work correctly in this problem.
Performance note: Even though it looks wasteful to recompute aggregation function multiple times, the BigQuery optimizer does recognize that since window is not changing result will be the same, so it only computes aggregation once per partition.
Function not found: group_concat
Use STRING_AGG
in standard SQL instead of the legacy GROUP_CONCAT
. You can use standard SQL through the UI by unchecking "Use legacy SQL" under "Show Options" or else putting #standardSQL
at the top of your query. See also Enabling Standard SQL.
How to use GROUP_CONCAT and/or NEST but limit number of elements in the result
This is not possible with legacy SQL aggregate functions, but it is possible with standard SQL and its aggregate functions STRING_AGG
(corresponds to GROUP_CONCAT
in legacy SQL) and ARRAY_AGG
(corresponds to NEST
). Both of these functions support optional LIMIT
clause as documented in "Using LIMIT with aggregate functions".
For example:
select string_agg(x LIMIT 2)
from unnest(['hello', 'world!', 'foo', 'bar', 'baz']) x
returns 'hello,world!'
string, and
select array_agg(x LIMIT 2)
from unnest(['hello', 'world!', 'foo', 'bar', 'baz']) x
returns ['hello', 'world!']
array.
Syntax to run a distinct GROUP_CONCAT in Google Bigquery
Use a subquery to group by and get distinct values. something like this:
SELECT campaign.id AS campaign_id,
GROUP_CONCAT(utm.campaign) AS utm_campaign
FROM
(Select campaign.id,utm.campaign
FROM [email_event]
WHERE (TIMESTAMP BETWEEN SEC_TO_TIMESTAMP(1412136000) AND SEC_TO_TIMESTAMP(1414814340))
GROUP EACH BY campaign.id,utm.campaign)
GROUP BY campaign_id;
another option for few aggregated fields is to do it in stages...
SELECT campaign_id ,
GROUP_CONCAT(utm_campaign) as utm_campaign,
utm_content
From
(SELECT campaign.id AS campaign_id,
utm.campaign as utm_campaign,
GROUP_CONCAT(utm.content) AS utm_content
FROM
(
SELECT *
FROM
( SELECT 507 AS campaign.id,
'remarketingemail' AS utm.campaign,
'newsletter_feb' AS utm.content),
( SELECT 507 AS campaign.id,
'remarketingemail2' AS utm.campaign,
'newsletter_feb' AS utm.content),
(SELECT 508 AS campaign.id,
'remarketingemail' AS utm.campaign,
'newsletter_jan' AS utm.content),
(SELECT 508 AS campaign.id,
'remarketingemail' AS utm.campaign,
'newsletter_feb' AS utm.content)
)
GROUP BY utm_campaign,campaign_id)
GROUP BY utm_content,campaign_id
;
Bigquery Standard Sql equivalent for GROUP_CONCAT_UNQUOTED from Legacy Sql
Below is for BigQuery Standard SQL
#standardSQL
SELECT '''
SELECT ID, AID, ''' || (
SELECT STRING_AGG(DISTINCT "MAX(IF(key = '" || key || "', value, NULL)) AS " || key)
FROM `project.dataset.table`
) || '''
FROM `project.dataset.table`
GROUP BY 1,2
ORDER BY 1,2
'''
Also yo can use EXECUTE IMMEDIATE to actually execute just built query in one shot
EXECUTE IMMEDIATE '''
SELECT ID, AID, ''' || (
SELECT STRING_AGG(DISTINCT "MAX(IF(key = '" || key || "', value, NULL)) AS " || key)
FROM `project.dataset.table`
) || '''
FROM `project.dataset.table`
GROUP BY 1,2
ORDER BY 1,2
'''
How to get distinct values on GROUP_CONCAT using Google Big Query
Here is solution which uses UNIQUE
scope aggregation function to remove duplicates. Note, that in order to use it, first we need to build a REPEATED
using NEST
aggregation:
SELECT
GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD,
GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD
FROM (
SELECT
category,
NEST(id) as ids,
NEST(product) as products
FROM
(SELECT "a" as category, "1" as id, "car" as product),
(SELECT "a" as category, "2" as id, "car" as product),
(SELECT "a" as category, "3" as id, "car" as product),
(SELECT "b" as category, "4" as id, "car" as product),
(SELECT "b" as category, "5" as id, "car" as product),
(SELECT "b" as category, "2" as id, "bike" as product),
(SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY
category
)
Related Topics
How to Group and Choose Lowest Value in SQL
How to Perform a Select Query in a Do Block
How to Get Max(Date) from Given Set of Data Grouped by Some Fields Using Pyspark
Split Date Range into One Row Per Month in SQL Server
How to Get Running Sum of a Column in SQL Server
How to Create Birt Report Based on Multiple Data Sets
Database Normalization - Who's Right
SQL - Select Rows from Two Different Tables
Oracle Client and Networking Components Were Not Found
SQL Query of Haversine Formula in SQL Server
Access Columns of a Table by Index Instead of Name in SQL Server Stored Procedure
Foreign Key Column Mapped to Multiple Primary Keys
Using Except Clause in Postgresql
Converting Any Number in Words
Best Way to Work with Transactions in Ms SQL Server Management Studio
How to Change a Pg Column to Nullable True
How to Create a "Unique" Constraint on a Boolean MySQL Column