How to Pivot Data in Bigquery Standard SQL Without Manual Hardcoding

How do you pivot data in bigquery standard SQL without manual hardcoding?

in reality the table has too many possible products to make it feasible to write all of the options out manually

Below is for BigQuery Standard SQL

You can do this in two steps - first prepare dynamically pivot query by running below

#standardSQL
SELECT CONCAT('SELECT user_id, ',
STRING_AGG(
CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
),
' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
SELECT product_purchased
FROM `project.dataset.your_table`
GROUP BY product_purchased
)

as a result you will get string representing the query that you need to run to get desired result

As an example, if to apply to dummy data from your question

#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 111 user_id, 'A' product_purchased UNION ALL
SELECT 111, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 333, 'C' UNION ALL
SELECT 444, 'A'
)
SELECT CONCAT('SELECT user_id, ',
STRING_AGG(
CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
),
' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
SELECT product_purchased
FROM `project.dataset.your_table`
GROUP BY product_purchased
)

you will get below query (formatted for better view here)

SELECT
user_id,
COUNTIF(product_purchased = "A") AS product_A,
COUNTIF(product_purchased = "B") AS product_B,
COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id

Now, you can just run this to get desired result without manual coding

Again, if to run it against dummy data from your question

#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 111 user_id, 'A' product_purchased UNION ALL
SELECT 111, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 333, 'C' UNION ALL
SELECT 444, 'A'
)
SELECT
user_id,
COUNTIF(product_purchased = "A") AS product_A,
COUNTIF(product_purchased = "B") AS product_B,
COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
-- ORDER BY user_id

you get expected result

Row user_id product_A   product_B   product_C    
1 111 1 1 0
2 222 0 2 0
3 333 0 0 1
4 444 1 0 0

Is there a way to do this pivoting in a more automated and elegant way?

You can easily automate above using any client of your choice

How to Pivot table in BigQuery

Update 2020:

Just call fhoffa.x.pivot(), as detailed in this post:

  • https://medium.com/@hoffa/easy-pivot-in-bigquery-one-step-5a1f13c6c710

For the 2019 example, for example:

CREATE OR REPLACE VIEW `fh-bigquery.temp.a` AS (
SELECT * EXCEPT(SensorName), REGEXP_REPLACE(SensorName, r'.*/', '') SensorName
FROM `data-sensing-lab.io_sensor_data.moscone_io13`
);

CALL fhoffa.x.pivot(
'fh-bigquery.temp.a'
, 'fh-bigquery.temp.delete_pivotted' # destination table
, ['MoteName', 'TIMESTAMP_TRUNC(Timestamp, HOUR) AS hour'] # row_ids
, 'SensorName' # pivot_col_name
, 'Data' # pivot_col_value
, 8 # max_columns
, 'AVG' # aggregation
, 'LIMIT 10' # optional_limit
);

Update 2019:

Since this is a popular question, let me update to #standardSQL and a more general case of pivoting. In this case we have multiple rows, and each sensor looks at a different type of property. To pivot it, we would do something like:

#standardSQL
SELECT MoteName
, TIMESTAMP_TRUNC(Timestamp, hour) hour
, AVG(IF(SensorName LIKE '%altitude', Data, null)) altitude
, AVG(IF(SensorName LIKE '%light', Data, null)) light
, AVG(IF(SensorName LIKE '%mic', Data, null)) mic
, AVG(IF(SensorName LIKE '%temperature', Data, null)) temperature
FROM `data-sensing-lab.io_sensor_data.moscone_io13`
WHERE MoteName = 'XBee_40670F5F'
GROUP BY 1, 2

Sample Image

As an alternative to AVG() you can try MAX(), ANY_VALUE(), etc.


Previously:

I'm not sure what you are trying to do, but:

SELECT NTH(1, words) WITHIN RECORD column_1, NTH(2, words) WITHIN RECORD column_2, f0_
FROM (
SELECT NEST(word) words, SUM(c)
FROM (
SELECT word, SUM(word_count) c
FROM publicdata:samples.shakespeare
WHERE word in ('brave', 'attended')
GROUP BY 1
)
)

Sample Image

UPDATE: Same results, simpler query:

SELECT NTH(1, word) column_1, NTH(2, word) column_2, SUM(c)
FROM (
SELECT word, SUM(word_count) c
FROM publicdata:samples.shakespeare
WHERE word in ('brave', 'attended')
GROUP BY 1
)

BigQuery Pivot Data Rows Columns

There is no nice way of doing this in BigQuery, but you can do it follow below idea

Step 1

Run below query

SELECT 'SELECT CUST_createdMonth, ' + 
GROUP_CONCAT_UNQUOTED(
'EXACT_COUNT_DISTINCT(IF(Transaction_Month = "' + Transaction_Month + '", ConsumerId, NULL)) as [m_' + REPLACE(Transaction_Month, '/', '_') + ']'
)
+ ' FROM yourTable GROUP BY CUST_createdMonth ORDER BY CUST_createdMonth'
FROM (
SELECT Transaction_Month
FROM yourTable
GROUP BY Transaction_Month
ORDER BY Transaction_Month
)

As a result - you will get string like below (it is formatted below for readability sake)

SELECT
CUST_createdMonth,
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/01/2015", ConsumerId, NULL)) AS [m_01_01_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/02/2015", ConsumerId, NULL)) AS [m_01_02_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/03/2015", ConsumerId, NULL)) AS [m_01_03_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/04/2015", ConsumerId, NULL)) AS [m_01_04_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/05/2015", ConsumerId, NULL)) AS [m_01_05_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/06/2015", ConsumerId, NULL)) AS [m_01_06_2015]
FROM yourTable
GROUP BY
CUST_createdMonth
ORDER BY
CUST_createdMonth

Step 2

Just run above composed query

Result will be lik e below

CUST_createdMonth   m_01_01_2015    m_01_02_2015    m_01_03_2015    m_01_04_2015    m_01_05_2015    m_01_06_2015     
01/01/2015 2 1 0 0 0 0
01/02/2015 0 3 1 0 0 0
01/03/2015 0 0 2 1 0 1
01/04/2015 0 0 0 2 1 0

Note

Step 1 is helpful if you have many months to pivot so too much of manual work.

In this case - Step 1 helps you to generate your query

You can see more about pivoting in my other posts.

How to scale Pivoting in BigQuery?

Please note – there is a limitation of 10K columns per table - so you are limited with 10K organizations.

You can also see below as simplified examples (if above one is too complex/verbose):

How to transpose rows to columns with large amount of the data in BigQuery/SQL?

How to create dummy variable columns for thousands of categories in Google BigQuery?

Pivot Repeated fields in BigQuery

Big Query Transpose

There is no nice way of doing this in BigQuery as of yet, but you can do it following below idea

Step 1

Run below query

SELECT 'SELECT [group], ' + 
GROUP_CONCAT_UNQUOTED(
'SUM(IF([date] = "' + [date] + '", value, NULL)) as [d_' + REPLACE([date], '/', '_') + ']'
)
+ ' FROM YourTable GROUP BY [group] ORDER BY [group]'
FROM (
SELECT [date] FROM YourTable GROUP BY [date] ORDER BY [date]
)

As a result - you will get string like below (it is formatted below for readability sake)

SELECT 
[group],
SUM(IF([date] = "date1", value, NULL)) AS [d_date1],
SUM(IF([date] = "date2", value, NULL)) AS [d_date2]
FROM YourTable
GROUP BY [group]
ORDER BY [group]

Step 2

Just run above composed query

Result will be like below

group   d_date1 d_date2  
group1 15 30

Note 1: Step 1 is helpful if you have many groups to pivot so too much of manual work. In this case - Step 1 helps you to generate your query

Note 2: these steps are easily implemented in any client of your choice or you can just run those in BigQuery Web UI

You can see more about pivoting in my other posts.

How to scale Pivoting in BigQuery?

Please note – there is a limitation of 10K columns per table - so you are limited with 10K organizations.

You can also see below as simplified examples (if above one is too complex/verbose):

How to transpose rows to columns with large amount of the data in BigQuery/SQL?

How to create dummy variable columns for thousands of categories in Google BigQuery?

Pivot Repeated fields in BigQuery

How to unpivot in BigQuery?

2020 update: fhoffa.x.unpivot()

See:

  • https://medium.com/@hoffa/how-to-unpivot-multiple-columns-into-tidy-pairs-with-sql-and-bigquery-d9d0e74ce675

I created a public persistent UDF. If you have a table a, you can give the whole row to the UDF for it to be unpivotted:

SELECT geo_type, region, transportation_type, unpivotted
FROM `fh-bigquery.public_dump.applemobilitytrends_20200414` a
, UNNEST(fhoffa.x.unpivot(a, '_2020')) unpivotted

It transforms a table like this:

Sample Image

Into this

Sample Image


As a comment mentions, my solution above doesn't solve for the question problem.

So here's a variation, while I look how to integrate all into one:

CREATE TEMP FUNCTION unpivot(x ANY TYPE) AS (
(
SELECT
ARRAY_AGG(STRUCT(
REGEXP_EXTRACT(y, '[^"]+') AS key
, REGEXP_EXTRACT(y, ':([0-9]+)') AS value
))
FROM UNNEST((
SELECT REGEXP_EXTRACT_ALL(json,'"[smlx][meaxl]'||r'[^:]+:\"?[^"]+?') arr
FROM (SELECT TO_JSON_STRING(x) json))) y
)
);

SELECT location, unpivotted.*
FROM `robotic-charmer-726.bl_test_data.reconfiguring_a_table` x
, UNNEST(unpivot(x)) unpivotted


Previous answer:

Use the UNION of tables (with ',' in BigQuery), plus some column aliasing:

SELECT Location, Size, Quantity
FROM (
SELECT Location, 'Small' as Size, Small as Quantity FROM [table]
), (
SELECT Location, 'Medium' as Size, Medium as Quantity FROM [table]
), (
SELECT Location, 'Large' as Size, Large as Quantity FROM [table]
)


Related Topics



Leave a reply



Submit