How do you pivot data in bigquery standard SQL without manual hardcoding?
in reality the table has too many possible products to make it feasible to write all of the options out manually
Below is for BigQuery Standard SQL
You can do this in two steps - first prepare dynamically pivot query by running below
#standardSQL
SELECT CONCAT('SELECT user_id, ',
STRING_AGG(
CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
),
' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
SELECT product_purchased
FROM `project.dataset.your_table`
GROUP BY product_purchased
)
as a result you will get string representing the query that you need to run to get desired result
As an example, if to apply to dummy data from your question
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 111 user_id, 'A' product_purchased UNION ALL
SELECT 111, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 333, 'C' UNION ALL
SELECT 444, 'A'
)
SELECT CONCAT('SELECT user_id, ',
STRING_AGG(
CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
),
' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
SELECT product_purchased
FROM `project.dataset.your_table`
GROUP BY product_purchased
)
you will get below query (formatted for better view here)
SELECT
user_id,
COUNTIF(product_purchased = "A") AS product_A,
COUNTIF(product_purchased = "B") AS product_B,
COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
Now, you can just run this to get desired result without manual coding
Again, if to run it against dummy data from your question
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 111 user_id, 'A' product_purchased UNION ALL
SELECT 111, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 333, 'C' UNION ALL
SELECT 444, 'A'
)
SELECT
user_id,
COUNTIF(product_purchased = "A") AS product_A,
COUNTIF(product_purchased = "B") AS product_B,
COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
-- ORDER BY user_id
you get expected result
Row user_id product_A product_B product_C
1 111 1 1 0
2 222 0 2 0
3 333 0 0 1
4 444 1 0 0
Is there a way to do this pivoting in a more automated and elegant way?
You can easily automate above using any client
of your choice
How to Pivot table in BigQuery
Update 2020:
Just call fhoffa.x.pivot()
, as detailed in this post:
- https://medium.com/@hoffa/easy-pivot-in-bigquery-one-step-5a1f13c6c710
For the 2019 example, for example:
CREATE OR REPLACE VIEW `fh-bigquery.temp.a` AS (
SELECT * EXCEPT(SensorName), REGEXP_REPLACE(SensorName, r'.*/', '') SensorName
FROM `data-sensing-lab.io_sensor_data.moscone_io13`
);
CALL fhoffa.x.pivot(
'fh-bigquery.temp.a'
, 'fh-bigquery.temp.delete_pivotted' # destination table
, ['MoteName', 'TIMESTAMP_TRUNC(Timestamp, HOUR) AS hour'] # row_ids
, 'SensorName' # pivot_col_name
, 'Data' # pivot_col_value
, 8 # max_columns
, 'AVG' # aggregation
, 'LIMIT 10' # optional_limit
);
Update 2019:
Since this is a popular question, let me update to #standardSQL and a more general case of pivoting. In this case we have multiple rows, and each sensor looks at a different type of property. To pivot it, we would do something like:
#standardSQL
SELECT MoteName
, TIMESTAMP_TRUNC(Timestamp, hour) hour
, AVG(IF(SensorName LIKE '%altitude', Data, null)) altitude
, AVG(IF(SensorName LIKE '%light', Data, null)) light
, AVG(IF(SensorName LIKE '%mic', Data, null)) mic
, AVG(IF(SensorName LIKE '%temperature', Data, null)) temperature
FROM `data-sensing-lab.io_sensor_data.moscone_io13`
WHERE MoteName = 'XBee_40670F5F'
GROUP BY 1, 2
As an alternative to AVG()
you can try MAX()
, ANY_VALUE()
, etc.
Previously:
I'm not sure what you are trying to do, but:
SELECT NTH(1, words) WITHIN RECORD column_1, NTH(2, words) WITHIN RECORD column_2, f0_
FROM (
SELECT NEST(word) words, SUM(c)
FROM (
SELECT word, SUM(word_count) c
FROM publicdata:samples.shakespeare
WHERE word in ('brave', 'attended')
GROUP BY 1
)
)
UPDATE: Same results, simpler query:
SELECT NTH(1, word) column_1, NTH(2, word) column_2, SUM(c)
FROM (
SELECT word, SUM(word_count) c
FROM publicdata:samples.shakespeare
WHERE word in ('brave', 'attended')
GROUP BY 1
)
BigQuery Pivot Data Rows Columns
There is no nice way of doing this in BigQuery, but you can do it follow below idea
Step 1
Run below query
SELECT 'SELECT CUST_createdMonth, ' +
GROUP_CONCAT_UNQUOTED(
'EXACT_COUNT_DISTINCT(IF(Transaction_Month = "' + Transaction_Month + '", ConsumerId, NULL)) as [m_' + REPLACE(Transaction_Month, '/', '_') + ']'
)
+ ' FROM yourTable GROUP BY CUST_createdMonth ORDER BY CUST_createdMonth'
FROM (
SELECT Transaction_Month
FROM yourTable
GROUP BY Transaction_Month
ORDER BY Transaction_Month
)
As a result - you will get string like below (it is formatted below for readability sake)
SELECT
CUST_createdMonth,
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/01/2015", ConsumerId, NULL)) AS [m_01_01_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/02/2015", ConsumerId, NULL)) AS [m_01_02_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/03/2015", ConsumerId, NULL)) AS [m_01_03_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/04/2015", ConsumerId, NULL)) AS [m_01_04_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/05/2015", ConsumerId, NULL)) AS [m_01_05_2015],
EXACT_COUNT_DISTINCT(IF(Transaction_Month = "01/06/2015", ConsumerId, NULL)) AS [m_01_06_2015]
FROM yourTable
GROUP BY
CUST_createdMonth
ORDER BY
CUST_createdMonth
Step 2
Just run above composed query
Result will be lik e below
CUST_createdMonth m_01_01_2015 m_01_02_2015 m_01_03_2015 m_01_04_2015 m_01_05_2015 m_01_06_2015
01/01/2015 2 1 0 0 0 0
01/02/2015 0 3 1 0 0 0
01/03/2015 0 0 2 1 0 1
01/04/2015 0 0 0 2 1 0
Note
Step 1 is helpful if you have many months to pivot so too much of manual work.
In this case - Step 1 helps you to generate your query
You can see more about pivoting in my other posts.
How to scale Pivoting in BigQuery?
Please note – there is a limitation of 10K columns per table - so you are limited with 10K organizations.
You can also see below as simplified examples (if above one is too complex/verbose):
How to transpose rows to columns with large amount of the data in BigQuery/SQL?
How to create dummy variable columns for thousands of categories in Google BigQuery?
Pivot Repeated fields in BigQuery
Big Query Transpose
There is no nice way of doing this in BigQuery as of yet, but you can do it following below idea
Step 1
Run below query
SELECT 'SELECT [group], ' +
GROUP_CONCAT_UNQUOTED(
'SUM(IF([date] = "' + [date] + '", value, NULL)) as [d_' + REPLACE([date], '/', '_') + ']'
)
+ ' FROM YourTable GROUP BY [group] ORDER BY [group]'
FROM (
SELECT [date] FROM YourTable GROUP BY [date] ORDER BY [date]
)
As a result - you will get string like below (it is formatted below for readability sake)
SELECT
[group],
SUM(IF([date] = "date1", value, NULL)) AS [d_date1],
SUM(IF([date] = "date2", value, NULL)) AS [d_date2]
FROM YourTable
GROUP BY [group]
ORDER BY [group]
Step 2
Just run above composed query
Result will be like below
group d_date1 d_date2
group1 15 30
Note 1: Step 1 is helpful if you have many groups to pivot so too much of manual work. In this case - Step 1 helps you to generate your query
Note 2: these steps are easily implemented in any client of your choice or you can just run those in BigQuery Web UI
You can see more about pivoting in my other posts.
How to scale Pivoting in BigQuery?
Please note – there is a limitation of 10K columns per table - so you are limited with 10K organizations.
You can also see below as simplified examples (if above one is too complex/verbose):
How to transpose rows to columns with large amount of the data in BigQuery/SQL?
How to create dummy variable columns for thousands of categories in Google BigQuery?
Pivot Repeated fields in BigQuery
How to unpivot in BigQuery?
2020 update: fhoffa.x.unpivot()
See:
- https://medium.com/@hoffa/how-to-unpivot-multiple-columns-into-tidy-pairs-with-sql-and-bigquery-d9d0e74ce675
I created a public persistent UDF. If you have a table a
, you can give the whole row to the UDF for it to be unpivotted:
SELECT geo_type, region, transportation_type, unpivotted
FROM `fh-bigquery.public_dump.applemobilitytrends_20200414` a
, UNNEST(fhoffa.x.unpivot(a, '_2020')) unpivotted
It transforms a table like this:
Into this
As a comment mentions, my solution above doesn't solve for the question problem.
So here's a variation, while I look how to integrate all into one:
CREATE TEMP FUNCTION unpivot(x ANY TYPE) AS (
(
SELECT
ARRAY_AGG(STRUCT(
REGEXP_EXTRACT(y, '[^"]+') AS key
, REGEXP_EXTRACT(y, ':([0-9]+)') AS value
))
FROM UNNEST((
SELECT REGEXP_EXTRACT_ALL(json,'"[smlx][meaxl]'||r'[^:]+:\"?[^"]+?') arr
FROM (SELECT TO_JSON_STRING(x) json))) y
)
);
SELECT location, unpivotted.*
FROM `robotic-charmer-726.bl_test_data.reconfiguring_a_table` x
, UNNEST(unpivot(x)) unpivotted
Previous answer:
Use the UNION of tables (with ',' in BigQuery), plus some column aliasing:
SELECT Location, Size, Quantity
FROM (
SELECT Location, 'Small' as Size, Small as Quantity FROM [table]
), (
SELECT Location, 'Medium' as Size, Medium as Quantity FROM [table]
), (
SELECT Location, 'Large' as Size, Large as Quantity FROM [table]
)
Related Topics
Why When Matched' Cannot Appear More Than Once in a 'Update' Clause of a Merge Statement
Difference Between a Inline Function and a View
SQL Query to Create a Calculated Field
Convert Datetime to Unix Epoch in Informix
Ssms: How to Import (Copy/Paste) Data from Excel
How to Calculate Between Different Group of Rows of the Same Table
How to Extend the Query to Add 0 in the Cell When No Activity Is Performed
Bulk Update Multiple Rows in Same Query Using Postgresql
How to Retrieve Rows Multiple Times in SQL Server
Support for JSON in Oracle 11G
How to Add Sequence Number for Each Element in a Group Using a SQL Query Without Temp Tables
Sum of Digits of a Number in SQL Server Without Using Traditional Loops Like While
Modify(Replace) Xml for Conditions
Import CSV File Error:Column Value Containing Column Delimiter