Is there a way to parse XML tags in BigQuery Standard SQL?
As Elliot mentioned, you can use JavaScript UDFs in BigQuery. The documentation is here:
https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions
I imagine the UDF might look something like
CREATE TEMPORARY FUNCTION XML(x STRING)
RETURNS STRING
LANGUAGE js AS """
var data = fromXML(x);
return data.title;
"""
OPTIONS(
library="gs://<BUCKET_NAME>/from-xml.min.js"
);
SELECT XML(a) FROM UNNEST(["<title>Title of Page</title>"]) as a
where from-xml.min.js comes from the from-xml JavaScript library and has been uploaded to your GCS bucket.
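For a flat tag with no attributes or nesting, a regular expression may be enough and avoids the external library. This is a different technique from the UDF above, not a real XML parser, and the pattern is an assumption that only fits the sample input:

```sql
#standardSQL
-- Sketch: pull the text between <title> and </title> with a regex.
-- Assumption: flat tags only (no attributes, nesting, CDATA, or entities).
SELECT
  a AS xml,
  REGEXP_EXTRACT(a, r'<title>(.*?)</title>') AS title
FROM UNNEST(['<title>Title of Page</title>']) AS a
```

For the sample row this returns Title of Page. Anything with real XML structure is better served by the UDF route.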
How to Parse simple data in BigQuery
The example below is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '1/2' list UNION ALL
SELECT 2, '1/3' UNION ALL
SELECT 3, '10/20' UNION ALL
SELECT 4, '15/' UNION ALL
SELECT 5, '12/31'
)
SELECT id,
SPLIT(list, '/')[SAFE_OFFSET(0)] AS first_element,
SPLIT(list, '/')[SAFE_OFFSET(1)] AS second_element
FROM `project.dataset.table`
-- ORDER BY id
with the result as below:
Row  id  first_element  second_element
1    1   1              2
2    2   1              3
3    3   10             20
4    4   15
5    5   12             31
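The reason for SAFE_OFFSET rather than OFFSET shows up when a value has no delimiter at all. A minimal sketch, using a made-up input of '15' that is not in the sample data:

```sql
#standardSQL
-- '15' contains no '/', so SPLIT returns a one-element array: ['15'].
SELECT
  SPLIT('15', '/')[SAFE_OFFSET(0)] AS first_element,  -- '15'
  SPLIT('15', '/')[SAFE_OFFSET(1)] AS second_element  -- NULL, no error
-- SPLIT('15', '/')[OFFSET(1)] would fail with an out-of-bounds error instead.
```

Note that '15/' in the sample data is a different case: it splits into two elements, the second being an empty string rather than NULL.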
BigQuery get columns from JSON file keys
Below is for BigQuery Standard SQL
#standardSQL
SELECT
JSON_EXTRACT_SCALAR(line, '$.id') id,
TRIM(SPLIT(aud_kv, ':')[OFFSET(0)], '"') audiences,
TRIM(SPLIT(seg_kv, ':')[OFFSET(0)], '"') segments
FROM `project.dataset.table`,
UNNEST(SPLIT(TRIM(JSON_EXTRACT(line, '$.key1.key2.audiences'),'{}'))) aud_kv,
UNNEST(SPLIT(TRIM(JSON_EXTRACT(line, '$.key1.key2.segments'),'{}'))) seg_kv
Applied to the sample data from your question, the output is
Row id audiences segments
1 abcdefg aud1 seg1
2 abcdefg aud1 seg2
3 abcdefg aud1 seg3
4 abcdefg aud1 seg4
5 abcdefg aud2 seg1
6 abcdefg aud2 seg2
7 abcdefg aud2 seg3
8 abcdefg aud2 seg4
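To try the query without the original table, here is a sketch with one inline row. The JSON line is hypothetical (the question's sample data is not reproduced here), and it assumes the values under audiences and segments are simple scalars, since the SPLIT on commas and colons would break on nested objects:

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- Hypothetical sample row; key names follow the paths used in the query.
  SELECT '{"id":"abcdefg","key1":{"key2":{"audiences":{"aud1":1,"aud2":1},"segments":{"seg1":1,"seg2":1}}}}' AS line
)
SELECT
  JSON_EXTRACT_SCALAR(line, '$.id') AS id,
  -- Each *_kv looks like "aud1":1; keep the key and strip the quotes.
  TRIM(SPLIT(aud_kv, ':')[OFFSET(0)], '"') AS audiences,
  TRIM(SPLIT(seg_kv, ':')[OFFSET(0)], '"') AS segments
FROM `project.dataset.table`,
UNNEST(SPLIT(TRIM(JSON_EXTRACT(line, '$.key1.key2.audiences'), '{}'))) AS aud_kv,
UNNEST(SPLIT(TRIM(JSON_EXTRACT(line, '$.key1.key2.segments'), '{}'))) AS seg_kv
```

The two UNNEST clauses form a cross join, so the result has one row per audience/segment pair.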
SQL conditional aggregation?
Standard SQL offers listagg() to aggregate strings. So this looks something like:
select name,
listagg(case when virtual = 1 then message end, ',') within group (order by message)
from t
group by name;
However, most databases have different names (and syntax) for string aggregation, such as string_agg() or group_concat().
EDIT:
In BQ the syntax would be:
select name,
string_agg(case when virtual = 1 then message end, ',')
from t
group by name;
That said, I would recommend array_agg() rather than string_agg().
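A sketch of what the array_agg() version might look like in BigQuery. The inline rows are made up for illustration; IGNORE NULLS is there because the CASE expression yields NULL for non-matching rows, and BigQuery does not allow NULL elements in the resulting array:

```sql
#standardSQL
WITH t AS (
  -- Hypothetical sample rows.
  SELECT 'a' AS name, 1 AS virtual, 'm2' AS message UNION ALL
  SELECT 'a', 0, 'm3' UNION ALL
  SELECT 'a', 1, 'm1'
)
SELECT
  name,
  -- Collect only messages where virtual = 1, sorted by message.
  ARRAY_AGG(CASE WHEN virtual = 1 THEN message END IGNORE NULLS ORDER BY message) AS messages
FROM t
GROUP BY name
```

For name 'a' this gives the array ['m1', 'm2'], which keeps the values separate instead of relying on a delimiter that might appear inside a message.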
How to read multiple levels of JSON data in BigQuery using JSON_EXTRACT or JSON_EXTRACT_SCALAR
The example below is for BigQuery Standard SQL
#standardSQL
CREATE TEMP FUNCTION jsonparse(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input).map(x=>JSON.stringify(x));
""";
WITH `project.lz.json_file` AS (
SELECT '''{
"Combos": [ {
"Id": "1111",
"Type": 0,
"Description": "ABCD",
"ComboDuration": {
"StartDate": "2009-10-26T08:00:00",
"EndDate": "2009-10-29T08:00:00"
} }, {
"Id": "2222",
"Type": 1,
"Description": "XYZ",
"ComboDuration": {
"StartDate": "2019-10-26T08:00:00",
"EndDate": "2019-10-29T08:00:00"
} }, {
"Id": "39933",
"Type": 3,
"Description": "General",
"ComboDuration": {
"StartDate": "2019-10-26T08:00:00",
"EndDate": "2019-10-29T08:00:00"
} }, {
"Id": "39934",
"Type": 2,
"Description": "ABCDXYZ",
"ComboDuration": {
"StartDate": "2019-10-26T08:00:00",
"EndDate": "2019-10-29T08:00:00"
} }]} ''' AS conv_column
)
SELECT
JSON_EXTRACT_SCALAR(combo, '$.Id') AS Id,
JSON_EXTRACT_SCALAR(combo, '$.Type') AS Type,
JSON_EXTRACT_SCALAR(combo, '$.Description') AS Description,
JSON_EXTRACT_SCALAR(combo, '$.ComboDuration.StartDate') AS StartDate,
JSON_EXTRACT_SCALAR(combo, '$.ComboDuration.EndDate') AS EndDate
FROM `project.lz.json_file`,
UNNEST(jsonparse(JSON_EXTRACT(conv_column, '$.Combos'))) combo
with output
Row Id Type Description StartDate EndDate
1 1111 0 ABCD 2009-10-26T08:00:00 2009-10-29T08:00:00
2 2222 1 XYZ 2019-10-26T08:00:00 2019-10-29T08:00:00
3 39933 3 General 2019-10-26T08:00:00 2019-10-29T08:00:00
4 39934 2 ABCDXYZ 2019-10-26T08:00:00 2019-10-29T08:00:00
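BigQuery also has a built-in JSON_EXTRACT_ARRAY function, which may remove the need for the JavaScript UDF here. This is a swapped-in alternative, not the method above, shown as a sketch on a shortened version of the same data:

```sql
#standardSQL
WITH `project.lz.json_file` AS (
  -- Shortened sample: two combos, two keys each.
  SELECT '{"Combos":[{"Id":"1111","Type":0},{"Id":"2222","Type":1}]}' AS conv_column
)
SELECT
  JSON_EXTRACT_SCALAR(combo, '$.Id') AS Id,
  JSON_EXTRACT_SCALAR(combo, '$.Type') AS Type
FROM `project.lz.json_file`,
-- JSON_EXTRACT_ARRAY returns ARRAY<STRING>, one JSON string per array element.
UNNEST(JSON_EXTRACT_ARRAY(conv_column, '$.Combos')) AS combo
```

Each element comes back as a JSON-formatted string, so the JSON_EXTRACT_SCALAR calls work the same way as with the UDF output.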
Convert HTML characters to unicode in BigQuery
The following general technique works:
- Split the text on each character, where an HTML entity such as &#x1F61C; is considered a single character
- Keep track of character position with OFFSET
- Rejoin all characters, but use some BigQuery STRING function magic to replace HTML entities with their Unicode character.
SELECT
  id,
  ANY_VALUE(text) AS original,
  STRING_AGG(
    COALESCE(
      -- Support hex codepoints
      CODE_POINTS_TO_STRING(
        [CAST(CONCAT('0x', REGEXP_EXTRACT(char, r'(?:&#x)(\w+)(?:;)')) AS INT64)]
      ),
      -- Support decimal codepoints
      CODE_POINTS_TO_STRING(
        [CAST(CONCAT('0x', FORMAT('%x', CAST(REGEXP_EXTRACT(char, r'(?:&#)(\d+)(?:;)') AS INT64))) AS INT64)]
      ),
      -- Fall back to the character itself
      char
    ),
    '' ORDER BY char_position) AS text
FROM UNNEST([
  STRUCT(1 AS id, 'Hello World &#x1F61C;' AS text),
  STRUCT(2 AS id, 'Yes &#x1F61C; It works great &#x1F61C;'),
  STRUCT(3 AS id, '&#8212;' AS text),
  STRUCT(4 AS id, '&#x2014;' AS text)
])
CROSS JOIN
  -- Extract all characters individually except for HTML entity characters
  UNNEST(REGEXP_EXTRACT_ALL(text, r'(&#\w+;|.)')) char WITH OFFSET AS char_position
GROUP BY id
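The "STRING function magic" rests on two building blocks: BigQuery can cast a hex string prefixed with 0x to INT64, and CODE_POINTS_TO_STRING turns an array of code points into a string. A minimal sketch of just those two steps, using code point U+2014 as the example:

```sql
#standardSQL
SELECT
  -- A '0x'-prefixed string casts to its integer value: 0x2014 is 8212.
  CAST('0x2014' AS INT64) AS hex_as_int,
  -- Both expressions produce the single character U+2014.
  CODE_POINTS_TO_STRING([CAST('0x2014' AS INT64)]) AS from_hex,
  CODE_POINTS_TO_STRING([8212]) AS from_decimal
```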
Best way to unnest and select column if table has repeated record column which itself contains many repeated record column
Below is for BigQuery Standard SQL
#standardSQL
SELECT
ANY_VALUE(sku),
SUM((SELECT SUM(cost) FROM f.unit)),
SUM((SELECT SUM(fee) FROM f.product))
FROM nonpii_air_ticketed.test,
UNNEST(fan) f
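To see how the nested sums behave, here is a sketch with one inline row. The table name test, the values, and the column aliases are made up; the shape (a repeated fan record containing repeated unit and product records) follows the query above:

```sql
#standardSQL
WITH test AS (
  SELECT
    'sku-1' AS sku,
    [
      STRUCT([STRUCT(10 AS cost), STRUCT(5 AS cost)] AS unit,
             [STRUCT(2 AS fee)] AS product),
      STRUCT([STRUCT(1 AS cost)] AS unit,
             [STRUCT(3 AS fee), STRUCT(4 AS fee)] AS product)
    ] AS fan
)
SELECT
  ANY_VALUE(sku) AS sku,
  -- Inner SUM totals one fan element; outer SUM totals across fan elements.
  SUM((SELECT SUM(cost) FROM UNNEST(f.unit))) AS total_cost,  -- 10 + 5 + 1 = 16
  SUM((SELECT SUM(fee) FROM UNNEST(f.product))) AS total_fee  -- 2 + 3 + 4 = 9
FROM test,
UNNEST(fan) AS f
```

Only fan is unnested in the FROM clause. The inner arrays are summed in correlated subqueries, so unit and product are never cross joined against each other and no values are double counted.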