BigQuery If Field Exists Then

BigQuery If field exists

Let's assume your table has only the fields x and y.

Then the query below works fine:

SELECT x, y FROM YourTable

But the one below fails because field z does not exist:

SELECT x, y, z FROM YourTable

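One way to address this is shown below. In legacy SQL a comma between tables means UNION ALL, so the second subquery only contributes the z column to the schema, and its single row is filtered out by WHERE fake IS NULL: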

#legacySQL
SELECT x, y, COALESCE(z, 0) as z
FROM
(SELECT * FROM YourTable),
(SELECT true AS fake, NULL as z)
WHERE fake IS NULL

EDIT: added an explicit #legacySQL so as not to confuse those who try to apply this exact approach in Standard SQL :o)

IF Field Exists in StandardSQL

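The query below is for BigQuery Standard SQL. It returns all rows of the table when the table has a peaches field, and no rows otherwise: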

#standardSQL
SELECT * FROM `project.dataset.fruits`
WHERE EXISTS (
SELECT 1 FROM `project.dataset.fruits` t
WHERE REGEXP_CONTAINS(TO_JSON_STRING(t), '[{,]"peaches":')
LIMIT 1
)
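
If the goal is to read the field's value when it exists and fall back to a default otherwise, one possible Standard SQL sketch extracts the value from the row's JSON form, in the same spirit as the TO_JSON_STRING check above. This is a different technique from the legacy SQL union trick; the inline YourTable data, the INT64 type, and the default of 0 are assumptions made for illustration.

```sql
#standardSQL
-- Hypothetical inline data standing in for a table that has only x and y.
WITH YourTable AS (
  SELECT 1 AS x, 2 AS y UNION ALL
  SELECT 3, 4
)
SELECT
  x,
  y,
  -- z is never referenced as a column, so the query compiles either way.
  -- JSON_EXTRACT_SCALAR returns NULL when the '$.z' path is absent,
  -- and IFNULL then substitutes the assumed default of 0.
  IFNULL(
    SAFE_CAST(JSON_EXTRACT_SCALAR(TO_JSON_STRING(t), '$.z') AS INT64),
    0
  ) AS z
FROM YourTable AS t;
```

Rows from a table without z get 0; if the table does have z, its value is used instead (a NULL z also becomes 0).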

BigQuery IF field exists THEN

The query below should point you in the right direction (legacy SQL, where a comma between tables means UNION ALL):

SELECT * FROM
(SELECT * FROM <somewhere w/o my_field>),
(SELECT * FROM <somewhere with my_field>)

Assuming you have a, b and c as fields in your original table, the above can be used (see below) if you need to change missing values from NULL to 0:

SELECT a, b, c, COALESCE(my_field, 0) as my_field
FROM
(SELECT * FROM <somewhere w/o my_field>),
(SELECT * FROM <somewhere with my_field>)

Select column value if column exists in that table else create that column and set its value to null in BigQuery

I assume in the following that you have a source table (the one with potentially "missing" columns) and an existing target table (with the desired schema).

To get the column information for these tables, you just need to query the INFORMATION_SCHEMA.COLUMNS view.
The solution below uses dynamic SQL to 1) generate the desired SQL and 2) run it.

DECLARE column_selection STRING;

SET column_selection = (
  WITH column_table AS (
    SELECT
      source.column_name AS source_column,
      tgt.column_name AS target_column
    FROM
      (SELECT column_name
       FROM `<yourproject>.<target_dataset>.INFORMATION_SCHEMA.COLUMNS`
       WHERE table_name = '<target_table>') tgt
    LEFT JOIN
      (SELECT column_name
       FROM `<yourproject>.<source_dataset>.INFORMATION_SCHEMA.COLUMNS`
       WHERE table_name = '<source_table>') source
    ON source.column_name = tgt.column_name
  )
  -- Keep the source column when it exists, otherwise emit NULL AS `target_column`
  SELECT STRING_AGG(COALESCE(source_column,
    CONCAT("NULL AS `", target_column, "`")), ", \n") AS col_selection
  FROM column_table
);

EXECUTE IMMEDIATE
  FORMAT("SELECT %s FROM `<yourproject>.<source_dataset>.<source_table>`", column_selection);

Explanation of the steps

  1. Build a column_table for the columns we want to query:

    a. first column containing the columns of the target table,
    b. second one containing the corresponding source columns if they exist, or NULL if they don't

  2. Once we have this table, we can build the desired SELECT statement: each entry is the column name itself if the column is in the source table, or, if it is NOT present, " NULL AS `column_name_in_target` ".

This is expressed by
coalesce(source_column, CONCAT("NULL AS `", target_column, "`"))

We aggregate all these expressions with STRING_AGG into the desired column selection.

  3. Final step: put together the rest of the query ("SELECT" + <column_selection_string> + "FROM <your_source_table>" + ...) and run it with EXECUTE IMMEDIATE.
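
To see what the column-selection step produces without touching INFORMATION_SCHEMA, here is a small sketch that runs the same COALESCE/STRING_AGG expression over a hand-written column_table. The column names id, name and created_at are hypothetical, and the ORDER BY inside STRING_AGG is an addition made only so the result is deterministic.

```sql
-- Hypothetical stand-in for the LEFT JOIN result:
-- created_at exists in the target table but not in the source table.
WITH column_table AS (
  SELECT 'id' AS target_column, 'id' AS source_column UNION ALL
  SELECT 'name', 'name' UNION ALL
  SELECT 'created_at', CAST(NULL AS STRING)
)
SELECT
  STRING_AGG(
    -- Use the source column when present, otherwise a typed-as-NULL placeholder.
    COALESCE(source_column, CONCAT("NULL AS `", target_column, "`")),
    ", "
    ORDER BY target_column
  ) AS col_selection
FROM column_table;
```

Under these assumptions the single result row should be the string NULL AS `created_at`, id, name, which is what gets substituted for %s in the FORMAT call.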

Filtering with exists in BigQuery

Use the query below instead

SELECT * FROM UNNEST([
STRUCT(NULL AS a, '' AS b),
(1, 'Alpha'),
(2, 'Bravo'),
(3, 'Charlie'),
(4, 'Delta')
])
WHERE (a,b) in UNNEST([
STRUCT(NULL AS a, '' AS b),
(1, 'Alpha')
])

with output

[Image: query output]
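
For comparison, the same pair-wise filter can also be written with a correlated EXISTS. This is only a sketch under assumptions: the data and keys CTE names are made up here, and the NULL placeholder row is left out so that every field has an explicit value.

```sql
-- Hypothetical inline data and lookup keys.
WITH data AS (
  SELECT * FROM UNNEST([
    STRUCT(1 AS a, 'Alpha' AS b),
    (2, 'Bravo'),
    (3, 'Charlie'),
    (4, 'Delta')
  ])
),
keys AS (
  SELECT * FROM UNNEST([
    STRUCT(1 AS a, 'Alpha' AS b)
  ])
)
SELECT d.*
FROM data AS d
WHERE EXISTS (
  -- Keep a row only if some key matches on both fields.
  SELECT 1
  FROM keys AS k
  WHERE k.a = d.a AND k.b = d.b
);
```

The IN UNNEST form is shorter when the key list is a literal; the EXISTS form may read better when the keys already live in another table.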

How to check if a value exists in an array type column using SQL?

Consider the approach below

select format('%T', some_numbers) some_numbers,
(select count(1) > 0
from t.some_numbers number
where number in (3, 10)
) as exist
from sequences t

When applied to the sample data in your question, the output is

[Image: query output]

Note: I used format('%T', some_numbers) only to format the array in the output; you can use just some_numbers instead.
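
A self-contained variant of the same check, written with EXISTS over UNNEST instead of count(1) > 0, might look like the sketch below. The sample arrays in the sequences CTE are assumptions for illustration, not the data from the original question.

```sql
-- Hypothetical sample data: one array of numbers per row.
WITH sequences AS (
  SELECT [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL
  SELECT [2, 4, 8, 16, 32] UNION ALL
  SELECT [5, 10]
)
SELECT
  FORMAT('%T', some_numbers) AS some_numbers,
  -- TRUE when at least one array element is 3 or 10.
  EXISTS (
    SELECT 1
    FROM UNNEST(some_numbers) AS number
    WHERE number IN (3, 10)
  ) AS exist
FROM sequences;
```

With this data, the first and third arrays contain a matching value and the second does not. When only a single value is tested, 3 IN UNNEST(some_numbers) is a shorter way to express the same idea.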


