BigQuery Date-Partitioned Views

Define your view to expose the partitioning pseudocolumn, like this:

SELECT *, EXTRACT(DATE FROM _PARTITIONTIME) AS date
FROM `project_id.mydataset.date_partitioned_table`;

Now if you query the view using a filter on date, it will restrict the partitions that are read.
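To make this concrete, here is a minimal sketch that wraps the query in a view and then filters on the exposed column. The project, dataset, table, and view names are hypothetical, and the base table is assumed to be partitioned by ingestion time (otherwise _PARTITIONTIME does not exist).

```sql
-- Hypothetical names; the base table is assumed to be ingestion-time partitioned.
CREATE VIEW `project_id.mydataset.date_partitioned_view` AS
SELECT *, EXTRACT(DATE FROM _PARTITIONTIME) AS date
FROM `project_id.mydataset.date_partitioned_table`;

-- Filtering on the exposed column should limit the scan to the matching partition.
SELECT COUNT(*) AS row_count
FROM `project_id.mydataset.date_partitioned_view`
WHERE `date` = DATE '2021-05-10';
```

Comparing the bytes-processed estimate with and without the WHERE clause is a quick way to check that pruning takes effect for your table.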

Do views of tables in BigQuery benefit from partitioning/clustering optimization?

If you're talking about a logical view, then yes: if the base table it references is clustered or partitioned, queries against the view benefit from those features, provided the clustering or partitioning columns are referenced in the WHERE clause. A logical view doesn't have its own managed storage; it is effectively a SQL subquery that gets run whenever the view is referenced.
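As a rough illustration of the subquery behaviour, the sketch below defines a logical view over a table partitioned on a TIMESTAMP column and then filters on that column through the view. The table and view names and the columns field1 and dt are hypothetical. The expectation that the filter prunes partitions follows the answer above, and is worth confirming against the bytes-processed estimate for your own query.

```sql
-- Hypothetical base table, partitioned by day on dt and clustered on field1.
CREATE TABLE `project_id.mydataset.base_table` (
  field1 STRING,
  dt TIMESTAMP
)
PARTITION BY DATE(dt)
CLUSTER BY field1;

-- Logical view: stores no data, only the query text.
CREATE VIEW `project_id.mydataset.base_view` AS
SELECT field1, dt
FROM `project_id.mydataset.base_table`;

-- The filters on dt and field1 are applied to the base table
-- when the view is expanded into the outer query.
SELECT field1
FROM `project_id.mydataset.base_view`
WHERE dt >= TIMESTAMP '2021-05-10'
  AND dt < TIMESTAMP '2021-05-11'
  AND field1 = 'a';
```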

If you're talking about a materialized view, then partitioning/clustering from the base table isn't inherited, but can be defined on the materialized view. See the DDL syntax for more details: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_materialized_view_statement
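A sketch of what defining partitioning and clustering on the materialized view itself might look like. All names are hypothetical, and the base table is assumed to have columns field1 STRING and dt TIMESTAMP and to be partitioned by DATE(dt). Check the linked DDL reference for the restrictions that apply to your case.

```sql
-- Hypothetical names; assumes the base table is partitioned by DATE(dt).
CREATE MATERIALIZED VIEW `project_id.mydataset.base_table_mv`
PARTITION BY DATE(dt)   -- declared explicitly; not inherited from the base table
CLUSTER BY field1
AS
SELECT field1, dt, COUNT(*) AS row_count
FROM `project_id.mydataset.base_table`
GROUP BY field1, dt;
```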

Query a view which is created from partitioned table in bigquery

If you want the partition time in the view, you need to include it explicitly:

SELECT c.*, _PARTITIONTIME AS pt
FROM `customers` AS c
WHERE DATE(_PARTITIONTIME) > '2021-05-10'
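If the filter should be chosen by whoever queries the view rather than fixed in its definition, one option is to expose pt in the view and filter on it from outside. This is only a sketch: the dataset name mydataset and the view name customers_view are assumptions, and customers is assumed to be an ingestion-time partitioned table.

```sql
-- Hypothetical dataset and view names.
CREATE VIEW `mydataset.customers_view` AS
SELECT c.*, _PARTITIONTIME AS pt
FROM `mydataset.customers` AS c;

-- The same date condition as above, applied through the exposed column.
SELECT *
FROM `mydataset.customers_view`
WHERE DATE(pt) > DATE '2021-05-10';
```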

Can I efficiently GROUP BY over a date partitioned table in BigQuery

Found the answer, this does the job:

SELECT table_name, partition_id, total_rows
FROM `p.d.INFORMATION_SCHEMA.PARTITIONS`
WHERE partition_id IS NOT NULL
AND table_name = 't'
ORDER BY partition_id DESC

It returns quickly and, of course, queries much less data.

Query complete (1.7 sec elapsed, 10 MB processed)
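If you want the result keyed by an actual date rather than the partition_id string, a variation along these lines may help. It assumes daily partitioning, where partition_id has the form YYYYMMDD. Special partitions such as __NULL__ and __UNPARTITIONED__ do not parse as dates and are filtered out.

```sql
-- Assumes daily partitions (partition_id like '20210510').
SELECT
  SAFE.PARSE_DATE('%Y%m%d', partition_id) AS partition_date,
  total_rows
FROM `p.d.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 't'
  -- SAFE. returns NULL instead of raising an error for non-date partition ids.
  AND SAFE.PARSE_DATE('%Y%m%d', partition_id) IS NOT NULL
ORDER BY partition_date DESC;
```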

UPDATE statement in BigQuery that sets _PARTITIONDATE equal to a particular date field in your table

Creating a partitioned table

Since you don't need a table partitioned by ingestion time, you can create your table using your own date field as the partitioning field. To do so, add a PARTITION BY clause when creating the table, like this:

CREATE TABLE `project_id.mydataset.mytable` (
field1 STRING,
dt TIMESTAMP
)
PARTITION BY DATE(dt)

or

CREATE TABLE `project_id.mydataset.mytable`
PARTITION BY DATE(dt)
AS (
SELECT * FROM `project_id.mydataset.othertable`
)
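Once the table is partitioned on dt, filtering on dt directly is what limits the partitions that are scanned. A minimal sketch, reusing the table from the first statement above (the date range is only an example):

```sql
-- Reads only the partition for 2021-05-10, assuming the table is partitioned by DATE(dt).
SELECT field1, dt
FROM `project_id.mydataset.mytable`
WHERE dt >= TIMESTAMP '2021-05-10'
  AND dt < TIMESTAMP '2021-05-11';
```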

Updating the _PARTITIONTIME

Addressing your original question: if needed, you can also update the _PARTITIONTIME field. To set _PARTITIONTIME equal to your dt column for every row, you can do the following:

UPDATE
project_id.dataset.mytable
SET
_PARTITIONTIME = dt
WHERE
1=1

If dt has a different granularity than _PARTITIONTIME (for example, _PARTITIONTIME has day granularity and dt has hour granularity), then you can use TIMESTAMP_TRUNC:

UPDATE
project_id.dataset.mytable
SET
_PARTITIONTIME = TIMESTAMP_TRUNC(dt, DAY)
WHERE
1=1
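After running the update, a check like the following can confirm that every row landed in the partition matching its dt value. This is a sketch that assumes the same ingestion-time partitioned table and a TIMESTAMP column dt. A result of 0 means no mismatches were found.

```sql
-- Counts rows whose partition does not match the day of dt.
-- Rows where dt or _PARTITIONTIME is NULL are not counted by this comparison.
SELECT COUNT(*) AS mismatched_rows
FROM `project_id.dataset.mytable`
WHERE _PARTITIONTIME != TIMESTAMP_TRUNC(dt, DAY);
```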

