Resources Exceeded During Query Execution

Understanding Resources exceeded during query execution with GROUP EACH BY in BigQuery

As you've guessed, BigQuery chooses a number of parallel workers (shards) for GROUP EACH and JOIN EACH queries based on the size of the tables being operated upon. It is a rough heuristic, but in practice, it works pretty well.

What is interesting about your query is that the GROUP EACH is being done over a larger table than the original table because of the expansion in the CROSS JOIN. Because of this, we choose a number of shards that is too small for your query.

To answer your specific questions:

  • Resources exceeded almost always means that a worker ran out of memory. This could be a shard or a mixer, in Dremel terms (mixers are the nodes in the computation tree that aggregate results. GROUP EACH BY pushes aggregation down to the shards, which are the leaves of the computation tree).

  • There isn't a good way to approximate the amount of resources available. This changes over time, with the goal that more of your queries should just work.

  • The number of shards is determined by the total bytes processed in the query. As you've noticed, this heuristic doesn't work well with joins that expand the underlying data sets. That said, there is active work underway to be smarter about how we pick the number of shards. To give you an idea of scale, your query got scheduled on only 20 shards, which is a tiny fraction of what a larger table would get.

As a workaround, you could save the intermediate result of the CROSS JOIN as a table, and running the GROUP EACH BY over that temporary table. That should let BigQuery use the expanded size when picking the number of shards. (if that doesn't work, please let me know, it is possible that we need to tweak our assignment thresholds).

BigQuery error: Resources exceeded during query execution on one dataset, but the same query runs without issues on another dataset

Without knowing your data (and even knowing it) is hard to know where exactly the reason for this error is.

Althought there is no limit of queries for a dataset (as long as your billing account allows it), this error appears because the query needs more slots (google's query processing units) than the available for your project at the time of processing the query.

As the error suggest the query is too complicated for one run with your available resources. Here you have to take into account that it is not just this query but also all the queries in every view, self joins and temporary tables/views running in parallel (in the background). Probably the same query works for other tables because with those tables there are not so many background processes and maybe the background views/tables have grown with the time.

As by google suggested, it is better to replace complex queries with multiple simple queries and materialized-views and/or tables (not views) in between with the partial results for each steps.

BigQuery - Resources exceeded during query execution: Not enough resources for query planning - too many subqueries

Use below as a starting point

select * from (
select *, replace(regexp_extract(data, r'\brank:(\d+)$'), ':', '') rank
from your_table
)
pivot (any_value(data) as rank for rank in ('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24'))

if applied to sample data in your question - output is

Sample Image

Above is easy to transform to dynamic pivot - see multiple examples here on SO - in particular at least few of my answers dedicated to this



Related Topics



Leave a reply



Submit