Why Partition Elimination Does Not Happen for This Query

Solution

select count (column_name)

from table_name

where year >= year (date_sub (current_date,7))
and month >= month (date_sub (current_date,7))
and day >= day (date_sub (current_date,7))
;

What went wrong with the original query?

unix_timestamp()

Gets current Unix timestamp in seconds. This function is not
deterministic and its value is not fixed for the scope of a query
execution, therefore prevents proper optimization of queries - this
has been deprecated since 2.0 in favour of CURRENT_TIMESTAMP constant.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

(I've just changed the documentation a little bit :-))

Because the value of unix_timestamp() might change during execution, the expression must be evaluated for each row, which prevents partition elimination.

Why did using SET not work?

set is nothing but a text replacement mechanism.

Nothing is computed during the set.

The only thing that happens is that text is assigned to variables.

Before the query is executed, the variable placeholders (${hiveconf:...}) are replaced with the assigned text.

Only then is the query parsed and executed.

hive> set a=sele;
hive> set b=ct 1+;
hive> set c=1;
hive> ${hiveconf:a}${hiveconf:b}${hiveconf:c};
OK
2
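
The substitution step can be mimicked outside Hive. The following is a minimal Python sketch, not Hive's actual implementation: the function name substitute_hiveconf and the variable name dt are made up for illustration. It shows that a function call stored in a variable reaches the parser as plain text, so unix_timestamp() is still evaluated by the query itself.

```python
import re

# Matches ${hiveconf:name} placeholders (illustrative pattern, not Hive's own code).
PLACEHOLDER = re.compile(r"\$\{hiveconf:([^}]+)\}")

def substitute_hiveconf(text, variables):
    """Replace each placeholder with the raw text assigned to the variable."""
    def replace(match):
        # Unknown variables are left untouched in this sketch.
        return variables.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(replace, text)

# The session shown above: three fragments glued into one statement.
fragments = {"a": "sele", "b": "ct 1+", "c": "1"}
statement = substitute_hiveconf("${hiveconf:a}${hiveconf:b}${hiveconf:c}", fragments)
print(statement)  # select 1+1

# A "precomputed" date is only text; nothing was evaluated by the set.
date_vars = {"dt": "date_sub (from_unixtime (unix_timestamp ()),7)"}
query = substitute_hiveconf(
    "select count (column_name) from table_name where year >= year (${hiveconf:dt})",
    date_vars,
)
print(query)
```

The second printed query still contains unix_timestamp (), which is why assigning the expression with set changes nothing for the optimizer.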

Demo

create table table_name (column_name int) partitioned by (year int,month int,day int);

set hive.exec.dynamic.partition.mode=nonstrict;

insert into table_name partition (year,month,day)

select pos
,year(dt)
,month(dt)
,day(dt)

from (select pe.pos
,date_sub (current_date,pe.pos) as dt

from (select 1) x
lateral view posexplode (split (space (99),' ')) pe
) t
;

explain dependency

select count (column_name)

from table_name

where year >= year (date_sub (from_unixtime (unix_timestamp ()),7))
and month >= month (date_sub (from_unixtime (unix_timestamp ()),7))
and day >= day (date_sub (from_unixtime (unix_timestamp ()),7))
;

{"input_partitions":[{"partitionName":"default@table_name@year=2016/month=11/day=14"},{"partitionName":"default@table_name@year=2016/month=11/day=15"},{"partitionName":"default@table_name@year=2016/month=11/day=16"},{"partitionName":"default@table_name@year=2016/month=11/day=17"},{"partitionName":"default@table_name@year=2016/month=11/day=18"},{"partitionName":"default@table_name@year=2016/month=11/day=19"},{"partitionName":"default@table_name@year=2016/month=11/day=20"},{"partitionName":"default@table_name@year=2016/month=11/day=21"},{"partitionName":"default@table_name@year=2016/month=11/day=22"},{"partitionName":"default@table_name@year=2016/month=11/day=23"},{"partitionName":"default@table_name@year=2016/month=11/day=24"},{"partitionName":"default@table_name@year=2016/month=11/day=25"},{"partitionName":"default@table_name@year=2016/month=11/day=26"},{"partitionName":"default@table_name@year=2016/month=11/day=27"},{"partitionName":"default@table_name@year=2016/month=11/day=28"},{"partitionName":"default@table_name@year=2016/month=11/day=29"},{"partitionName":"default@table_name@year=2016/month=11/day=30"},{"partitionName":"default@table_name@year=2016/month=12/day=1"},{"partitionName":"default@table_name@year=2016/month=12/day=10"},{"partitionName":"default@table_name@year=2016/month=12/day=11"},{"partitionName":"default@table_name@year=2016/month=12/day=12"},{"partitionName":"default@table_name@year=2016/month=12/day=13"},{"partitionName":"default@table_name@year=2016/month=12/day=14"},{"partitionName":"default@table_name@year=2016/month=12/day=15"},{"partitionName":"default@table_name@year=2016/month=12/day=16"},{"partitionName":"default@table_name@year=2016/month=12/day=17"},{"partitionName":"default@table_name@year=2016/month=12/day=18"},{"partitionName":"default@table_name@year=2016/month=12/day=19"},{"partitionName":"default@table_name@year=2016/month=12/day=2"},{"partitionName":"default@table_name@year=2016/month=12/day=20"},{"partitionName":"default@table_name@year=2016/month=12/day=21"},{"partitionName":"default@table_name@year=2016/month=12/day=22"},{"partitionName":"default@table_name@year=2016/month=12/day=23"},{"partitionName":"default@table_name@year=2016/month=12/day=24"},{"partitionName":"default@table_name@year=2016/month=12/day=25"},{"partitionName":"default@table_name@year=2016/month=12/day=26"},{"partitionName":"default@table_name@year=2016/month=12/day=27"},{"partitionName":"default@table_name@year=2016/month=12/day=28"},{"partitionName":"default@table_name@year=2016/month=12/day=29"},{"partitionName":"default@table_name@year=2016/month=12/day=3"},{"partitionName":"default@table_name@year=2016/month=12/day=30"},{"partitionName":"default@table_name@year=2016/month=12/day=31"},{"partitionName":"default@table_name@year=2016/month=12/day=4"},{"partitionName":"default@table_name@year=2016/month=12/day=5"},{"partitionName":"default@table_name@year=2016/month=12/day=6"},{"partitionName":"default@table_name@year=2016/month=12/day=7"},{"partitionName":"default@table_name@year=2016/month=12/day=8"},{"partitionName":"default@table_name@year=2016/month=12/day=9"},{"partitionName":"default@table_name@year=2017/month=1/day=1"},{"partitionName":"default@table_name@year=2017/month=1/day=10"},{"partitionName":"default@table_name@year=2017/month=1/day=11"},{"partitionName":"default@table_name@year=2017/month=1/day=12"},{"partitionName":"default@table_name@year=2017/month=1/day=13"},{"partitionName":"default@table_name@year=2017/month=1/day=14"},{"partitionName":"default@table_name@year=2017/month=1/day=15"},{"partitionName":"default@table_name@year=2017/month=1/day=16"},{"partitionName":"default@table_name@year=2017/month=1/day=17"},{"partitionName":"default@table_name@year=2017/month=1/day=18"},{"partitionName":"default@table_name@year=2017/month=1/day=19"},{"partitionName":"default@table_name@year=2017/month=1/day=2"},{"partitionName":"default@table_name@year=2017/month=1/day=20"},{"partitionName":"default@table_name@year=2017/month=1/day=21"},{"partitionName":"default@table_name@year=2017/month=1/day=22"},{"partitionName":"default@table_name@year=2017/month=1/day=23"},{"partitionName":"default@table_name@year=2017/month=1/day=24"},{"partitionName":"default@table_name@year=2017/month=1/day=25"},{"partitionName":"default@table_name@year=2017/month=1/day=26"},{"partitionName":"default@table_name@year=2017/month=1/day=27"},{"partitionName":"default@table_name@year=2017/month=1/day=28"},{"partitionName":"default@table_name@year=2017/month=1/day=29"},{"partitionName":"default@table_name@year=2017/month=1/day=3"},{"partitionName":"default@table_name@year=2017/month=1/day=30"},{"partitionName":"default@table_name@year=2017/month=1/day=31"},{"partitionName":"default@table_name@year=2017/month=1/day=4"},{"partitionName":"default@table_name@year=2017/month=1/day=5"},{"partitionName":"default@table_name@year=2017/month=1/day=6"},{"partitionName":"default@table_name@year=2017/month=1/day=7"},{"partitionName":"default@table_name@year=2017/month=1/day=8"},{"partitionName":"default@table_name@year=2017/month=1/day=9"},{"partitionName":"default@table_name@year=2017/month=2/day=1"},{"partitionName":"default@table_name@year=2017/month=2/day=10"},{"partitionName":"default@table_name@year=2017/month=2/day=11"},{"partitionName":"default@table_name@year=2017/month=2/day=12"},{"partitionName":"default@table_name@year=2017/month=2/day=13"},{"partitionName":"default@table_name@year=2017/month=2/day=14"},{"partitionName":"default@table_name@year=2017/month=2/day=15"},{"partitionName":"default@table_name@year=2017/month=2/day=16"},{"partitionName":"default@table_name@year=2017/month=2/day=17"},{"partitionName":"default@table_name@year=2017/month=2/day=18"},{"partitionName":"default@table_name@year=2017/month=2/day=19"},{"partitionName":"default@table_name@year=2017/month=2/day=2"},{"partitionName":"default@table_name@year=2017/month=2/day=20"},{"partitionName":"default@table_name@year=2017/month=2/day=21"},{"partitionName":"default@table_name@year=2017/month=2/day=3"},{"partitionName":"default@table_name@year=2017/month=2/day=4"},{"partitionName":"default@table_name@year=2017/month=2/day=5"},{"partitionName":"default@table_name@year=2017/month=2/day=6"},{"partitionName":"default@table_name@year=2017/month=2/day=7"},{"partitionName":"default@table_name@year=2017/month=2/day=8"},{"partitionName":"default@table_name@year=2017/month=2/day=9"}],"input_tables":[{"tablename":"default@table_name","tabletype":"MANAGED_TABLE"}]}

explain dependency

select count (column_name)

from table_name

where year >= year (date_sub (current_date,7))
and month >= month (date_sub (current_date,7))
and day >= day (date_sub (current_date,7))
;

{"input_partitions":[{"partitionName":"default@table_name@year=2017/month=2/day=14"},{"partitionName":"default@table_name@year=2017/month=2/day=15"},{"partitionName":"default@table_name@year=2017/month=2/day=16"},{"partitionName":"default@table_name@year=2017/month=2/day=17"},{"partitionName":"default@table_name@year=2017/month=2/day=18"},{"partitionName":"default@table_name@year=2017/month=2/day=19"},{"partitionName":"default@table_name@year=2017/month=2/day=20"},{"partitionName":"default@table_name@year=2017/month=2/day=21"}],"input_tables":[{"tablename":"default@table_name","tabletype":"MANAGED_TABLE"}]}
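
The difference between the two explain dependency outputs above can be reproduced with a toy planner. This is only a Python sketch of the reasoning, not how Hive is implemented: plan_partitions and its deterministic flag are invented names, and "today" is assumed to be 2017-02-21 because that is the newest partition in the demo output.

```python
from datetime import date, timedelta

# Assumption: the demo was run on 2017-02-21 (newest partition in its output).
TODAY = date(2017, 2, 21)

# 100 daily partitions, mirroring the demo insert (pos 0..99 days back).
partitions = [
    (d.year, d.month, d.day)
    for d in (TODAY - timedelta(days=pos) for pos in range(100))
]

def plan_partitions(partitions, bound_fn, deterministic):
    """Return the partitions a toy planner would decide to read."""
    if not deterministic:
        # The bound may differ from row to row, so it cannot be folded into
        # a constant at plan time and no partition can be ruled out.
        return list(partitions)
    # Deterministic: evaluate once at plan time and prune with the same
    # three comparisons used in the query's WHERE clause.
    year, month, day = bound_fn()
    return [p for p in partitions if p[0] >= year and p[1] >= month and p[2] >= day]

def seven_days_ago():
    d = TODAY - timedelta(days=7)
    return (d.year, d.month, d.day)

# unix_timestamp() behaves like the non-deterministic case, current_date like the other.
all_read = plan_partitions(partitions, seven_days_ago, deterministic=False)
pruned = plan_partitions(partitions, seven_days_ago, deterministic=True)
print(len(all_read), len(pruned))  # 100 8
```

The counts match the demo: every one of the 100 partitions is listed for the unix_timestamp() version, and only the 8 partitions from 2017/2/14 to 2017/2/21 for the current_date version.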

Partition elimination in Greenplum

The Postgres manual page on Partitioning includes this caveat:

Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which partition the function value might fall into at run time.

In order to eliminate a seek on a partition, Postgres must know when creating a query plan that no rows from that partition are relevant. In your query, this occurs only after the sub-query has completed, so the query would have to be split into two, with the second part planned only after the first completes.

If the partitions have an index on the partitioning column (PACKAGE_TYPE) as well as a constraint, the planner may choose an index scan on each partition, so the irrelevant partitions are still ruled out reasonably efficiently at run time. (That is, there would be 20 index scans, but each would require very little resource.)

An alternative would be to split the query yourself, and build the SQL dynamically. Since the SELECT PACKAGE_TYPE FROM PACKAGE_LIST_TABLE can only ever return up to 20 distinct values, you could select those into an array/set in your application or a user-defined function. Then you can pass these in as literals in the IN ( ... ) clause as in your first example (or equivalently = ANY(array_expression)), and achieve the partition elimination.
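
As a rough illustration of splitting the query in the application, here is a Python sketch. It only builds the second statement; the table name package_facts is hypothetical (the partitioned table is not named above), the %s placeholder style assumes a driver such as psycopg2, and the hard-coded list stands in for the result of SELECT PACKAGE_TYPE FROM PACKAGE_LIST_TABLE.

```python
def build_in_query(table, column, values):
    """Build a SELECT whose IN (...) list has one placeholder per value.

    table and column are trusted identifiers chosen by the developer;
    only the values are passed as externally supplied parameters.
    """
    if not values:
        raise ValueError("at least one value is required for the IN list")
    placeholders = ", ".join(["%s"] * len(values))
    sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
    return sql, list(values)

# Step 1 (simulated): the distinct package types returned by the first query.
package_types = ["A", "B", "C"]

# Step 2: the second query sees the values as supplied parameters, not a subquery.
sql, params = build_in_query("package_facts", "PACKAGE_TYPE", package_types)
print(sql)     # SELECT * FROM package_facts WHERE PACKAGE_TYPE IN (%s, %s, %s)
print(params)  # ['A', 'B', 'C']
```

Whether the planner then excludes partitions still depends on the constraints defined on them, so checking the result with EXPLAIN is advisable.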

BigQuery apply subquery to partition time

You were getting the error because:

  1. The table has the option require_partition_filter=true set, so a query on the table fails if no partition filter is specified.
  2. There is a limitation on using a subquery as the partition filter; the limitation is described in the BigQuery documentation.

In general, partition pruning will reduce query cost when the filters can be evaluated at the outset of the query without requiring any subquery evaluations or data scans.

The workaround is to use BigQuery scripting to determine the partition filter in advance, like:

DECLARE minimums DATE DEFAULT ((SELECT minimums FROM `Day` WHERE ...));
SELECT *
FROM `Day`
WHERE DATE (_PARTITIONTIME) > minimums; -- minimums is a constant to the second query

Partitioned-View not working with parameters

Generally speaking, OR predicates are challenging for SQL Server to optimize and generate a reusable cached plan.

I ran the query with the OPTION(RECOMPILE) query hint and the actual execution plan shows unneeded partitioned view member tables are statically eliminated from the plan. Not sure why sqlfiddle doesn't show this (it's currently using SQL 2014 RTM) but I observed elimination with all versions of SQL Server from 2012 through 2017 RC2 with latest service packs installed.

DECLARE @MonthA int = 5;
DECLARE @MonthB int = 6;

SELECT *
FROM Year1998Sales
WHERE (OrderMonth = @MonthA OR OrderMonth = @MonthB) AND CustomerID = 64892
OPTION(RECOMPILE);

Here's the actual execution plan XML (SQL Server 2017 RC2):

<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.6" Build="14.0.900.75" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="3" StatementEstRows="2" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" CardinalityEstimationModelVersion="140" StatementSubTreeCost="0.00656736" StatementText="SELECT * FROM Year1998Sales WHERE (OrderMonth = @MonthA OR OrderMonth = @MonthB) AND CustomerID = 64892 OPTION(RECOMPILE)" StatementType="SELECT" QueryHash="0xF9DB04D00D56A43D" QueryPlanHash="0x6171395FA7A2F92C" RetrievedFromCache="false" SecurityPolicyApplied="false">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan DegreeOfParallelism="1" CachedPlanSize="24" CompileTime="8" CompileCPU="8" CompileMemory="552">
<MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="419405" EstimatedPagesCached="104851" EstimatedAvailableDegreeOfParallelism="2" MaxCompileMemory="9985376" />
<QueryTimeStats CpuTime="0" ElapsedTime="0" />
<RelOp AvgRowSize="35" EstimateCPU="2E-07" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="2" LogicalOp="Concatenation" NodeId="0" Parallel="false" PhysicalOp="Concatenation" EstimatedTotalSubtreeCost="0.00656736">
<OutputList>
<ColumnReference Column="Union1014" />
<ColumnReference Column="Union1015" />
<ColumnReference Column="Union1016" />
<ColumnReference Column="Union1017" />
<ColumnReference Column="Union1018" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
</RunTimeInformation>
<Concat>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Union1014" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="OrderID" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="OrderID" />
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Union1015" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="CustomerID" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="CustomerID" />
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Union1016" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="OrderDate" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="OrderDate" />
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Union1017" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="OrderMonth" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="OrderMonth" />
</DefinedValue>
<DefinedValue>
<ColumnReference Column="Union1018" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="DeliveryDate" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="DeliveryDate" />
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="35" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" EstimatedRowsRead="1" LogicalOp="Clustered Index Scan" NodeId="1" Parallel="false" PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="0">
<OutputList>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="OrderID" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="CustomerID" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="OrderDate" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="OrderMonth" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="DeliveryDate" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="1" ActualLogicalReads="0" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
</RunTimeInformation>
<IndexScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="OrderID" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="CustomerID" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="OrderDate" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="OrderMonth" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="DeliveryDate" />
</DefinedValue>
</DefinedValues>
<Object Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Index="[May1998sales_OrderIDMonth]" IndexKind="Clustered" Storage="RowStore" />
<Predicate>
<ScalarOperator ScalarString="[Repro].[dbo].[May1998sales].[CustomerID]=(64892)">
<Compare CompareOp="EQ">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[May1998sales]" Column="CustomerID" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(64892)" />
</ScalarOperator>
</Compare>
</ScalarOperator>
</Predicate>
</IndexScan>
</RelOp>
<RelOp AvgRowSize="35" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" EstimatedRowsRead="1" LogicalOp="Clustered Index Scan" NodeId="2" Parallel="false" PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="0">
<OutputList>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="OrderID" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="CustomerID" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="OrderDate" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="OrderMonth" />
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="DeliveryDate" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="1" ActualLogicalReads="0" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
</RunTimeInformation>
<IndexScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="OrderID" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="CustomerID" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="OrderDate" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="OrderMonth" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="DeliveryDate" />
</DefinedValue>
</DefinedValues>
<Object Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Index="[Jun1998sales_OrderIDMonth]" IndexKind="Clustered" Storage="RowStore" />
<Predicate>
<ScalarOperator ScalarString="[Repro].[dbo].[Jun1998sales].[CustomerID]=(64892)">
<Compare CompareOp="EQ">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Repro]" Schema="[dbo]" Table="[Jun1998sales]" Column="CustomerID" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="(64892)" />
</ScalarOperator>
</Compare>
</ScalarOperator>
</Predicate>
</IndexScan>
</RelOp>
</Concat>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>

