PostgreSQL does not use a partial index
A partial index is a good idea here: it excludes the half of the table's rows that you obviously do not need. Simpler:
CREATE INDEX name_idx ON table (text_col)
WHERE text_col IS NOT NULL;
Be sure to run ANALYZE table after creating the index. (Autovacuum does that automatically after some time if you don't do it manually, but if you test right after creation, your test will fail.)
Then, to convince the query planner that a particular partial index can be used, repeat the WHERE condition in the query - even if it seems completely redundant:
SELECT col1, col2, ..., colN
FROM table
WHERE text_col = 'my_value'
AND text_col IS NOT NULL; -- repeat condition
Voilà.
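To check that the planner actually picks the partial index, a quick look at the plan helps. A minimal sketch, assuming the table (called `tbl` here as a placeholder) and the partial index `name_idx` from above:

```sql
-- Repeating the index predicate lets the planner match name_idx.
EXPLAIN
SELECT col1, col2
FROM   tbl
WHERE  text_col = 'my_value'
AND    text_col IS NOT NULL;  -- repeat condition

-- Look for an "Index Scan using name_idx" (or a Bitmap Index Scan)
-- node in the output instead of a Seq Scan.
```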
Per documentation:
However, keep in mind that the predicate must match the conditions used in the queries that are supposed to benefit from the index. To be precise, a partial index can be used in a query only if the system can recognize that the WHERE condition of the query mathematically implies the predicate of the index. PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time. As a result, parameterized query clauses do not work with a partial index.
As for parameterized queries: again, add the (redundant) predicate of the partial index as an additional, constant WHERE condition, and it works just fine.
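A minimal sketch of that workaround for a parameterized query (using PREPARE; the table and statement names are placeholders):

```sql
-- The parameter $1 cannot be matched against the index predicate at
-- planning time, but the added constant condition can.
PREPARE get_row (text) AS
SELECT col1, col2
FROM   tbl
WHERE  text_col = $1
AND    text_col IS NOT NULL;  -- constant predicate matches the partial index

EXECUTE get_row('my_value');
```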
An important update in Postgres 9.6 largely improved the chances for index-only scans (which can make queries cheaper, and the query planner will more readily choose such plans). Related:
- PostgreSQL not using index during count(*)
Postgres is not using partial index
No rows are removed by the filter in the sequential scan, so it seems that all rows in the table have refuel_request_status_id equal to '1', '2' or '3'. So the WHERE condition in the index doesn't make a difference.
You could try with enable_seqscan = off to see if PostgreSQL made the right choice. If not, perhaps random_page_cost is not correctly set for your hardware.
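A session-local way to run that experiment (the table name and query are illustrative, taken from the question's context):

```sql
-- Disable sequential scans for this session only, then compare plans.
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM   refuel_requests
WHERE  refuel_request_status_id IN ('1', '2', '3');
RESET enable_seqscan;

-- If the index plan turns out to be faster, consider lowering
-- random_page_cost, e.g. for SSD storage:
-- ALTER SYSTEM SET random_page_cost = 1.1;  -- then: SELECT pg_reload_conf();
```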
If it turns out that the sequential scan is indeed the fastest way to go, there is nothing much you can do to speed up the query: you can get faster storage or more RAM to cache the table.
Counting is slow business.
PostgreSQL doesn't consistently use partial index
The thing here that is actually slow is that = ANY is implemented by looping over the 1000 members of your array and testing each one, and doing that for each of the 136000 rows it needs to inspect. That is a lot of looping (but not 4 seconds worth in my hands, "only" 1.5 s for me). Worse, the planner doesn't anticipate that = ANY has such a poor implementation, so sees no reason to choose the other plan to avoid it.
v14 will fix this by using a hash table to implement = ANY, so it will no longer be so inordinately slow.
If you can't/don't want to upgrade to v14, you could rewrite the query by joining to a VALUES list, rather than using = ANY:
SELECT *
FROM   enrollments
JOIN  (VALUES ('C1'),...,('C1000')) f(c) ON c = consumer_id
WHERE  deleted_at IS NULL AND portfolio_id = 1;
Postgres partial vs regular / full index on nullable column
It all depends.
NULL values are included in (default) B-tree indexes since Postgres 8.3, as Schwern mentioned. However, predicates like the one you mention (WHERE colA IS NOT NULL) are only properly supported since Postgres 9.0. The release notes:
Allow IS NOT NULL restrictions to use indexes (Tom Lane)
This is particularly useful for finding MAX()/MIN() values in indexes that contain many null values.
GIN indexes followed later:
As of PostgreSQL 9.1, null key values can be included in the index.
Typically, a partial index makes sense if it excludes a major part of the table from the index, making it substantially smaller and saving writes to the index. Since B-tree indexes are so shallow, bare seek performance scales fantastically (once the index is cached). 10 % fewer index entries hardly matter in that area.
Your case would exclude only around 10% of all rows, and that rarely pays. A partial index adds some overhead for the query planner and excludes queries that don't match the index condition. (The Postgres query planner doesn't try hard if the match is not immediately obvious.)
OTOH, Postgres will rarely use an index for predicates retrieving 10 % of the table - a sequential scan will typically be faster. Again, it depends.
If (almost) all queries exclude NULL anyway (in a way the Postgres planner understands), then a partial index excluding only 10 % of all rows is still a sensible option. But it may backfire if query patterns change. The added complexity may not be worth it.
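The trade-off can be spelled out with the two candidate definitions (table and column names are placeholders standing in for the question's schema):

```sql
-- Full index: slightly larger, but usable by every query on colA.
CREATE INDEX tbl_cola_idx ON tbl (colA);

-- Partial index: ~10 % smaller in this scenario, but only usable when
-- the query repeats the predicate in a form the planner can match:
CREATE INDEX tbl_cola_part_idx ON tbl (colA)
WHERE colA IS NOT NULL;
```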
Also worth noting that there are still corner cases with NULL values in Postgres indexes. I bumped into this case recently where Postgres proved unwilling to read sorted rows from a multicolumn index when the first index expression was filtered with IS NULL (making a partial index preferable for the case):
db<>fiddle here
So, it depends on the complete picture.
Postgres IN clause with many values not using partial index
You are running an outdated version of Postgres. Consider upgrading soon.
There are many possible reasons. I suspect a weakness in the selectivity estimation of the outdated version. I vaguely remember a maximum of 100 values for query planning involving arrays that was later improved. IN expressions are typically transformed to = ANY (ARRAY[...]) internally:
- How to use ANY instead of IN in a WHERE clause with Rails?
Either way, you might be able to fix the behavior by repeating the predicate of the partial index in your query:
SELECT *
FROM   _order
WHERE  merged_id IN (10001, 10002, 10003, ....., 11000)
AND    merged_id IS NOT NULL;  -- logically redundant
There may be other problems with your server configuration like cost settings or table statistics:
- Keep PostgreSQL from sometimes choosing a bad query plan
And don't forget to run ANALYZE on your table at least once after creating the partial index. Or, preferably, VACUUM ANALYZE, but that's more expensive for your big table.
- PostgreSQL partial index unused when created on a table with existing data
However, for long lists of values, there are more efficient query variants to begin with:
SELECT o.*
FROM   unnest('{10001, 10002, 10003, ....., 11000}'::int8[]) merged_id
JOIN   _order o USING (merged_id);
See:
- Optimizing a Postgres query with a large IN
Partial index not used on count
The partial index may be smaller, but the size of the part of the index that needs to be read will be about the same between the two indexes. Skipping the parts of the complete index which are for the wrong "status" will be very efficient.
Your table is obviously not very well vacuumed, based on both the heap fetches and the number of buffers dirtied and written. Try vacuuming the table and then repeating the queries.
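A sketch of that maintenance step and the re-check (the table name and query are placeholders for the question's setup):

```sql
-- Clean up dead tuples and refresh the visibility map, which reduces
-- heap fetches during index-only scans; also updates table statistics.
VACUUM (ANALYZE) mytable;

-- Then re-run the query and compare: heap fetches and buffers
-- dirtied/written should drop noticeably.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM mytable WHERE status = 1;
```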
Partial Index on Postgres where condition is in separate table
Not possible. From the manual:
The expression used in the WHERE clause can refer only to columns of
the underlying table, but it can use all columns, not just the ones
being indexed.
Indexing the condition of a partial index in Postgres
A B-tree index can be thought of as an ordered list of index entries, each with a pointer to a row in the table.
In a partial index, the list is just smaller: there are only index entries for rows that meet the condition.
If you have the index condition in your WHERE clause, PostgreSQL knows it can use the index and doesn't have to check the index condition, because it will be satisfied automatically.
So:
No, any row found via the index will automatically satisfy the index condition, so using the index is enough to make sure it is satisfied.
No, an index on a boolean column will not be used, because it would not be cheaper than this partial index, and the partial index can be used to check the condition on order_nr as well. It is actually the other way around: the partial index could well be used for queries that only have the boolean column in the WHERE condition, if there are few enough rows that satisfy the condition.
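The points above can be sketched with a concrete schema (the table, column, and index names are assumptions for illustration, not from the original question):

```sql
-- Hypothetical table and partial index matching the answer's reasoning.
CREATE TABLE orders (
    order_nr  int,
    processed boolean
);
CREATE INDEX orders_part_idx ON orders (order_nr) WHERE processed;

-- Can use the partial index; "processed" needs no re-check, because
-- every entry in the index already satisfies it:
SELECT * FROM orders WHERE processed AND order_nr = 42;

-- May also use the partial index when few rows are processed, scanning
-- it fully instead of needing a separate index on the boolean column:
SELECT * FROM orders WHERE processed;
```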