Why Isn't Postgres Using the Index

Why isn't PostgreSQL using an index with my GROUP BY aggregate?

As @Pavel Stehule mentions in his answer, Postgres does not implement index skip scans, which are necessary to optimize these types of queries. TimescaleDB recognized that such queries are common in time-series analysis, so they implemented an index skip scan themselves. It is available in their extension from version 2.2.1 onward; see their blog post announcing the feature.

After upgrading the extension to >= 2.2.1, the query can be rewritten to use the index skip scan:

select distinct on (device) device, time from metrics order by device, time desc

This then uses their index skip scan implementation, and in my case sped up the query by around 100x.
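For comparison, on plain PostgreSQL without TimescaleDB, a similar speedup can often be achieved by emulating the skip scan with a recursive CTE. This is a sketch only, assuming the table `metrics(device, time)` from the question and an index on `(device, time DESC)`; each recursive step uses the index to jump directly to the next distinct device:

```sql
WITH RECURSIVE skip AS (
    -- Anchor: the smallest device, with its latest time (one index probe)
    (SELECT device, time FROM metrics ORDER BY device, time DESC LIMIT 1)
    UNION ALL
    -- Step: for each previous device, probe the index for the next one
    SELECT m.device, m.time
    FROM skip s
    CROSS JOIN LATERAL (
        SELECT device, time
        FROM metrics
        WHERE device > s.device
        ORDER BY device, time DESC
        LIMIT 1
    ) m
)
SELECT * FROM skip;
```

Each iteration does a single index probe instead of scanning all rows for a device, which is essentially what the TimescaleDB skip scan does internally.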

Why isn't Postgres using the index?

Because of:

Seq Scan on invoices  (...) (actual ... rows=118027 <— this
Filter: (account_id = 1)
Rows Removed by Filter: 51462 <— vs this
Total runtime: 39.917 ms

You're selecting so many rows that it's cheaper to read the entire table.
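To see the crossover yourself, compare plans for a common value versus a rare one. This is a hypothetical illustration using the `invoices` table from the question; the specific values are assumptions:

```sql
-- Matches most of the table: the planner prefers a sequential scan
EXPLAIN SELECT * FROM invoices WHERE account_id = 1;

-- A rare value (hypothetical) matching few rows: an index scan is likely
EXPLAIN SELECT * FROM invoices WHERE account_id = 999;
```

The planner's choice is driven by the estimated fraction of rows matching the filter, not by the mere existence of an index.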

Related earlier questions and answers for further reading:

  • Why doesn't Postgresql use index for IN query?

  • Postgres using wrong index when querying a view of indexed expressions?

(See also Craig's longer answer on the second one for additional notes on index subtleties.)

Why isn't PostgreSQL using index for this join query

I concur with jjanes' answer, but I want to suggest these additional experiments:

  • Run ANALYZE event_user_detail; and see if that improves the estimate.

  • It could be that random_page_cost is set too high: its default is tuned for spinning disks, so it rates index scans as comparatively expensive. If you lower that parameter, PostgreSQL will be more willing to use index scans.
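The two experiments above can be run as follows. This is a sketch; the table name comes from the question, and 1.1 is a commonly suggested value for SSD-backed storage rather than a prescription:

```sql
-- Refresh planner statistics for the table
ANALYZE event_user_detail;

-- Lower random_page_cost for this session only (the default is 4.0;
-- values near 1.0 suit SSDs, where random reads are nearly as cheap
-- as sequential ones)
SET random_page_cost = 1.1;

-- Then re-run the join query under EXPLAIN (ANALYZE) to see
-- whether the plan switches to an index scan.
```

Using SET rather than ALTER SYSTEM keeps the change scoped to the current session, so you can compare plans without affecting other workloads.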

PostgreSQL Index Isn't Used on Query

If you read the execution plan closely, you'll see that Postgres is telling you that out of about 6 million records, 5.5 million matched (> 90%). Based on its statistics, Postgres likely determined that the query would return a large percentage of the table's rows, and that it would therefore be faster to forgo the index and scan the entire table.

The concept to understand here is that, while the index you defined does let Postgres discard non-matching records very quickly, it actually increases the time needed to look up the values for SELECT *. The reason is that, upon reaching a leaf entry in the index, Postgres must then fetch the corresponding row from the table's heap to get the remaining column values. If your query returns most of the table, it is faster to scan the table directly.

That said, there is nothing inherently wrong with your index. If your query used a narrower range, or searched for a specific timestamp, so that the expected result set was sufficiently small, Postgres would likely have used the index.
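As a sketch of that point, the table and column names below are assumptions (the original table isn't named in this excerpt), but the principle carries over: a highly selective range predicate on an indexed timestamp column should produce an index scan.

```sql
-- Hypothetical table events with an index on the ts timestamp column.
-- A narrow window matching a small fraction of rows should show
-- "Index Scan" (or "Bitmap Heap Scan") rather than "Seq Scan":
EXPLAIN
SELECT * FROM events
WHERE ts >= now() - interval '1 hour';
```

If the same query over a months-long window reverts to a sequential scan, that is the planner's selectivity estimate at work, not a broken index.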
