Indexed ORDER BY with LIMIT 1

Assuming we are dealing with a big table, a partial index might help:

CREATE INDEX tbl_created_recently_idx ON tbl (created_at DESC)
WHERE created_at > '2013-09-15 0:0'::timestamp;

As you already found out: descending or ascending hardly matters here. Postgres can scan backwards at almost the same speed (exceptions apply with multi-column indices).

Query to use this index:

SELECT * FROM tbl
WHERE created_at > '2013-09-15 0:0'::timestamp -- matches index
ORDER BY created_at DESC
LIMIT 1;

The point here is to make the index much smaller, so it should be easier to cache and maintain.

  1. You need to pick a timestamp that is guaranteed to be smaller than the most recent one.
  2. You should recreate the index from time to time to cut off old data.
  3. The condition needs to be IMMUTABLE.

So the one-time effect deteriorates over time. The specific problem is the hard coded condition:

WHERE created_at > '2013-09-15 0:0'::timestamp
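
Point 3 above is why the date is hard-coded in the first place: you cannot simply write now() into the predicate, because Postgres rejects functions that are not IMMUTABLE in index predicates. A minimal demonstration, assuming the table above:

CREATE INDEX tbl_created_recently_idx ON tbl (created_at DESC)
WHERE created_at > now() - interval '1 month';  -- now() is only STABLE
-- ERROR:  functions in index predicate must be marked IMMUTABLE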

Automate

You could update the index and your queries manually from time to time. Or you automate it with the help of a function like this one:

CREATE OR REPLACE FUNCTION f_min_ts()
RETURNS timestamp LANGUAGE sql IMMUTABLE AS
$$SELECT '2013-09-15 0:0'::timestamp$$;

Index:

CREATE INDEX tbl_created_recently_idx ON tbl (created_at DESC)
WHERE created_at > f_min_ts();

Query:

SELECT * FROM tbl
WHERE created_at > f_min_ts()
ORDER BY created_at DESC
LIMIT 1;

Automate recreation with a cron job or some trigger-based event. Your queries can stay the same now. But whenever you change the function, you need to recreate all indexes that use it. Just drop and create each one, as sketched below.
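
A minimal maintenance sketch, assuming the names above (the new cutoff date is a placeholder):

-- Move the cutoff forward; only the function body changes:
CREATE OR REPLACE FUNCTION f_min_ts()
RETURNS timestamp LANGUAGE sql IMMUTABLE AS
$$SELECT '2014-03-15 0:0'::timestamp$$;

-- Rebuild every index that uses the function so it picks up the new cutoff:
DROP INDEX tbl_created_recently_idx;
CREATE INDEX tbl_created_recently_idx ON tbl (created_at DESC)
WHERE created_at > f_min_ts();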

First ...

... test whether you are actually hitting a bottleneck with this.

Try whether a simple DROP INDEX ... ; CREATE INDEX ... does the job. If it does, your index may have been bloated and your autovacuum settings may be off.

Or try VACUUM FULL ANALYZE to get your whole table plus indices in pristine condition and check again.
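
A sketch of both options, assuming the index name above (REINDEX ... CONCURRENTLY requires Postgres 12 or later):

-- Rebuild just the one index; CONCURRENTLY avoids blocking concurrent writes:
REINDEX INDEX CONCURRENTLY tbl_created_recently_idx;

-- Or rewrite the whole table plus all of its indexes, then refresh statistics:
VACUUM FULL ANALYZE tbl;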

Other options include the usual general performance tuning and covering indexes, depending on what you actually retrieve from the table.
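
For example, if the query reads only a few columns, a covering index with INCLUDE (available since Postgres 11) can enable index-only scans. A sketch, where payload is a hypothetical stand-in for the columns actually retrieved:

-- Hypothetical: replace payload with whatever columns the query reads.
CREATE INDEX tbl_created_covering_idx ON tbl (created_at DESC) INCLUDE (payload)
WHERE created_at > f_min_ts();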

Why does ORDER BY and LIMIT 1 slow down a MySQL query so much?

I think MySQL is trying to use the mailing_CS index in the last query, and this index is not optimal.

Try this query:

SELECT * 
FROM `mydata`.`mytable` USE INDEX (section_CS) IGNORE INDEX(mailing_CS)
WHERE (
(token = 'XFRA1NMDU9XY') AND
(section = 210874)
)
ORDER BY mailing
LIMIT 1

You may also use a composite index on (section, mailing) for this table, as sketched below.
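
A sketch of that index (the index name is an assumption):

-- With (section, mailing), MySQL can filter on section and read rows already
-- ordered by mailing, so ORDER BY mailing LIMIT 1 stops after the first row.
ALTER TABLE `mydata`.`mytable` ADD INDEX `idx_section_mailing` (`section`, `mailing`);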

MySQL SELECT MAX vs ORDER BY LIMIT 1 on WHERE clause

Compare in the analyze:

    -> Index scan on table using PRIMARY (reverse)  (cost=0.10 rows=2) (actual time=0.036..0.036 rows=1 loops=1)

Versus:

    -> Index range scan on table using open_time  (cost=217786.76 rows=1081033) (actual time=0.031..705.926 rows=2180645 loops=1)

Examining 2 rows must be a lot quicker than examining 2,180,645 rows.

In the query with ORDER BY id DESC LIMIT 1, it uses the primary key index. It starts at the end because it's a reverse order. Then it just iterates down the leaf nodes of the index (in descending order), until it examines the first row that also matches open_time > 0. Then the LIMIT optimization allows the query execution to finish. Based on its statistics, it estimates this will happen after examining 2 rows.

In the query with MAX(id), it uses the index on open_time. But because it's a range condition open_time > 0, it can't assume the maximum id is found at the start or end of that range. So it must examine every matching entry in the open_time index, searching for the greatest value of id (primary keys are implicitly part of a secondary index). There's no chance of early-termination as there is in the query with LIMIT.
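
For reference, a hedged reconstruction of the two query shapes being compared (the table name follows the plans above, not the original question):

-- Fast: walks the PRIMARY key backwards and stops at the first row
-- matching open_time > 0 (the LIMIT path).
SELECT * FROM `table` WHERE open_time > 0 ORDER BY id DESC LIMIT 1;

-- Slow: must visit every entry in the open_time index range to find
-- the greatest id; no early termination is possible.
SELECT MAX(id) FROM `table` WHERE open_time > 0;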

PostgreSQL - ORDER BY with LIMIT not using indexes as expected

Upgrading to Postgres 13 fixed this for us, with the introduction of incremental sort. From some docs on the feature:

Incremental sorting: Sorting is a performance-intensive task, so every improvement in this area can make a difference. Now PostgreSQL 13 introduces incremental sorting, which leverages early-stage sorts of a query and sorts only the incremental unsorted fields, increasing the chances the sorted block will fit in memory and by that, improving performance.

The new query plan from EXPLAIN is as follows, with the query now completing in <500ms consistently:

QUERY PLAN
Limit  (cost=71.06..820.32 rows=5000 width=20)
  ->  Incremental Sort  (cost=71.06..15461.82 rows=102706 width=20)
        Sort Key: ed.event_id, ed.version
        Presorted Key: ed.event_id
        ->  Nested Loop  (cost=0.84..6659.05 rows=102706 width=20)
              ->  Index Only Scan using event_id_version on deltas_to_retrieve zz  (cost=0.28..1116.39 rows=541 width=20)
              ->  Index Only Scan using event_deltas_pkey on event_deltas ed  (cost=0.56..8.35 rows=190 width=20)
                    Index Cond: ((event_id = zz.event_id) AND (version > zz.version))
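
For context, a hedged reconstruction of the query shape this plan corresponds to, inferred from the node names and conditions above rather than taken from the original question:

-- Inferred: a join keyed on event_id with a range condition on version,
-- an ORDER BY matching the Sort Key, and a LIMIT matching the Limit node.
SELECT ed.event_id, ed.version
FROM deltas_to_retrieve zz
JOIN event_deltas ed
  ON ed.event_id = zz.event_id
 AND ed.version > zz.version
ORDER BY ed.event_id, ed.version
LIMIT 5000;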

mysql: very simple SELECT id ORDER BY LIMIT will not use INDEX as expected (?!)

Index lookups are by value, not by position. An index can search for a value 2955900, but you're not asking for that. You're asking for the query to start at an offset of the 2955900th row in the table.

The optimizer can't assume that all primary key values are consecutive. So it's pretty likely that the 2955900th row has a value much higher than that.

Even if the primary key values are consecutive, you might have a WHERE condition that only matches, for example, 45% of the rows. In which case the id value on the 2955900th row would be way past the id value 2955900.

In other words, an index lookup of the id value 2955900 will not deliver the 2955900th row.

So MySQL can't use the index for a limit's offset. It must scan the rows to count them until it reaches offset+limit rows.

MySQL does have optimizations related to LIMIT, but it's more about stopping a table-scan once it has reached the number of rows to return. The optimizer may still report in an EXPLAIN plan that it expects it might have to scan the whole table.

A frequent misunderstanding about FORCE INDEX is that it forces the use of an index. :-)
In fact, if the query can't use an index (or if the available indexes don't have any benefit for this query), FORCE INDEX has no effect.


Re your comment:

Pagination is a frequent bane of data-driven web applications. Despite how common this feature is, it's not easy to optimize. Here are a few tips:

  • Why are you querying with offset 2955900? Do you really expect users to sift through that many pages? Most users give up after a few pages (exactly how many depends on the type of application and the data).

  • Reduce the number of queries. Your pagination function could fetch the first 5-10 pages, even if it only shows the first page to the user. Cache the other pages, with the assumption that the user will advance through a few of them. Only if they advance past the cached set of pages does your app have to do another query. You could even cache all 10 pages in JavaScript on the client's browser so clicking "Next" is instantaneous for them (at least for those first few pages).

  • Don't put a "Last" button on any user interface, because people will click it out of curiosity. Notice Google has a "Next" button but not a "Last" button. So the UI itself discourages people from running inefficient queries with high offsets.

  • If the user is advancing one page at a time, use the highest id value returned in the previous page in the WHERE clause of the next page's query. I.e. the following does use the index, even with no FORCE INDEX hint:

    SELECT * FROM thistable WHERE id > 544 ORDER BY id LIMIT 20
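
    A sketch of the following page, assuming the highest id on the previous page happened to be 564 (a hypothetical value):

    -- Seek past the last row of the previous page; the index on id satisfies
    -- both the WHERE and the ORDER BY, so no offset scan is needed.
    SELECT * FROM thistable WHERE id > 564 ORDER BY id LIMIT 20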

MySQL is not using composite index with ORDER BY clause

I'd expect the optimizer to select the composite index, as you did. (But it doesn't in your database.)

I tested the same situation on my test DB, and there it does select the composite index.

Fortunately, MySQL has index hints for overriding optimizer decisions.

tbl_name [[AS] alias] [index_hint_list]

index_hint_list:
    index_hint [index_hint] ...

index_hint:
    USE {INDEX|KEY}
        [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])
  | {IGNORE|FORCE} {INDEX|KEY}
        [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)

index_list:
    index_name [, index_name] ...

Example:

SELECT * FROM table1 USE INDEX (col1_index,col2_index)
WHERE col1=1 AND col2=2 AND col3=3;

SELECT * FROM table1 IGNORE INDEX (col3_index)
WHERE col1=1 AND col2=2 AND col3=3;

Finally, could you try to run your SQL with the following hint?

SELECT *
FROM `user` USE INDEX (your_composite_index_name)
WHERE org_id = "some id"
ORDER BY first_name ASC, last_name ASC
LIMIT 100;

Edit 1: Index fix

Please fix your index. The key prefix lengths in idx_first_name_last_name are defined as 32, but they should cover the full column length of 255:

ALTER TABLE `user` DROP INDEX `idx_first_name_last_name`, ADD KEY `idx_first_name_last_name` (`first_name`, `last_name`); 

MySQL group by & order by & limit optimization

Your query must do 2 sorts before delivering only 15 rows.

Technically, the GROUP BY book_id is invalid. Which updated_at do you want for each book_id? There can clearly be multiple rows for each book_id.

What does updated_at mean? Perhaps it is when the row in this many:many mapping table was updated?

Perhaps you really wanted

SELECT  `book_id`
FROM `book_buyer`
ORDER BY `updated_at` DESC, `book_id` DESC
LIMIT 15;

That would benefit from

INDEX(updated_at, book_id)

Note that the two columns in the INDEX are both in the same direction (DESC) and in the same order (updated_at first) to match the ORDER BY. Both of these are necessary for the INDEX to be used. (There is a minor improvement in MySQL 8.0.)

In fact, instead of reading the entire table and doing two sorts (or maybe just one), my SELECT will read 15 rows from the index and no rows from the table itself. Much faster.
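
A sketch of that index (the index name is an assumption; before MySQL 8.0 the DESC keyword in an index definition is ignored, and an ascending index is simply scanned in reverse):

-- Hypothetical name. Both ORDER BY columns descend, so MySQL can read this
-- index backwards (or build a true descending index on 8.0+).
ALTER TABLE `book_buyer` ADD INDEX `idx_updated_book` (`updated_at`, `book_id`);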


