For Autoincrement Fields: Max(Id) VS Top 1 Id Order by Id Desc

For autoincrement fields: MAX(ID) vs TOP 1 ID ORDER BY ID DESC

In theory, they will use same plans and run almost same time.

In practice,

SELECT TOP 1 Id FROM Table1 ORDER BY Id DESC

will more probably use a PRIMARY KEY INDEX.

Also, this one is more extendable if you will decide to select some else column along with id.

An actual plan on MAX() says:

SELECT <- AGGREGATE <- TOP <- CLUSTERED INDEX SCAN

, while plan for TOP 1 says:

SELECT <- TOP <- CLUSTERED INDEX SCAN

, i. e. aggregate is omitted.

Aggregate actually won't do anything here, as there is but one row.

P. S. As @Mehrdad Afshari and @John Sansom noted, on a non-indexed field MAX is slightly faster (of course not 20 times as optimizer says):


-- 18,874,368 rows

SET LANGUAGE ENGLISH
SET STATISTICS TIME ON
SET STATISTICS IO ON
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC

Changed language setting to us_english.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
MAX

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 20 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 447, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 5452 ms, elapsed time = 2766 ms.
TOP 1

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 6813 ms, elapsed time = 3449 ms.
MAX

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 44, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 5359 ms, elapsed time = 2714 ms.
TOP 1

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 6766 ms, elapsed time = 3379 ms.
MAX

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 5406 ms, elapsed time = 2726 ms.
TOP 1

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 6780 ms, elapsed time = 3415 ms.
MAX

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 85, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 5392 ms, elapsed time = 2709 ms.
TOP 1

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 10, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 6766 ms, elapsed time = 3387 ms.
MAX

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 5374 ms, elapsed time = 2708 ms.
TOP 1

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 6797 ms, elapsed time = 3494 ms.

Can I assume the highest auto increment value is always the newest entry?

Will SELECT MAX(ID) FROM ... always work?

No, it will not. If you have more than one copy of your program connected to your database, and one program does what you suggest, and another program then performs an insert, the first program's MAX(ID) value will become stale. It's a race condition. Those are very difficult to detect and debug, because they don't crop up until a system is under heavy load.

If all you care about is a snapshot of MAX(ID) you'll be OK. But you cannot use this MAX(ID) value as the value for the next insertion, unless you never will have more than one program instance inserting data. It's unwise to rely on that kind of never.

If not, is there a more reliable way to get the newest entry?

Immediately after you do an INSERT operation you can do SELECT LAST_INSERT_ID() to retrieve the autoincrementing ID just used. Various APIs, like mysqli, PDO, and JDBC return this ID to a calling program when an INSERT operation completes.

LAST_INSERT_ID() and the related API functions work on a connection-by-connection basis. Therefore, if multiple connections are inserting rows, each connection will get its own id values.

Count(*) Vs. Max(Id)

If you insert 10 rows, delete 5, then insert 10 more then your COUNT(*) will not match MAX(id).

You can also insert an id way ahead of where it should be, like in an empty table INSERT ... (id) VALUES (9000000) will kick up your MAX(id) significantly despite having only 1 row.

Rolled-back transactions can also interfere with this.

If you want to know the next increment, check the AUTO_INCREMENT value, but be aware that this is only a guess, the actual value used may differ by the time you actually get around to inserting.

If you want them to match then you need to:

  • Start with a table where AUTO_INCREMENT=1, as in it's either brand new or has been cleared with TRUNCATE.
  • Insert using auto-generated id values as one transaction, or as a series of transactions where all of them have been fully committed.

MySQL SELECT MAX vs ORDER BY LIMIT 1 on WHERE clause

Compare in the analyze:

    -> Index scan on table using PRIMARY (reverse)  (cost=0.10 rows=2) (actual time=0.036..0.036 rows=1 loops=1)

Versus:

    -> Index range scan on table using open_time  (cost=217786.76 rows=1081033) (actual time=0.031..705.926 rows=2180645 loops=1)

Examining 2 rows must be a lot quicker than examining 2,180,645 rows.

In the query with ORDER BY id DESC LIMIT 1, it uses the primary key index. It starts at the end because it's a reverse order. Then it just iterates down the leaf nodes of the index (in descending order), until it examines the first row that also matches open_time > 0. Then the LIMIT optimization allows the query execution to finish. Based on its statistics, it estimates this will happen after examining 2 rows.

In the query with MAX(id), it uses the index on open_time. But because it's a range condition open_time > 0, it can't assume the maximum id is found at the start or end of that range. So it must examine every matching entry in the open_time index, searching for the greatest value of id (primary keys are implicitly part of a secondary index). There's no chance of early-termination as there is in the query with LIMIT.

What to do if the auto-increment value reaches its limit?

Let's assume a table structure like:

CREATE TABLE `tbl` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
);

and INSERT queries like:

INSERT INTO tbl(id) VALUES (NULL);

In the real code there are also other columns in the table and they are also present in the INSERT query but we can safely ignore them because they don't bring any value to this specific issue.

When the value of column id reaches its maximum value no more rows can be inserted in the table using the query above. The next INSERT fails with the error:

SQL Error (167): Out of range value for column 'id'.

If there are gaps in the values of the id column then you can still insert rows that use values not present in the table but you have to specify the values for id in the INSERT query.


Anyway, if the type of your AUTO_INCREMENT column is BIGINT you don't have to worry.

Assuming the code inserts one million records each second (this is highly overrated, to not say impossible), there are enough values for the id column for the next half of million years. Or just 292,277 years if the column is not UNSIGNED.


I witnessed the behaviour on a live web server that was using INT(11) (and not UNSIGNED) as the AUTO_INCREMENTed PK for a table that records information about the visits of the web site. It failed in the middle of the night, after several years of running smoothly, when the visits number reached 2^31 (2 billions and something).

Changing the column type from INT to BIGINT is not a solution on a 2-billion records table (it takes ages to complete and when the system is live, there is never enough time). The solution was to create a new table with the same structure but with BIGINT for the PK column and an initial value for the AUTO_INCREMENT column and then switch the tables:

CREATE TABLE `tbl_new` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=2200000000;

RENAME TABLE `tbl` TO `tbl_old`, `tbl_new` TO `tbl`;

SQL performance MAX()

There will be no difference as you can test yourself by inspecting the execution plans. If id is the clustered index, you should see an ordered clustered index scan; if it is not indexed, you'll still see either a table scan or a clustered index scan, but it won't be ordered in either case.

The TOP 1 approach can be useful if you want to pull along other values from the row, which is easier than pulling the max in a subquery and then joining. If you want other values from the row, you need to dictate how to deal with ties in both cases.

Having said that, there are some scenarios where the plan can be different, so it is important to test depending on whether the column is indexed and whether or not it is monotonically increasing. I created a simple table and inserted 50000 rows:

CREATE TABLE dbo.x
(
a INT, b INT, c INT, d INT,
e DATETIME, f DATETIME, g DATETIME, h DATETIME
);
CREATE UNIQUE CLUSTERED INDEX a ON dbo.x(a);
CREATE INDEX b ON dbo.x(b)
CREATE INDEX e ON dbo.x(e);
CREATE INDEX f ON dbo.x(f);

INSERT dbo.x(a, b, c, d, e, f, g, h)
SELECT
n.rn, -- ints monotonically increasing
n.a, -- ints in random order
n.rn,
n.a,
DATEADD(DAY, n.rn/100, '20100101'), -- dates monotonically increasing
DATEADD(DAY, -n.a % 1000, '20120101'), -- dates in random order
DATEADD(DAY, n.rn/100, '20100101'),
DATEADD(DAY, -n.a % 1000, '20120101')
FROM
(
SELECT TOP (50000)
(ABS(s1.[object_id]) % 10000) + 1,
rn = ROW_NUMBER() OVER (ORDER BY s2.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
) AS n(a,rn);
GO

On my system this created values in a/c from 1 to 50000, b/d between 3 and 9994, e/g from 2010-01-01 through 2011-05-16, and f/h from 2009-04-28 through 2012-01-01.

First, let's compare the indexed monotonically increasing integer columns, a and c. a has a clustered index, c does not:

SELECT MAX(a) FROM dbo.x;
SELECT TOP (1) a FROM dbo.x ORDER BY a DESC;

SELECT MAX(c) FROM dbo.x;
SELECT TOP (1) c FROM dbo.x ORDER BY c DESC;

Results:

Sample Image

The big problem with the 4th query is that, unlike MAX, it requires a sort. Here is 3 compared to 4:

Sample Image

Sample Image

This will be a common problem across all of these query variations: a MAX against an unindexed column will be able to piggy-back on the clustered index scan and perform a stream aggregate, while TOP 1 needs to perform a sort which is going to be more expensive.

I did test and saw the exact same results across testing b+d, e+g, and f+h.

So it seems to me that, in addition to producing more standards-compliance code, there is a potential performance benefit to using MAX in favor of TOP 1 depending on the underlying table and indexes (which can change after you've put your code in production). So I would say that, without further information, MAX is preferable.

(And as I said before, TOP 1 might really be the behavior you're after, if you're pulling additional columns. You'll want to test MAX + JOIN methods as well if that's what you're after.)

MYSQL SELECT MAX(id) + 1 where id 1000

I think maybe you want this:
http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_mod

Giving something like:

INSERT into tablename(id) SELECT MOD(count(id),1000)+1 FROM tablename

OR similarly,

INSERT into tablename(id) SELECT (count(id)%1000)+1 FROM tablename

Cheers.

How can I select the row with the highest ID in MySQL?

SELECT * FROM permlog ORDER BY id DESC LIMIT 0, 1


Related Topics



Leave a reply



Submit