For autoincrement fields: MAX(ID) vs TOP 1 ID ORDER BY ID DESC
In theory, they will use same plans and run almost same time.
In practice,
SELECT TOP 1 Id FROM Table1 ORDER BY Id DESC
will more probably use a PRIMARY KEY INDEX
.
Also, this one is more extendable if you will decide to select some else column along with id
.
An actual plan on MAX()
says:
SELECT <- AGGREGATE <- TOP <- CLUSTERED INDEX SCAN
, while plan for TOP 1
says:
SELECT <- TOP <- CLUSTERED INDEX SCAN
, i. e. aggregate
is omitted.
Aggregate actually won't do anything here, as there is but one row.
P. S. As @Mehrdad Afshari
and @John Sansom
noted, on a non-indexed field MAX
is slightly faster (of course not 20
times as optimizer says):
-- 18,874,368 rows
SET LANGUAGE ENGLISH
SET STATISTICS TIME ON
SET STATISTICS IO ON
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
Changed language setting to us_english.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
MAX
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 20 ms.
(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 447, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5452 ms, elapsed time = 2766 ms.
TOP 1
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 6813 ms, elapsed time = 3449 ms.
MAX
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 44, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5359 ms, elapsed time = 2714 ms.
TOP 1
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 6766 ms, elapsed time = 3379 ms.
MAX
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5406 ms, elapsed time = 2726 ms.
TOP 1
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 6780 ms, elapsed time = 3415 ms.
MAX
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 85, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5392 ms, elapsed time = 2709 ms.
TOP 1
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 10, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 6766 ms, elapsed time = 3387 ms.
MAX
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5374 ms, elapsed time = 2708 ms.
TOP 1
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 6797 ms, elapsed time = 3494 ms.
Can I assume the highest auto increment value is always the newest entry?
Will
SELECT MAX(ID) FROM ...
always work?
No, it will not. If you have more than one copy of your program connected to your database, and one program does what you suggest, and another program then performs an insert, the first program's MAX(ID)
value will become stale. It's a race condition. Those are very difficult to detect and debug, because they don't crop up until a system is under heavy load.
If all you care about is a snapshot of MAX(ID)
you'll be OK. But you cannot use this MAX(ID)
value as the value for the next insertion, unless you never will have more than one program instance inserting data. It's unwise to rely on that kind of never.
If not, is there a more reliable way to get the newest entry?
Immediately after you do an INSERT operation you can do SELECT LAST_INSERT_ID()
to retrieve the autoincrementing ID just used. Various APIs, like mysqli
, PDO
, and JDBC
return this ID to a calling program when an INSERT operation completes.
LAST_INSERT_ID()
and the related API functions work on a connection-by-connection basis. Therefore, if multiple connections are inserting rows, each connection will get its own id values.
Count(*) Vs. Max(Id)
If you insert 10 rows, delete 5, then insert 10 more then your COUNT(*)
will not match MAX(id)
.
You can also insert an id
way ahead of where it should be, like in an empty table INSERT ... (id) VALUES (9000000)
will kick up your MAX(id)
significantly despite having only 1 row.
Rolled-back transactions can also interfere with this.
If you want to know the next increment, check the AUTO_INCREMENT
value, but be aware that this is only a guess, the actual value used may differ by the time you actually get around to inserting.
If you want them to match then you need to:
- Start with a table where
AUTO_INCREMENT=1
, as in it's either brand new or has been cleared withTRUNCATE
. - Insert using auto-generated
id
values as one transaction, or as a series of transactions where all of them have been fully committed.
MySQL SELECT MAX vs ORDER BY LIMIT 1 on WHERE clause
Compare in the analyze:
-> Index scan on table using PRIMARY (reverse) (cost=0.10 rows=2) (actual time=0.036..0.036 rows=1 loops=1)
Versus:
-> Index range scan on table using open_time (cost=217786.76 rows=1081033) (actual time=0.031..705.926 rows=2180645 loops=1)
Examining 2 rows must be a lot quicker than examining 2,180,645 rows.
In the query with ORDER BY id DESC LIMIT 1
, it uses the primary key index. It starts at the end because it's a reverse order. Then it just iterates down the leaf nodes of the index (in descending order), until it examines the first row that also matches open_time > 0
. Then the LIMIT optimization allows the query execution to finish. Based on its statistics, it estimates this will happen after examining 2 rows.
In the query with MAX(id)
, it uses the index on open_time
. But because it's a range condition open_time > 0
, it can't assume the maximum id is found at the start or end of that range. So it must examine every matching entry in the open_time
index, searching for the greatest value of id
(primary keys are implicitly part of a secondary index). There's no chance of early-termination as there is in the query with LIMIT
.
What to do if the auto-increment value reaches its limit?
Let's assume a table structure like:
CREATE TABLE `tbl` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
);
and INSERT
queries like:
INSERT INTO tbl(id) VALUES (NULL);
In the real code there are also other columns in the table and they are also present in the INSERT
query but we can safely ignore them because they don't bring any value to this specific issue.
When the value of column id
reaches its maximum value no more rows can be inserted in the table using the query above. The next INSERT
fails with the error:
SQL Error (167): Out of range value for column 'id'.
If there are gaps in the values of the id
column then you can still insert rows that use values not present in the table but you have to specify the values for id
in the INSERT
query.
Anyway, if the type of your AUTO_INCREMENT
column is BIGINT
you don't have to worry.
Assuming the code inserts one million records each second (this is highly overrated, to not say impossible), there are enough values for the id
column for the next half of million years. Or just 292,277
years if the column is not UNSIGNED
.
I witnessed the behaviour on a live web server that was using INT(11)
(and not UNSIGNED
) as the AUTO_INCREMENT
ed PK for a table that records information about the visits of the web site. It failed in the middle of the night, after several years of running smoothly, when the visits number reached 2^31
(2
billions and something).
Changing the column type from INT
to BIGINT
is not a solution on a 2-billion records table (it takes ages to complete and when the system is live, there is never enough time). The solution was to create a new table with the same structure but with BIGINT
for the PK column and an initial value for the AUTO_INCREMENT
column and then switch the tables:
CREATE TABLE `tbl_new` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) AUTO_INCREMENT=2200000000;
RENAME TABLE `tbl` TO `tbl_old`, `tbl_new` TO `tbl`;
SQL performance MAX()
There will be no difference as you can test yourself by inspecting the execution plans. If id
is the clustered index, you should see an ordered clustered index scan; if it is not indexed, you'll still see either a table scan or a clustered index scan, but it won't be ordered in either case.
The TOP 1
approach can be useful if you want to pull along other values from the row, which is easier than pulling the max in a subquery and then joining. If you want other values from the row, you need to dictate how to deal with ties in both cases.
Having said that, there are some scenarios where the plan can be different, so it is important to test depending on whether the column is indexed and whether or not it is monotonically increasing. I created a simple table and inserted 50000 rows:
CREATE TABLE dbo.x
(
a INT, b INT, c INT, d INT,
e DATETIME, f DATETIME, g DATETIME, h DATETIME
);
CREATE UNIQUE CLUSTERED INDEX a ON dbo.x(a);
CREATE INDEX b ON dbo.x(b)
CREATE INDEX e ON dbo.x(e);
CREATE INDEX f ON dbo.x(f);
INSERT dbo.x(a, b, c, d, e, f, g, h)
SELECT
n.rn, -- ints monotonically increasing
n.a, -- ints in random order
n.rn,
n.a,
DATEADD(DAY, n.rn/100, '20100101'), -- dates monotonically increasing
DATEADD(DAY, -n.a % 1000, '20120101'), -- dates in random order
DATEADD(DAY, n.rn/100, '20100101'),
DATEADD(DAY, -n.a % 1000, '20120101')
FROM
(
SELECT TOP (50000)
(ABS(s1.[object_id]) % 10000) + 1,
rn = ROW_NUMBER() OVER (ORDER BY s2.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
) AS n(a,rn);
GO
On my system this created values in a/c from 1 to 50000, b/d between 3 and 9994, e/g from 2010-01-01 through 2011-05-16, and f/h from 2009-04-28 through 2012-01-01.
First, let's compare the indexed monotonically increasing integer columns, a and c. a has a clustered index, c does not:
SELECT MAX(a) FROM dbo.x;
SELECT TOP (1) a FROM dbo.x ORDER BY a DESC;
SELECT MAX(c) FROM dbo.x;
SELECT TOP (1) c FROM dbo.x ORDER BY c DESC;
Results:
The big problem with the 4th query is that, unlike MAX
, it requires a sort. Here is 3 compared to 4:
This will be a common problem across all of these query variations: a MAX
against an unindexed column will be able to piggy-back on the clustered index scan and perform a stream aggregate, while TOP 1
needs to perform a sort which is going to be more expensive.
I did test and saw the exact same results across testing b+d, e+g, and f+h.
So it seems to me that, in addition to producing more standards-compliance code, there is a potential performance benefit to using MAX
in favor of TOP 1
depending on the underlying table and indexes (which can change after you've put your code in production). So I would say that, without further information, MAX
is preferable.
(And as I said before, TOP 1
might really be the behavior you're after, if you're pulling additional columns. You'll want to test MAX
+ JOIN
methods as well if that's what you're after.)
MYSQL SELECT MAX(id) + 1 where id 1000
I think maybe you want this:
http://dev.mysql.com/doc/refman/5.0/en/mathematical-functions.html#function_mod
Giving something like:
INSERT into tablename(id) SELECT MOD(count(id),1000)+1 FROM tablename
OR similarly,
INSERT into tablename(id) SELECT (count(id)%1000)+1 FROM tablename
Cheers.
How can I select the row with the highest ID in MySQL?
SELECT * FROM permlog ORDER BY id DESC LIMIT 0, 1
Related Topics
How to List Field's Name in Table in Access Using SQL
SQL Server 2005 Recursive Query with Loops in Data - Is It Possible
Combine Consecutive Date Ranges
How to Get Windows Log-In User Name for a SQL Log in User
Using Dynamic SQL to Specify a Column Name by Adding a Variable to Simple SQL Query
Access Columns of a Table by Index Instead of Name in SQL Server Stored Procedure
Does Inner Join Performance Depends on Order of Tables
How to Remove Duplicates from Space Separated List by Oracle Regexp_Replace
Recursive Subquerying with Sorting
Database/Sql: How to Store Longitude/Latitude Data
Possible to Do a Delete with a Having Clause
In Postgres, Can You Set the Default Formatting for a Timestamp, by Session or Globally
Transact-SQL Shorthand Join Syntax
Unique Constraint on Combination of Two Columns
How to Convert Hh:Mm:Ss to Seconds in SQL Server with More Than 24 Hours
How to Fix Ora-01427 Single-Row Subquery Returns More Than One Row in Select