SQL Performance Max()

There will be no difference, as you can verify yourself by inspecting the execution plans. If id is the clustered index, you should see an ordered clustered index scan; if it is not indexed, you'll still see either a table scan or a clustered index scan, but it won't be ordered in either case.

The TOP 1 approach can be useful if you want to pull along other values from the row, which is easier than pulling the max in a subquery and then joining. If you want other values from the row, you need to dictate how to deal with ties in both cases.

Having said that, there are some scenarios where the plan can be different, so it is important to test depending on whether the column is indexed and whether or not it is monotonically increasing. I created a simple table and inserted 50000 rows:

CREATE TABLE dbo.x
(
a INT, b INT, c INT, d INT,
e DATETIME, f DATETIME, g DATETIME, h DATETIME
);
CREATE UNIQUE CLUSTERED INDEX a ON dbo.x(a);
CREATE INDEX b ON dbo.x(b);
CREATE INDEX e ON dbo.x(e);
CREATE INDEX f ON dbo.x(f);

INSERT dbo.x(a, b, c, d, e, f, g, h)
SELECT
n.rn, -- ints monotonically increasing
n.a, -- ints in random order
n.rn,
n.a,
DATEADD(DAY, n.rn/100, '20100101'), -- dates monotonically increasing
DATEADD(DAY, -n.a % 1000, '20120101'), -- dates in random order
DATEADD(DAY, n.rn/100, '20100101'),
DATEADD(DAY, -n.a % 1000, '20120101')
FROM
(
SELECT TOP (50000)
(ABS(s1.[object_id]) % 10000) + 1,
rn = ROW_NUMBER() OVER (ORDER BY s2.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
) AS n(a,rn);
GO

On my system this created values in a/c from 1 to 50000, b/d between 3 and 9994, e/g from 2010-01-01 through 2011-05-16, and f/h from 2009-04-28 through 2012-01-01.
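The random columns depend on the contents of sys.all_objects, so your ranges will differ. A quick sketch for checking what the script produced on your own system (the column aliases are mine):

```sql
-- Report the value ranges actually generated in dbo.x.
SELECT
    MIN(a) AS min_a, MAX(a) AS max_a,   -- monotonically increasing ints
    MIN(b) AS min_b, MAX(b) AS max_b,   -- ints in random order
    MIN(e) AS min_e, MAX(e) AS max_e,   -- monotonically increasing dates
    MIN(f) AS min_f, MAX(f) AS max_f,   -- dates in random order
    COUNT(*) AS row_count
FROM dbo.x;
```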

First, let's compare the monotonically increasing integer columns, a and c. a has a clustered index; c is not indexed:

SELECT MAX(a) FROM dbo.x;
SELECT TOP (1) a FROM dbo.x ORDER BY a DESC;

SELECT MAX(c) FROM dbo.x;
SELECT TOP (1) c FROM dbo.x ORDER BY c DESC;

Results:

[Image not available]

The big problem with the 4th query is that, unlike MAX, it requires a sort. Here is 3 compared to 4:

[Images not available]

This will be a common problem across all of these query variations: a MAX against an unindexed column will be able to piggy-back on the clustered index scan and perform a stream aggregate, while TOP 1 needs to perform a sort which is going to be more expensive.
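If you want to reproduce the comparison yourself, a minimal sketch (assuming the dbo.x table above) is to turn on the I/O and timing statistics and run both forms against the unindexed column. The CREATE INDEX at the end is my assumption about the obvious follow-up test, not something shown in the results above: with an index on c, I would expect TOP 1 to no longer need a sort, but check the plan on your own system.

```sql
-- Show logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- c is not indexed: MAX can use a stream aggregate over the scan,
-- while TOP (1) ... ORDER BY has to sort.
SELECT MAX(c) FROM dbo.x;
SELECT TOP (1) c FROM dbo.x ORDER BY c DESC;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- Follow-up test (assumption): index the column, then rerun the two
-- queries and compare the plans again.
CREATE INDEX c ON dbo.x(c);
```

In Management Studio, enabling "Include Actual Execution Plan" before running the batch shows the sort operator directly.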

I ran the same tests against b+d, e+g, and f+h and saw exactly the same pattern.

So it seems to me that, in addition to producing more standards-compliant code, there is a potential performance benefit to using MAX over TOP 1, depending on the underlying table and indexes (which can change after you've put your code in production). So I would say that, without further information, MAX is preferable.

(And as I said before, TOP 1 might really be the behavior you're after, if you're pulling additional columns. You'll want to test MAX + JOIN methods as well if that's what you're after.)
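As a rough sketch of the two approaches for pulling along other columns (using the dbo.x table above; the choice of columns is arbitrary), note that both need an explicit decision about ties. Plain TOP (1) returns one arbitrary row among the tied ones, so WITH TIES is used here to make the two queries comparable:

```sql
-- Option 1: TOP with ties. Returns every row sharing the highest b.
SELECT TOP (1) WITH TIES a, b, e
FROM dbo.x
ORDER BY b DESC;

-- Option 2: MAX in a derived table, joined back to the base table.
-- Also returns every row sharing the highest b.
SELECT x.a, x.b, x.e
FROM dbo.x AS x
INNER JOIN (SELECT MAX(b) AS max_b FROM dbo.x) AS m
    ON x.b = m.max_b;
```

If you want exactly one row instead, drop WITH TIES and add a tie-breaker to the ORDER BY (for example, ORDER BY b DESC, a DESC).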

Performance of ALL VS MAX() in SQL

Query 1

 Scan count 2, logical reads 2
---------------
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 13 ms.
---------------
Query Cost is 49%

The SQL optimizer uses an inner join to combine the subquery with the outer query.

Query 2

 Scan count 2, logical reads 5
---------------
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 159 ms.
---------------
Query Cost is 51%

The SQL optimizer uses a left join to combine the subquery with the outer query.

The subquery cost is equal in both queries.
Based on the lower reads and elapsed time, I think Query 1 is better than Query 2.
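The two queries themselves are not reproduced here, so I won't guess which one was Query 1. For readers unfamiliar with the comparison, the general shape of an ALL versus MAX() query looks something like the sketch below; the table and column names are hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.Scores (Id INT PRIMARY KEY, Score INT NULL);

-- Form A: compare against every value returned by the subquery.
SELECT Id, Score
FROM dbo.Scores
WHERE Score >= ALL (SELECT Score FROM dbo.Scores);

-- Form B: compare against the single maximum value.
SELECT Id, Score
FROM dbo.Scores
WHERE Score = (SELECT MAX(Score) FROM dbo.Scores);
```

The two forms are not strictly interchangeable: if Score contains a NULL, Form A returns no rows (the comparison with NULL is unknown), while MAX ignores NULLs and Form B still returns the highest scoring rows.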

Performance for Avg & Max in SQL

For this query:

SELECT Col1, Col2,
COALESCE(AVG(Col3), 0) AS AvgCol,
COALESCE(MAX(Col3), 0) AS MaxCol,
COUNT(*) AS Col5
FROM TableName
GROUP BY Col1, Col2
ORDER BY Col1, MaxCol DESC;

I would start with an index on (Col1, Col2, Col3).

I'm not sure if this will help. It is possible that the issue is the time for ordering the results.
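A minimal sketch of that index (the index name is made up; the syntax should work in most engines):

```sql
-- Composite index matching the GROUP BY columns, with Col3 included
-- so AVG and MAX can be computed from the index alone.
CREATE INDEX IX_TableName_Col1_Col2_Col3
    ON TableName (Col1, Col2, Col3);
```

Compare the execution plan before and after; if the final ORDER BY on MaxCol dominates the runtime, the index alone will not remove that sort.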

SELECT MAX() too slow - any alternatives?

[UNSOLVED] But I've moved on!

Thanks to everyone who provided answers / suggestions. Unfortunately I couldn't get any further with this, so I have given up trying for now.

It looks like the best solution is to rewrite the application to UPDATE the latest data into a different table, so that getting the latest readings is a really quick and simple SELECT.

Thanks again for the suggestions.
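For anyone landing here later, a rough sketch of the "latest readings" table idea. The schema is entirely hypothetical, and the upsert uses PostgreSQL syntax as an assumption; other engines have their own equivalents (MERGE, ON DUPLICATE KEY UPDATE, and so on).

```sql
-- Hypothetical schema: one row per sensor, holding only its newest reading.
CREATE TABLE latest_reading (
    sensor_id     integer   PRIMARY KEY,
    reading_ts    timestamp NOT NULL,
    reading_value numeric   NOT NULL
);

-- Run by the application whenever a new reading arrives.
INSERT INTO latest_reading (sensor_id, reading_ts, reading_value)
VALUES (42, '2024-01-01 12:00:00', 21.5)
ON CONFLICT (sensor_id) DO UPDATE
SET reading_ts    = EXCLUDED.reading_ts,
    reading_value = EXCLUDED.reading_value
WHERE latest_reading.reading_ts < EXCLUDED.reading_ts;  -- skip out-of-order arrivals

-- Fetching the latest reading is then a primary-key lookup, no MAX() needed.
SELECT reading_ts, reading_value
FROM latest_reading
WHERE sensor_id = 42;
```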

Performance of max() vs ORDER BY DESC + LIMIT 1

There does not seem to be an index on sensors.station_id, which is important here.

There is an actual difference between max() and ORDER BY DESC + LIMIT 1. Many people seem to miss that. NULL values sort first in descending sort order (the Postgres default). So ORDER BY timestamp DESC LIMIT 1 returns a NULL value if one exists, while the aggregate function max() ignores NULL values and returns the latest not-null timestamp. ORDER BY timestamp DESC NULLS LAST LIMIT 1 would be equivalent.
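A small demonstration of the difference, using a throwaway temp table (the table t and its values are made up for illustration):

```sql
CREATE TEMP TABLE t (ts timestamp);
INSERT INTO t VALUES ('2024-01-01'), ('2024-06-01'), (NULL);

SELECT max(ts) FROM t;                                 -- 2024-06-01 00:00:00
SELECT ts FROM t ORDER BY ts DESC LIMIT 1;             -- NULL
SELECT ts FROM t ORDER BY ts DESC NULLS LAST LIMIT 1;  -- 2024-06-01 00:00:00
```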

For your case, since your column d.timestamp is defined NOT NULL (as your update revealed), there is no effective difference. An index with DESC NULLS LAST and the same clause in the ORDER BY for the LIMIT query should still serve you best. I suggest these indexes (my query below builds on the 2nd one):

sensors(station_id, id)
data(sensor_id, timestamp DESC NULLS LAST)
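Spelled out as DDL, that would be roughly the following. The index names are my own, and I am assuming the first table is called sensors, as in the queries in this answer:

```sql
-- Lets Postgres find the sensors of a station from the index alone.
CREATE INDEX sensors_station_id_id_idx
    ON sensors (station_id, id);

-- Matches the ORDER BY timestamp DESC NULLS LAST of the LIMIT query,
-- so the latest row per sensor is the first index entry.
CREATE INDEX data_sensor_id_timestamp_idx
    ON data (sensor_id, timestamp DESC NULLS LAST);
```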

You can drop the other indexes sensor_ind_timestamp and sensor_ind_timestamp_desc unless they are in use otherwise (unlikely, but possible).

Much more importantly, there is another difficulty: The filter on the first table sensors returns few, but still (possibly) multiple rows. Postgres expects to find 2 rows (rows=2) in your added EXPLAIN output.

The perfect technique would be an index-skip-scan (a.k.a. loose index scan) for the second table data - which is not currently implemented (up to at least Postgres 15). There are various workarounds. See:

  • Optimize GROUP BY query to retrieve latest row per user

The best should be:

SELECT d.timestamp
FROM sensors s
CROSS JOIN LATERAL (
SELECT timestamp
FROM data
WHERE sensor_id = s.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) d
WHERE s.station_id = 4
ORDER BY d.timestamp DESC NULLS LAST
LIMIT 1;

The choice between max() and ORDER BY / LIMIT hardly matters in comparison. You might as well use:

SELECT max(d.timestamp) AS timestamp
FROM sensors s
CROSS JOIN LATERAL (
SELECT timestamp
FROM data
WHERE sensor_id = s.id
ORDER BY timestamp DESC NULLS LAST
LIMIT 1
) d
WHERE s.station_id = 4;

Or:

SELECT max(d.timestamp) AS timestamp
FROM sensors s
CROSS JOIN LATERAL (
SELECT max(timestamp) AS timestamp
FROM data
WHERE sensor_id = s.id
) d
WHERE s.station_id = 4;

Or even with a correlated subquery, shortest of all:

SELECT max((SELECT max(timestamp) FROM data WHERE sensor_id = s.id)) AS timestamp
FROM sensors s
WHERE station_id = 4;

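Note the double parentheses! The inner pair turns the subquery into a scalar expression, and the outer pair belongs to the max() call.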

The additional advantage of LIMIT in a LATERAL join is that you can retrieve arbitrary columns of the selected row, not just the latest timestamp (one column).
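For example, a variant of the first query that returns the whole latest row rather than just the timestamp might look like this. It is a sketch: I am using d.* because the other columns of data are not shown.

```sql
SELECT d.*
FROM sensors s
CROSS JOIN LATERAL (
    SELECT *                              -- all columns of the newest row per sensor
    FROM data
    WHERE sensor_id = s.id
    ORDER BY timestamp DESC NULLS LAST
    LIMIT 1
) d
WHERE s.station_id = 4
ORDER BY d.timestamp DESC NULLS LAST      -- newest row across the station's sensors
LIMIT 1;
```

This is not possible with the plain max() variants, which only carry the aggregated column.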

Related:

  • Why do NULL values come first when ordering DESC in a PostgreSQL query?
  • What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
  • Select first row in each GROUP BY group?
  • Optimize groupwise maximum query

MAX vs Top 1 - which is better?

Performance is generally similar, if your table is indexed.

Worth considering, though: TOP usually only makes sense if you're ordering your results (otherwise, top of what?).

Ordering a result requires more processing.

Min doesn't always require ordering. (Just depends, but often you don't need order by or group by, etc.)

In your two examples, I'd expect the speed and execution plan to be very similar. You can always check the query statistics to make sure, but I doubt the difference would be significant.


