Retrieving the last record in each group - MySQL
MySQL 8.0 now supports window functions, like almost all popular SQL implementations. With this standard syntax, we can write greatest-n-per-group queries:
WITH ranked_messages AS (
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
FROM messages AS m
)
SELECT * FROM ranked_messages WHERE rn = 1;
This and other approaches to finding groupwise maximal rows are illustrated in the MySQL manual.
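To try the window-function query without a MySQL server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. It assumes a hypothetical `messages` table with `id`, `name` and `body` columns and made-up sample rows. It also assumes the SQLite library bundled with Python is 3.25 or newer, the first version with window functions. SQLite is only a stand-in here; the query is the same standard syntax.

```python
import sqlite3

# Hypothetical sample data: messages(id, name, body).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO messages (id, name, body) VALUES (?, ?, ?)",
    [(1, "A", "a1"), (2, "A", "a2"), (3, "B", "b1"), (4, "C", "c1"), (5, "B", "b2")],
)

# Number the rows within each name, highest id first; rn = 1 is the last message.
LATEST_PER_NAME = """
WITH ranked_messages AS (
  SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
  FROM messages AS m
)
SELECT id, name, body FROM ranked_messages WHERE rn = 1 ORDER BY name
"""

latest = conn.execute(LATEST_PER_NAME).fetchall()
print(latest)  # [(2, 'A', 'a2'), (5, 'B', 'b2'), (4, 'C', 'c1')]
```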
Below is the original answer I wrote for this question in 2009:
I write the solution this way:
SELECT m1.*
FROM messages m1 LEFT JOIN messages m2
ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL;
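Here is the same kind of sketch for the LEFT JOIN version, again on SQLite with hypothetical data. Every row `m1` is paired with any later row `m2` that has the same `name`. The last row in each group finds no partner, so its `m2` columns are NULL and it survives the `WHERE m2.id IS NULL` filter. The block also cross-checks the result against a plain `GROUP BY` with `MAX(id)`.

```python
import sqlite3

# Hypothetical sample data: messages(id, name, body).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO messages (id, name, body) VALUES (?, ?, ?)",
    [(1, "A", "a1"), (2, "A", "a2"), (3, "B", "b1"), (4, "C", "c1"), (5, "B", "b2")],
)

# Anti-join: keep rows m1 for which no row m2 in the same group has a larger id.
ANTI_JOIN = """
SELECT m1.id, m1.name, m1.body
FROM messages m1 LEFT JOIN messages m2
  ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL
ORDER BY m1.name
"""
latest = conn.execute(ANTI_JOIN).fetchall()

# Cross-check: the surviving ids are exactly the per-group maximum ids.
max_ids = [row[0] for row in conn.execute(
    "SELECT MAX(id) FROM messages GROUP BY name ORDER BY name")]

print(latest)  # [(2, 'A', 'a2'), (5, 'B', 'b2'), (4, 'C', 'c1')]
```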
Regarding performance, either solution can be faster depending on the nature of your data, so test both queries and use whichever performs better on your database.
For example, I have a copy of the StackOverflow August data dump, which I'll use for benchmarking. There are 1,114,357 rows in the Posts table. This is running on MySQL 5.0.75 on my MacBook Pro 2.40GHz.
I'll write a query to find the most recent post for a given user ID (mine).
First, using the technique shown by @Eric with the GROUP BY in a subquery:
SELECT p1.postid
FROM Posts p1
INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
FROM Posts pi GROUP BY pi.owneruserid) p2
ON (p1.postid = p2.maxpostid)
WHERE p1.owneruserid = 20860;
1 row in set (1 min 17.89 sec)
Even the EXPLAIN analysis takes over 16 seconds:
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 76756 | |
| 1 | PRIMARY | p1 | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY | 8 | p2.maxpostid | 1 | Using where |
| 2 | DERIVED | pi | index | NULL | OwnerUserId | 8 | NULL | 1151268 | Using index |
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
3 rows in set (16.09 sec)
Now produce the same query result using my technique with LEFT JOIN:
SELECT p1.postid
FROM Posts p1 LEFT JOIN Posts p2
ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
WHERE p2.postid IS NULL AND p1.owneruserid = 20860;
1 row in set (0.28 sec)
The EXPLAIN analysis shows that both tables are able to use their indexes:
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
| 1 | SIMPLE | p1 | ref | OwnerUserId | OwnerUserId | 8 | const | 1384 | Using index |
| 1 | SIMPLE | p2 | ref | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8 | const | 1384 | Using where; Using index; Not exists |
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
2 rows in set (0.00 sec)
Here's the DDL for my Posts table:
CREATE TABLE `posts` (
`PostId` bigint(20) unsigned NOT NULL auto_increment,
`PostTypeId` bigint(20) unsigned NOT NULL,
`AcceptedAnswerId` bigint(20) unsigned default NULL,
`ParentId` bigint(20) unsigned default NULL,
`CreationDate` datetime NOT NULL,
`Score` int(11) NOT NULL default '0',
`ViewCount` int(11) NOT NULL default '0',
`Body` text NOT NULL,
`OwnerUserId` bigint(20) unsigned NOT NULL,
`OwnerDisplayName` varchar(40) default NULL,
`LastEditorUserId` bigint(20) unsigned default NULL,
`LastEditDate` datetime default NULL,
`LastActivityDate` datetime default NULL,
`Title` varchar(250) NOT NULL default '',
`Tags` varchar(150) NOT NULL default '',
`AnswerCount` int(11) NOT NULL default '0',
`CommentCount` int(11) NOT NULL default '0',
`FavoriteCount` int(11) NOT NULL default '0',
`ClosedDate` datetime default NULL,
PRIMARY KEY (`PostId`),
UNIQUE KEY `PostId` (`PostId`),
KEY `PostTypeId` (`PostTypeId`),
KEY `AcceptedAnswerId` (`AcceptedAnswerId`),
KEY `OwnerUserId` (`OwnerUserId`),
KEY `LastEditorUserId` (`LastEditorUserId`),
KEY `ParentId` (`ParentId`),
CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)
) ENGINE=InnoDB;
Note to commenters: If you want another benchmark with a different version of MySQL, a different dataset, or different table design, feel free to do it yourself. I have shown the technique above. Stack Overflow is here to show you how to do software development work, not to do all the work for you.
How to get the latest record in each group using GROUP BY?
Find the last timestamp value in each group with a subquery, and then join that subquery back to the table:
SELECT t1.* FROM messages t1
JOIN (SELECT from_id, MAX(timestamp) timestamp FROM messages GROUP BY from_id) t2
ON t1.from_id = t2.from_id AND t1.timestamp = t2.timestamp;
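A sketch of this join on SQLite from Python, with a hypothetical `messages(id, from_id, timestamp, body)` table and made-up rows. It also shows one property of joining on the maximum value: if two rows in the same group share the latest timestamp, both are returned, as `from_id` 2 demonstrates here.

```python
import sqlite3

# Hypothetical sample data; from_id 2 has two rows tied on the latest timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, from_id INTEGER, timestamp INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [(1, 1, 10, "x"), (2, 1, 20, "y"), (3, 2, 5, "p"), (4, 2, 30, "q"), (5, 2, 30, "r")],
)

# Per-group MAX(timestamp) in a derived table, joined back to pick the full rows.
LATEST_BY_TIMESTAMP = """
SELECT t1.id, t1.from_id, t1.timestamp FROM messages t1
JOIN (SELECT from_id, MAX(timestamp) timestamp FROM messages GROUP BY from_id) t2
  ON t1.from_id = t2.from_id AND t1.timestamp = t2.timestamp
ORDER BY t1.id
"""
latest = conn.execute(LATEST_BY_TIMESTAMP).fetchall()
print(latest)  # [(2, 1, 20), (4, 2, 30), (5, 2, 30)]
```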
Select last records from table using group by
Assuming that the most recent row always has the highest start and end dates, drop those columns from the GROUP BY (grouping by every column is essentially the same as using DISTINCT) and apply an aggregate function to them instead:
SELECT UserId,
MAX(StartDate) AS StartDate,
MAX(EndDate) AS EndDate
FROM usersworktime
GROUP BY UserId;
Otherwise, if that isn't the case, you can use a CTE and ROW_NUMBER:
WITH CTE AS(
SELECT UserID,
StartDate,
EndDate,
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY UsersWordTimeID DESC) AS RN
FROM usersworktime)
SELECT UserID,
StartDate,
EndDate
FROM CTE
WHERE RN = 1;
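The assumption behind the first query matters. In this hypothetical SQLite sketch, user 2's newest row (by `UsersWordTimeID`) does not have the largest `EndDate`. The aggregate query therefore combines `StartDate` and `EndDate` from two different rows, while the `ROW_NUMBER` version returns one actual row. Dates are stored as ISO `YYYY-MM-DD` strings so that `MAX` orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usersworktime ("
    " UsersWordTimeID INTEGER PRIMARY KEY, UserId INTEGER, StartDate TEXT, EndDate TEXT)"
)
conn.executemany(
    "INSERT INTO usersworktime VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2023-01-01", "2023-06-30"),
        (2, 1, "2023-07-01", "2023-12-31"),
        # User 2's newest row (id 4) does NOT have the largest EndDate.
        (3, 2, "2023-03-01", "2023-12-31"),
        (4, 2, "2023-05-01", "2023-09-30"),
    ],
)

# MAX() is evaluated per column, so the two dates may come from different rows.
aggregated = conn.execute(
    "SELECT UserId, MAX(StartDate), MAX(EndDate) FROM usersworktime "
    "GROUP BY UserId ORDER BY UserId"
).fetchall()

# ROW_NUMBER() picks one whole row per user: the one with the highest id.
ranked = conn.execute(
    """
    WITH CTE AS (
      SELECT UserId, StartDate, EndDate,
             ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY UsersWordTimeID DESC) AS RN
      FROM usersworktime)
    SELECT UserId, StartDate, EndDate FROM CTE WHERE RN = 1 ORDER BY UserId
    """
).fetchall()

print(aggregated)  # [(1, '2023-07-01', '2023-12-31'), (2, '2023-05-01', '2023-12-31')]
print(ranked)      # [(1, '2023-07-01', '2023-12-31'), (2, '2023-05-01', '2023-09-30')]
```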
SELECT latest record group by one column
You can use variables for this:
SELECT location, parameter, datetime, value
FROM (
SELECT location, parameter, datetime, value,
@seq := IF(@loc = location, @seq + 1,
IF(@loc := location, 1, 1)) AS seq
FROM mytable
CROSS JOIN (SELECT @seq := 0, @loc := '') AS vars
ORDER BY location, datetime DESC, value DESC) AS t
WHERE t.seq = 1;
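The inner query's ORDER BY clause puts the required latest record first within each location slice. The IF functions set the variable @seq to 1 for that first record and increment it for every following record in the same slice. The outer query then filters the derived table to keep only the seq = 1 record for each location slice. Note that MySQL does not guarantee the evaluation order of expressions that both assign and read user variables in one statement, and recent MySQL 8.0 releases deprecate setting user variables within expressions, so on MySQL 8.0+ ROW_NUMBER() is the safer choice.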
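Because the variable logic is easy to misread, here is a plain-Python sketch of the same counter. It is not MySQL itself, and the rows are hypothetical `(location, parameter, datetime, value)` tuples. It sorts the rows the way the inner query does, resets the counter whenever the location changes, and keeps only the rows where the counter is 1.

```python
# Hypothetical rows: (location, parameter, datetime, value).
ROWS = [
    ("L1", "temp", "2023-01-01 10:00", 5.0),
    ("L1", "temp", "2023-01-02 10:00", 7.0),
    ("L2", "temp", "2023-01-01 09:00", 3.0),
    ("L2", "hum", "2023-01-01 09:00", 4.0),
]


def latest_per_location(rows):
    """Emulate the @seq/@loc counter: keep the first row of each location slice."""
    # ORDER BY location, datetime DESC, value DESC:
    # sort by the descending keys first, then stably by location ascending.
    ordered = sorted(rows, key=lambda r: (r[2], r[3]), reverse=True)
    ordered.sort(key=lambda r: r[0])

    result = []
    seq, loc = 0, None  # starting values, like the CROSS JOIN'd vars subquery
    for row in ordered:
        # @seq := IF(@loc = location, @seq + 1, 1), then @loc := location
        seq = seq + 1 if loc == row[0] else 1
        loc = row[0]
        if seq == 1:  # WHERE t.seq = 1
            result.append(row)
    return result


print(latest_per_location(ROWS))
# [('L1', 'temp', '2023-01-02 10:00', 7.0), ('L2', 'hum', '2023-01-01 09:00', 4.0)]
```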
How to get the last row in the table using group by with Order by DESC?
One approach uses a GROUP BY query:
SELECT tla1.*, tb.*
FROM tbl_brands tb
INNER JOIN tbl_loader_attachment tla1
ON tb.b_id = tla1.b_id
INNER JOIN
(
SELECT b_id, MAX(la_id) AS max_la_id
FROM tbl_loader_attachment
GROUP BY b_id
) tla2
ON tla1.b_id = tla2.b_id AND
tla1.la_id = tla2.max_la_id;
If you are using MySQL 8+ (or should a future reader of this question be using MySQL 8+), then another option here is to use ROW_NUMBER:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY b_id ORDER BY la_id DESC) rn
FROM tbl_loader_attachment
)
SELECT tla.*, tb.*
FROM tbl_brands tb
INNER JOIN cte tla ON tb.b_id = tla.b_id
WHERE tla.rn = 1;
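A quick SQLite sketch with hypothetical brands and attachments, checking that both queries agree. The select lists are narrowed to a few explicit columns so the extra `rn` column from the CTE does not get in the way of the comparison. Because both queries use INNER JOIN, a brand with no attachments (Initech here) does not appear at all.

```python
import sqlite3

# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_brands (b_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tbl_loader_attachment (la_id INTEGER PRIMARY KEY, b_id INTEGER, file TEXT);
    INSERT INTO tbl_brands VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO tbl_loader_attachment VALUES (10, 1, 'a.pdf'), (11, 1, 'b.pdf'), (12, 2, 'c.pdf');
    """
)

# GROUP BY version: join each brand to its attachment with the highest la_id.
group_by_rows = conn.execute(
    """
    SELECT tla1.la_id, tb.b_id, tb.name
    FROM tbl_brands tb
    INNER JOIN tbl_loader_attachment tla1 ON tb.b_id = tla1.b_id
    INNER JOIN (SELECT b_id, MAX(la_id) AS max_la_id
                FROM tbl_loader_attachment GROUP BY b_id) tla2
      ON tla1.b_id = tla2.b_id AND tla1.la_id = tla2.max_la_id
    ORDER BY tb.b_id
    """
).fetchall()

# ROW_NUMBER version: number attachments per brand, newest first, keep rn = 1.
window_rows = conn.execute(
    """
    WITH cte AS (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY b_id ORDER BY la_id DESC) rn
      FROM tbl_loader_attachment
    )
    SELECT tla.la_id, tb.b_id, tb.name
    FROM tbl_brands tb
    INNER JOIN cte tla ON tb.b_id = tla.b_id
    WHERE tla.rn = 1
    ORDER BY tb.b_id
    """
).fetchall()

print(group_by_rows)  # [(11, 1, 'Acme'), (12, 2, 'Globex')]
```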
Retrieve last record in a group based on string - DB2
You can use ROW_NUMBER(). For example, if your table is called t, you can do:
select *
from (
select t.*,
row_number() over(partition by location, product
order by date desc) as rn
from t
) x
where rn = 1
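The same pattern with a two-column partition, sketched on SQLite from Python rather than DB2, with a hypothetical table `t(location, product, date, qty)`. Each `(location, product)` pair gets its own numbering, so the query returns one row per pair.

```python
import sqlite3

# Hypothetical data: NY/apple has two rows; the later one should win.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (location TEXT, product TEXT, date TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("NY", "apple", "2023-01-01", 1),
        ("NY", "apple", "2023-02-01", 2),
        ("NY", "pear", "2023-01-15", 3),
        ("LA", "apple", "2023-03-01", 4),
    ],
)

# Numbering restarts for every (location, product) combination.
latest = conn.execute(
    """
    select location, product, date, qty
    from (
      select t.*,
             row_number() over (partition by location, product
                                order by date desc) as rn
      from t
    ) x
    where rn = 1
    order by location, product
    """
).fetchall()

print(latest)
# [('LA', 'apple', '2023-03-01', 4), ('NY', 'apple', '2023-02-01', 2), ('NY', 'pear', '2023-01-15', 3)]
```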
Get the latest records per Group By SQL
The RANK() window function lets you rank rows within each partition, and then you can just select the top-ranked ones. If several rows share the latest oDate for the same oName, all of them get rank 1 and are returned:
SELECT oDate, oName, oItem, oQty, oRemarks
FROM (SELECT oDate, oName, oItem, oQty, oRemarks,
RANK() OVER (PARTITION BY oName ORDER BY oDate DESC) AS rk
FROM my_table) t
WHERE rk = 1
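RANK() and ROW_NUMBER() differ only when there are ties. In this hypothetical SQLite sketch, Bob has two orders on the same latest date. RANK() returns both of them, while ROW_NUMBER() returns exactly one row per oName; which of the tied rows it picks is unspecified unless the ORDER BY breaks the tie.

```python
import sqlite3

# Hypothetical orders; Bob has two rows on his latest date (2023-03-01).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (oDate TEXT, oName TEXT, oItem TEXT, oQty INTEGER, oRemarks TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
    [
        ("2023-01-01", "Ann", "pen", 1, ""),
        ("2023-02-01", "Ann", "ink", 2, ""),
        ("2023-03-01", "Bob", "pad", 1, ""),
        ("2023-03-01", "Bob", "pen", 3, ""),
        ("2023-01-10", "Bob", "ink", 1, ""),
    ],
)

# Same query shape, parameterised by the window function name.
QUERY = """
SELECT oName, oItem
FROM (SELECT oName, oItem,
             {fn}() OVER (PARTITION BY oName ORDER BY oDate DESC) AS rk
      FROM my_table) t
WHERE rk = 1
ORDER BY oName, oItem
"""

rank_rows = conn.execute(QUERY.format(fn="RANK")).fetchall()
rownum_rows = conn.execute(QUERY.format(fn="ROW_NUMBER")).fetchall()

print(rank_rows)  # [('Ann', 'ink'), ('Bob', 'pad'), ('Bob', 'pen')]
print(len(rownum_rows))  # 2
```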
Returning the 'last' row of each 'group by' in MySQL
Try this query:
SELECT t1.* FROM foo t1
JOIN (SELECT uid, MAX(id) id FROM foo GROUP BY uid) t2
ON t1.id = t2.id AND t1.uid = t2.uid;
Alternatively, use a LEFT JOIN, then run EXPLAIN on both queries to compare them:
SELECT t1.* FROM foo t1
LEFT JOIN foo t2
ON t1.id < t2.id AND t1.uid = t2.uid
WHERE t2.id is NULL;
MySql last record from group by item_id with order by date
You can filter with a correlated subquery:
select t.*
from `ledgers` t
where
date(t.`date`) >= ?
and date(t.`date`) <= ?
and t.`date` = (
select max(t1.`date`)
from `ledgers` t1
where t1.`item_id` = t.`item_id`
)
For performance, consider an index on (item_id, date).
Another option is to use rank() (available in MySQL 8.0 and later):
select *
from (
select
t.*,
rank() over(partition by `item_id` order by `date` desc) rn
from `ledgers` t
where date(t.`date`) >= ? and date(t.`date`) <= ?
) t
where rn = 1
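The two queries above are not quite equivalent. The correlated subquery compares each row with the latest date for that item across the whole table, so an item whose newest row falls outside the date range is dropped entirely. The rank() version filters by date first and then ranks, so it returns the latest row inside the range. Here is a hypothetical SQLite sketch of the difference; SQLite accepts the backtick-quoted identifiers, and the `?` placeholders are bound to a January 2023 range chosen for illustration.

```python
import sqlite3

# Hypothetical ledger rows; item 1's newest row (id 3) is outside January.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledgers (id INTEGER PRIMARY KEY, item_id INTEGER, date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO ledgers VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2023-01-05 10:00:00", 10),
        (2, 1, "2023-01-20 09:00:00", 20),
        (3, 1, "2023-02-10 08:00:00", 30),
        (4, 2, "2023-01-15 12:00:00", 40),
    ],
)
params = ("2023-01-01", "2023-01-31")

# Correlated subquery: the max date is taken over ALL of the item's rows.
correlated = conn.execute(
    """
    select t.id from `ledgers` t
    where date(t.`date`) >= ? and date(t.`date`) <= ?
      and t.`date` = (select max(t1.`date`) from `ledgers` t1
                      where t1.`item_id` = t.`item_id`)
    order by t.id
    """,
    params,
).fetchall()

# rank(): the date filter runs first, then rows are ranked within the range.
ranked = conn.execute(
    """
    select id from (
      select t.*, rank() over (partition by `item_id` order by `date` desc) rn
      from `ledgers` t
      where date(t.`date`) >= ? and date(t.`date`) <= ?
    ) t
    where rn = 1
    order by id
    """,
    params,
).fetchall()

print(correlated)  # [(4,)]
print(ranked)      # [(2,), (4,)]
```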