Get top n records for each group of grouped results
Here is one way to do this, using UNION ALL (see SQL Fiddle with Demo). This works with two groups; if you have more than two, you need to add one query per group, each specifying its group number:
(
select *
from mytable
where `group` = 1
order by age desc
LIMIT 2
)
UNION ALL
(
select *
from mytable
where `group` = 2
order by age desc
LIMIT 2
)
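If the list of groups is known in application code, the per-group queries can be generated rather than written by hand. The following is a minimal sketch, not part of the original answer: the helper name `top_n_union_query` and the sample rows are hypothetical, and it runs against an in-memory SQLite database for convenience. Each branch is wrapped in a derived table because SQLite does not accept the parenthesized form shown above.

```python
import sqlite3


def top_n_union_query(groups, n):
    """Build one 'ORDER BY age DESC LIMIT n' subquery per group, joined by UNION ALL.

    Hypothetical helper: returns the SQL text and the matching parameter list.
    """
    part = (
        "SELECT * FROM (SELECT * FROM mytable WHERE `group` = ? "
        "ORDER BY age DESC LIMIT ?) AS g{i}"
    )
    sql = " UNION ALL ".join(part.format(i=i) for i in range(len(groups)))
    # Parameters alternate: group value, limit, group value, limit, ...
    params = [value for g in groups for value in (g, n)]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (person TEXT, `group` INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [  # assumed sample data, not taken from the answer
        ("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
        ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39),
    ],
)

sql, params = top_n_union_query([1, 2], 2)
rows = conn.execute(sql, params).fetchall()
people = sorted(row[0] for row in rows)
print(people)
```

The query still grows by one branch per group, so this only hides the repetition; it does not remove the cost of running one subquery per group.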
There are a variety of ways to do this, see this article to determine the best route for your situation:
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
Edit:
This might work for you too. It generates a row number for each record within its group. Using an example from the link above, it returns only the records whose row number is less than or equal to 2:
select person, `group`, age
from
(
  select person, `group`, age,
    (@num := if(@group = `group`, @num + 1, if(@group := `group`, 1, 1))) as row_num
  from test t
  cross join (select @num := 0, @group := null) c
  order by `group`, age desc, person
) as x
where x.row_num <= 2;
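The variable trick above amounts to a running counter that restarts whenever the group changes. A minimal Python sketch of that logic, assuming rows are `(person, group, age)` tuples (the function name and sample data are hypothetical):

```python
def number_rows(rows):
    """Yield (person, group, age, row_number), mimicking the @num/@group variables."""
    # Same ordering as the query: group, then age descending, then person.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2], r[0]))
    current_group = None  # plays the role of @group
    num = 0               # plays the role of @num
    for person, group, age in ordered:
        if group == current_group:
            num += 1          # same group as the previous row: keep counting
        else:
            current_group = group
            num = 1           # new group: restart the counter
        yield person, group, age, num


sample = [  # assumed sample data
    ("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
    ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39),
]
top_two = [row for row in number_rows(sample) if row[3] <= 2]
print(top_two)
```

The counter only gives the right answer because the rows are visited in sorted order, which is why the SQL version depends on the ORDER BY being applied before the variables are evaluated.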
See Demo
Get top 1 row of each group
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1
If you expect two entries per day, ROW_NUMBER will arbitrarily pick one of them. To get both entries for a day, use DENSE_RANK instead.
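The difference between the two functions on tied dates can be checked with a small experiment. This sketch is an illustration only: it uses SQLite (3.25 or later, for window functions) rather than SQL Server, and the `ID` and `Status` columns and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocumentStatusLogs "
    "(ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)"
)
conn.executemany(
    "INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)",
    [
        (1, 1, "S1", "2011-07-29"),
        (2, 1, "S2", "2011-07-30"),
        (3, 1, "S3", "2011-07-30"),  # same day as ID 2: a tie
        (4, 2, "S1", "2011-08-01"),
    ],
)

QUERY = """
WITH cte AS (
    SELECT ID, DocumentID, Status, DateCreated,
           {func}() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
    FROM DocumentStatusLogs
)
SELECT ID FROM cte WHERE rn = 1 ORDER BY ID
"""

# ROW_NUMBER numbers tied rows 1, 2, ... so only one of IDs 2 and 3 survives.
row_number_ids = [r[0] for r in conn.execute(QUERY.format(func="ROW_NUMBER"))]
# DENSE_RANK gives tied rows the same rank, so both survive.
dense_rank_ids = [r[0] for r in conn.execute(QUERY.format(func="DENSE_RANK"))]
print(row_number_ids, dense_rank_ids)
```

Which of the tied rows ROW_NUMBER keeps is not defined; adding a tie-breaker column to the ORDER BY makes the choice deterministic.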
As for normalised or not, it depends if you want to:
- maintain status in 2 places
- preserve status history
- ...
As it stands, you preserve status history. If you want the latest status in the parent table too (which is denormalisation), you'd need a trigger to maintain "status" in the parent, or you could drop this status history table.
Get top N rows of each group in MySQL
If you want n rows per group, use row_number(). If you then want them interleaved, use order by:
select t.*
from (select t.*,
row_number() over (partition by type order by name) as seqnum
from t
) t
where seqnum <= 2
order by seqnum, type;
This assumes that "top" is alphabetically by name. If you have another definition, use that for the order by for row_number().
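To see what "interleaved" means here, the same query can be run on a tiny table. A sketch under assumptions: SQLite 3.25+ stands in for MySQL 8, and the table contents are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (type TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [  # assumed sample data
        ("fruit", "apple"), ("fruit", "banana"), ("fruit", "cherry"),
        ("veg", "carrot"), ("veg", "leek"), ("veg", "onion"),
    ],
)

rows = conn.execute(
    """
    SELECT type, name, seqnum
    FROM (SELECT type, name,
                 ROW_NUMBER() OVER (PARTITION BY type ORDER BY name) AS seqnum
          FROM t) AS numbered
    WHERE seqnum <= 2
    ORDER BY seqnum, type  -- first rows of every group, then second rows
    """
).fetchall()
print(rows)
```

Ordering by `seqnum` before `type` is what alternates the groups; swapping the two keys would list each group's rows together instead.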
Get top n records with grouped table
If you are using MySQL 8.0 or higher, you can use row_number() over (partition by ...):
select t1.created_at, t1.user_id from (
select row_number() over (partition by user_id order by created_at desc) rn, created_at, user_id
from orders) t1 where t1.rn <=2
For MySQL 5.7 and below, emulate the row number with user variables:
SELECT t1.created_at, t1.user_id
FROM (SELECT
          @row_number := CASE
              WHEN @varId = user_id THEN @row_number + 1
              ELSE 1
          END AS rn,
          @varId := user_id AS user_id,
          created_at
      FROM orders,
           (SELECT @varId := 0, @row_number := 0) AS t
      ORDER BY user_id ASC, created_at DESC) t1
WHERE t1.rn <= 2
Get records with max value for each group of grouped SQL results
There's a super-simple way to do this in mysql:
select *
from (select * from mytable order by `Group`, age desc, Person) x
group by `Group`
This works because in mysql you're allowed to not aggregate non-group-by columns, in which case mysql just returns the first row. The solution is to first order the data such that for each group the row you want is first, then group by the columns you want the value for.
You avoid complicated subqueries that try to find the max() etc., and also the problem of returning multiple rows when more than one row has the same maximum value (as the other answers would do).
Note: This is a mysql-only solution. All other databases I know will throw an SQL syntax error with the message "non aggregated columns are not listed in the group by clause" or similar. Because this solution uses undocumented behavior, the more cautious may want to include a test to assert that it remains working should a future version of MySQL change this behavior.
Version 5.7 update:
Since version 5.7, the sql-mode setting includes ONLY_FULL_GROUP_BY by default, so to make this work you must disable that option (edit the server's option file to remove it from sql-mode).
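The idea behind the query (sort so the wanted row comes first in each group, then keep one row per group) can be written out explicitly. This Python sketch shows only that logic and does not depend on any server behaviour; the function name and sample rows are hypothetical.

```python
from itertools import groupby


def first_row_per_group(rows):
    """Return one (person, group, age) tuple per group: the oldest person.

    Ties on age are broken alphabetically by person, so the result is deterministic.
    """
    # Same ordering as the inner query: group, age descending, person.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2], r[0]))
    # groupby needs its input sorted by the grouping key, which it is.
    return [next(members) for _, members in groupby(ordered, key=lambda r: r[1])]


sample = [  # assumed sample data, including a tie at age 42 in group 1
    ("Bob", 1, 32), ("Shawn", 1, 42), ("Adam", 1, 42),
    ("Jake", 2, 29), ("Laura", 2, 39),
]
winners = first_row_per_group(sample)
print(winners)
```

Because the tie-breaker is part of the sort key, exactly one row comes back per group even when several rows share the maximum age.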
Optimized way to get top n records of each group
Your approach is fine, but your query is not. In particular, MySQL does not guarantee the order of evaluation of expressions in a SELECT, so you should not assign a variable in one expression and use it in another.
Fortunately, you can combine the assignments into a single expression:
SELECT b.*
FROM (SELECT b.sub_cat_id, b.title, b.created_date,
             (@rn := IF(@sc = b.sub_cat_id, @rn + 1,
                        IF(@sc := b.sub_cat_id, 1, 1)
                       )
             ) AS rn
      FROM blog b CROSS JOIN
           (SELECT @sc := -1, @rn := 0) params
      WHERE b.type = 'BLOG' AND
            b.sub_cat_id IN (1, 2, 8) AND
            b.created_date <= NOW()  -- is this really needed?
      ORDER BY b.sub_cat_id DESC, b.created_date DESC
     ) b
WHERE rn <= 6;
For this query, you want indexes. I think this will work: blog(type, sub_cat_id, created_date). Unfortunately, the query will still need to sort the data. In more recent versions of MySQL, I think you need to do the sorting in a subquery and then the rn assignment afterwards.
I do wonder if this formulation could be made to be more effective:
select b.*
from blog b
where b.type = 'BLOG' and
      b.sub_cat_id in (1, 2, 8) and
      b.created_date >= (select b2.created_date
                         from blog b2
                         where b2.type = b.type and
                               b2.sub_cat_id = b.sub_cat_id
                         order by b2.created_date desc
                         limit 1 offset 5
                        );
For this, you want an index on blog(type, sub_cat_id, created_date).
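The threshold formulation can be tried on a small table to see how it behaves. A sketch under assumptions: it runs on SQLite instead of MySQL, keeps the top 2 rows rather than 6 to stay short, and the `title` values and dates are invented.

```python
import sqlite3

N = 2  # rows to keep per sub-category (the query above uses 6, i.e. offset 5)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blog (title TEXT, type TEXT, sub_cat_id INTEGER, created_date TEXT)"
)
conn.executemany(
    "INSERT INTO blog VALUES (?, ?, ?, ?)",
    [
        ("a1", "BLOG", 1, "2020-01-01"),
        ("a2", "BLOG", 1, "2020-01-02"),
        ("a3", "BLOG", 1, "2020-01-03"),
        ("b1", "BLOG", 2, "2020-01-01"),  # sub-category 2 has fewer than N rows
    ],
)

rows = conn.execute(
    """
    SELECT b.title
    FROM blog b
    WHERE b.type = 'BLOG'
      AND b.sub_cat_id IN (1, 2, 8)
      AND b.created_date >= (SELECT b2.created_date
                             FROM blog b2
                             WHERE b2.type = b.type
                               AND b2.sub_cat_id = b.sub_cat_id
                             ORDER BY b2.created_date DESC
                             LIMIT 1 OFFSET ?)   -- the N-th newest date
    ORDER BY b.sub_cat_id, b.created_date DESC
    """,
    (N - 1,),
).fetchall()
titles = [r[0] for r in rows]
print(titles)
```

In this experiment the single row of sub-category 2 is missing from the result: when a group has fewer than N rows the subquery yields NULL and the comparison filters the whole group out. If small groups must be kept, that case would need separate handling. Rows that tie with the threshold date are all returned, so a group can also yield more than N rows.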
Get top n records for each group of grouped results with Bigquery (standard SQL)
This is row_number():
select t.*
from (select t.*,
row_number() over (partition by `group` order by age desc) as seqnum
from t
) t
where seqnum <= 2;
row_number() is an ANSI standard window function. It is available in most databases. In general, I would suggest that you look more for solutions using Postgres rather than MySQL for solving problems in BQ (if you can't find a BQ resource itself).
Pandas get topmost n records within each group
Did you try this?
df.groupby('id').head(2)
Output generated:
      id  value
id
1  0   1      1
   1   1      2
2  3   2      1
   4   2      2
3  7   3      1
4  8   4      1
(Keep in mind that you might need to order/sort before, depending on your data)
EDIT: As mentioned by the questioner, use
df.groupby('id').head(2).reset_index(drop=True)
to remove the MultiIndex and flatten the results:
id value
0 1 1
1 1 2
2 2 1
3 2 2
4 3 1
5 4 1
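Since head(2) simply takes the first two rows of each group in their current order, "top" only means something after sorting. A minimal sketch, assuming a frame reconstructed from the index values in the output above and assuming that "top" means largest `value`:

```python
import pandas as pd

# Assumed input, inferred from the index labels shown in the output above.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 2, 3, 4],
    "value": [1, 2, 3, 1, 2, 3, 4, 1, 1],
})

top2_largest = (
    df.sort_values(["id", "value"], ascending=[True, False])  # largest value first per id
      .groupby("id")
      .head(2)                  # keep the first two rows of each group
      .reset_index(drop=True)   # renumber rows 0..n-1
)
print(top2_largest)
```

Groups with fewer than two rows (ids 3 and 4 here) are returned as they are, with however many rows they have.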