Selecting The Top N Rows Within a Group by Clause

Get top 1 row of each group

;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1

If you expect 2 entries per day, ROW_NUMBER will arbitrarily pick one of them. To get both entries for a day, use DENSE_RANK instead.
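To see the difference, here is a sketch that runs the same CTE with ROW_NUMBER and with DENSE_RANK. It uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), and the Status column and sample rows are hypothetical. Document 1 has two log entries sharing its latest DateCreated.

```python
import sqlite3

# Hypothetical sample data: document 1 has two entries on its latest day.
ROWS = [
    (1, "Draft", "2024-01-01"),
    (1, "Review", "2024-01-02"),
    (1, "Approved", "2024-01-02"),
    (2, "Draft", "2024-01-01"),
    (2, "Published", "2024-01-03"),
]

def latest_per_document(conn, rank_function):
    """Return (DocumentID, Status) rows ranked 1 within each DocumentID.

    rank_function is interpolated into the SQL, so only pass trusted
    literals such as "ROW_NUMBER" or "DENSE_RANK".
    """
    sql = f"""
        WITH cte AS (
            SELECT *,
                   {rank_function}() OVER (
                       PARTITION BY DocumentID ORDER BY DateCreated DESC
                   ) AS rn
            FROM DocumentStatusLogs
        )
        SELECT DocumentID, Status FROM cte WHERE rn = 1
        ORDER BY DocumentID, Status
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocumentStatusLogs (DocumentID INTEGER, Status TEXT, DateCreated TEXT)")
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?)", ROWS)

one_per_doc = latest_per_document(conn, "ROW_NUMBER")  # one row per document
all_ties = latest_per_document(conn, "DENSE_RANK")     # every row tied for newest
print(one_per_doc)
print(all_ties)
```

With ROW_NUMBER, which of document 1's two tied rows comes back is not guaranteed; DENSE_RANK returns both.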

As for normalised or not, it depends on whether you want to:

  • maintain status in 2 places
  • preserve status history
  • ...

As it stands, you preserve status history. If you also want the latest status in the parent table (which is denormalisation), you'd need a trigger to maintain "status" in the parent, or drop this status history table.
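A minimal sketch of the trigger option, assuming a hypothetical parent table Documents with a Status column. It uses SQLite through Python's sqlite3 module, not SQL Server (where a trigger reads the new rows from the inserted pseudo-table rather than NEW), and it only handles inserts; updates and deletes on the log table would need their own triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical parent table holding a denormalised copy of the latest status.
    CREATE TABLE Documents (
        DocumentID INTEGER PRIMARY KEY,
        Status     TEXT
    );
    CREATE TABLE DocumentStatusLogs (
        DocumentID  INTEGER REFERENCES Documents(DocumentID),
        Status      TEXT,
        DateCreated TEXT
    );
    -- After each new log row, copy the newest logged status into the parent.
    CREATE TRIGGER trg_status_log_insert
    AFTER INSERT ON DocumentStatusLogs
    BEGIN
        UPDATE Documents
        SET Status = (
            SELECT l.Status
            FROM DocumentStatusLogs AS l
            WHERE l.DocumentID = NEW.DocumentID
            ORDER BY l.DateCreated DESC, l.rowid DESC
            LIMIT 1
        )
        WHERE DocumentID = NEW.DocumentID;
    END;
""")

conn.execute("INSERT INTO Documents (DocumentID) VALUES (1)")
log_rows = [
    (1, "Draft", "2024-01-01"),
    (1, "Approved", "2024-01-03"),
    (1, "Review", "2024-01-02"),  # back-dated entry: must not overwrite "Approved"
]
conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?)", log_rows)

latest = conn.execute("SELECT Status FROM Documents WHERE DocumentID = 1").fetchone()[0]
print(latest)
```

Recomputing the newest row, rather than copying NEW.Status blindly, keeps the parent correct when an older entry is logged late.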

Selecting the top n rows within a group by clause

CROSS APPLY is how you usually do this - http://msdn.microsoft.com/en-us/library/ms175156.aspx

EDIT - added an example, something like this:

select
bar1.instrument
,bar2.*
from (
select distinct instrument from bar) as bar1
cross apply (
select top 5
bar2.instrument
,bar2.bar_dttm
,bar2.bar_open
,bar2.bar_close
from bar as bar2 where bar2.instrument = bar1.instrument) as bar2

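Typically you would want to add an ORDER BY inside the CROSS APPLY subquery (for example ORDER BY bar2.bar_dttm DESC); without one, TOP 5 returns an arbitrary five rows per instrument.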

Edit - added DISTINCT to the query; hopefully that gives you what you want.
Edit - added missing 'select' keyword at top. copy & paste bug FTL!
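SQLite has no CROSS APPLY, so to try the same idea locally one option is a correlated subquery with ORDER BY ... LIMIT that selects, for each row's instrument, the rowids of that instrument's newest bars. This is a sketch using Python's sqlite3 module; the sample bars are made up, and ordering by bar_dttm DESC is an assumption about what "top 5" should mean.

```python
import sqlite3

# Newest n bars per instrument. The inner query is re-evaluated for each
# outer row's instrument, much like the CROSS APPLY branch in SQL Server.
TOP_N_PER_INSTRUMENT = """
    SELECT b.instrument, b.bar_dttm, b.bar_open, b.bar_close
    FROM bar AS b
    WHERE b.rowid IN (
        SELECT b2.rowid
        FROM bar AS b2
        WHERE b2.instrument = b.instrument
        ORDER BY b2.bar_dttm DESC
        LIMIT ?
    )
    ORDER BY b.instrument, b.bar_dttm DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (instrument TEXT, bar_dttm TEXT, bar_open REAL, bar_close REAL)")
# Made-up sample data: 7 daily bars for AAA, 3 for BBB.
rows = [("AAA", f"2024-01-0{d}", 10.0 + d, 10.5 + d) for d in range(1, 8)]
rows += [("BBB", f"2024-01-0{d}", 20.0 + d, 20.5 + d) for d in range(1, 4)]
conn.executemany("INSERT INTO bar VALUES (?, ?, ?, ?)", rows)

result = conn.execute(TOP_N_PER_INSTRUMENT, (5,)).fetchall()
for row in result:
    print(row)
```

An instrument with fewer than five bars (BBB here) simply returns all of them.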

Get top n records for each group of grouped results

Here is one way to do this using UNION ALL. This works with two groups; if you have more than two groups, you would need to add a query for each additional group number:

(
select *
from mytable
where `group` = 1
order by age desc
LIMIT 2
)
UNION ALL
(
select *
from mytable
where `group` = 2
order by age desc
LIMIT 2
)
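If the set of groups isn't known in advance, the branches can be generated from the distinct group values. A sketch with Python's sqlite3 module and made-up rows; top_n_union_all is a hypothetical helper. Each branch is wrapped in a subquery because SQLite, unlike MySQL, doesn't accept ORDER BY/LIMIT on a parenthesised compound branch.

```python
import sqlite3

def top_n_union_all(conn, n):
    """Hypothetical helper: build one UNION ALL branch per distinct group."""
    groups = [g for (g,) in conn.execute('SELECT DISTINCT "group" FROM mytable ORDER BY "group"')]
    if not groups:
        return []
    # Wrapping each branch keeps its ORDER BY/LIMIT scoped to that group only.
    branch = 'SELECT * FROM (SELECT * FROM mytable WHERE "group" = ? ORDER BY age DESC LIMIT ?)'
    sql = "\nUNION ALL\n".join([branch] * len(groups))
    params = [value for g in groups for value in (g, n)]
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (person TEXT, "group" INTEGER, age INTEGER)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39),
     ("Ann", 3, 50), ("Tom", 3, 41), ("Zed", 3, 20)],
)
result = top_n_union_all(conn, 2)
print(result)
```

Generating the SQL only moves the repetition into code; with many groups, a window function (or the row-numbering approach below) scales better.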

There are a variety of ways to do this; see this article to determine the best approach for your situation:

http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/

Edit:

This might also work for you: it uses user variables to generate a row number for each record. Using an example from the link above, this returns only the records with a row number less than or equal to 2:

select person, `group`, age
from
(
select person, `group`, age,
(@num:=if(@group = `group`, @num +1, if(@group := `group`, 1, 1))) row_number
from test t
CROSS JOIN (select @num:=0, @group:=null) c
order by `Group`, Age desc, person
) as x
where x.row_number <= 2;
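The query depends on rows arriving in (group, age DESC, person) order: @num increments while @group stays the same and resets to 1 when the group changes. The same running-counter logic in plain Python, with made-up rows, is sketched below; the function names are hypothetical.

```python
ROWS = [
    ("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
    ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39),
    ("Ann", 3, 50), ("Tom", 3, 41), ("Zed", 3, 20),
]

def number_rows_like_mysql_vars(rows):
    """Mimic the @num/@group trick on (person, group, age) tuples."""
    # Same order as the query's ORDER BY `Group`, Age desc, person.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2], r[0]))
    current_group = None  # plays the role of @group
    num = 0               # plays the role of @num
    numbered = []
    for person, group, age in ordered:
        num = num + 1 if group == current_group else 1
        current_group = group
        numbered.append((person, group, age, num))
    return numbered

def top_n_per_group(rows, n):
    """Keep rows whose generated row number is <= n."""
    return [(p, g, a) for p, g, a, rn in number_rows_like_mysql_vars(rows) if rn <= n]

print(top_n_per_group(ROWS, 2))
```

One caveat on the SQL version: MySQL documents the evaluation order of expressions involving user variables as undefined, so on MySQL 8.0 and later, which supports ROW_NUMBER(), the window-function form is the safer choice.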


How do I select top N rows grouped by an ID in big query?

Use the query below:

select *
from your_table
where in_stock
qualify 3 >= row_number() over(partition by aisle order by price desc)
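QUALIFY filters on a window function's result after it has been computed, while WHERE in_stock is applied before the rows are numbered. On engines without QUALIFY the same logic needs a subquery. A sketch using SQLite through Python's sqlite3 module; the product column and the sample rows are hypothetical.

```python
import sqlite3

IN_STOCK_TOP_3 = """
    SELECT product, aisle, price
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY aisle ORDER BY price DESC) AS rn
        FROM your_table
        WHERE in_stock              -- applied before the rows are numbered
    ) AS t
    WHERE rn <= 3                   -- plays the role of QUALIFY 3 >= row_number()
    ORDER BY aisle, price DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (product TEXT, aisle INTEGER, price REAL, in_stock INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [("a", 1, 9.0, 1), ("b", 1, 7.0, 1), ("c", 1, 5.0, 1), ("d", 1, 3.0, 1),
     ("e", 1, 99.0, 0),  # out of stock, so it never gets a row number
     ("f", 2, 4.0, 1), ("g", 2, 2.0, 1)],
)
result = conn.execute(IN_STOCK_TOP_3).fetchall()
print(result)
```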


Selecting the first N rows of each group ordered by date

As well as the row_number solution, another option is CROSS APPLY (SELECT TOP ...):

SELECT m.masterid,
       d.detailid,
       m.numbers,
       d.date_time,
       d.value
FROM masters AS m
CROSS APPLY (
    SELECT TOP (3) *
    FROM details AS d
    WHERE d.date_time >= '2020-01-01'
      AND m.masterid = d.masterid
    ORDER BY d.date_time   -- without this, TOP (3) returns an arbitrary 3 rows
) AS d
WHERE m.tags LIKE '%Tag2%'
ORDER BY m.masterid DESC,
         d.date_time;

This may be faster or slower than row_number, mostly depending on cardinalities (quantity of rows) and indexing.

If indexing is good and you are selecting only a few rows per group, it will usually be faster. If the inner table needs sorting, or you are selecting most of the rows anyway, use row_number.

How to select top N rows for each group in an Entity Framework GroupBy with EF 3.1

Update (EF Core 6.0):

EF Core 6.0 added support for translating GroupBy result set projection, so the original code for taking (key, items) now works as expected:

var query = context.Set<DbDocument>()
.Where(e => partnerIds.Contains(e.SenderId))
.GroupBy(e => e.SenderId)
.Select(g => new
{
g.Key,
Documents = g.OrderByDescending(e => e.InsertedDateTime).Take(10)
});

However, flattening (via SelectMany) is still unsupported, so you have to use the workaround below if you need that query shape.

Original (EF Core 3.0/3.1/5.0):

This is a common problem; unfortunately, the EF Core 3.0/3.1/5.0 query translator doesn't support it, specifically for GroupBy.

The workaround is to do the grouping manually by correlating 2 subqueries: one for the keys and one for the corresponding data.

Applying it to your examples would be something like this.

If you need (key, items) pairs:

var query = context.Set<DbDocument>()
.Where(t => partnerIds.Contains(t.SenderId))
.Select(t => t.SenderId).Distinct() // <--
.Select(key => new
{
Key = key,
Documents =
context.Set<DbDocument>().Where(t => t.SenderId == key) // <--
.OrderByDescending(t => t.InsertedDateTime).Take(10)
.ToList() // <--
});

If you need just a flat result set containing the top N items per key:

var query = context.Set<DbDocument>()
.Where(t => partnerIds.Contains(t.SenderId))
.Select(t => t.SenderId).Distinct() // <--
.SelectMany(key => context.Set<DbDocument>().Where(t => t.SenderId == key) // <--
.OrderByDescending(t => t.InsertedDateTime).Take(10)
);
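To make the two result shapes concrete, here is an in-memory Python analogue: a dictionary of (key, items) for the first query and a flattened list for the second. It is not EF Core and says nothing about SQL translation; the document dictionaries and function names are hypothetical, and heapq.nlargest stands in for OrderByDescending(...).Take(n).

```python
import heapq
from collections import defaultdict
from datetime import datetime

def top_n_by_key(docs, partner_ids, n):
    """(key, items) shape: newest n documents per sender."""
    by_sender = defaultdict(list)
    for doc in docs:
        if doc["SenderId"] in partner_ids:       # the Where(...) filter
            by_sender[doc["SenderId"]].append(doc)
    return {
        sender: heapq.nlargest(n, items, key=lambda d: d["InsertedDateTime"])
        for sender, items in by_sender.items()
    }

def top_n_flat(docs, partner_ids, n):
    """Flat shape: the same rows concatenated, like SelectMany."""
    return [doc for items in top_n_by_key(docs, partner_ids, n).values() for doc in items]

# Made-up documents: sender 1 has 12, sender 2 has 3, sender 3 is not a partner.
DOCS = (
    [{"SenderId": 1, "InsertedDateTime": datetime(2024, 1, d)} for d in range(1, 13)]
    + [{"SenderId": 2, "InsertedDateTime": datetime(2024, 1, d)} for d in range(1, 4)]
    + [{"SenderId": 3, "InsertedDateTime": datetime(2024, 1, d)} for d in range(1, 6)]
)

grouped = top_n_by_key(DOCS, {1, 2}, 10)
flat = top_n_flat(DOCS, {1, 2}, 10)
print({k: len(v) for k, v in grouped.items()}, len(flat))
```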

Select first `n` rows of a grouped query

The row_number function is exactly what you're looking for:

SELECT * 
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY entry_time DESC) AS rn
FROM user_gps_location
WHERE entry_time > '2020-09-01') t
WHERE rn <= 5

How to select top n row from each group after group by in pandas?

I'd recommend sorting your counts in descending order first, then calling GroupBy.head:

(freq_df.sort_values('count', ascending=False)
.groupby(['open_year','open_month'], sort=False).head(5)
)
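A small runnable check of that approach, using a made-up freq_df and head(2) instead of head(5) to keep the data short. Because the rows are already sorted by count, head(n) keeps the first n rows of each (open_year, open_month) group, and the result keeps the sorted row order rather than clustering rows by group.

```python
import pandas as pd

# Hypothetical counts per (open_year, open_month, name).
freq_df = pd.DataFrame({
    "open_year":  [2020, 2020, 2020, 2020, 2021, 2021],
    "open_month": [1, 1, 1, 2, 1, 1],
    "name":       ["a", "b", "c", "a", "a", "b"],
    "count":      [5, 9, 7, 3, 1, 4],
})

top2 = (freq_df.sort_values("count", ascending=False)
        .groupby(["open_year", "open_month"], sort=False)
        .head(2))
print(top2)
```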

SELECT TOP 20 rows for each group

The easiest way would be to use the row_number() window function to number the rows for each city by VisitNumber descending, and use that as a filter. This query should work in any SQL Server version from 2005 onwards.

select * 
from (
select *, r = row_number() over (partition by City order by VisitNumber desc)
from your_table
) a
where r <= 20
and City in ('Washington', 'New York', 'Los Angeles')

This would select the top 20 items for each city specified in the where clause.


