Selecting Top N Elements of a Group in Oracle

Selecting top n elements of a group in Oracle

Is this what you are after?

My test set-up:

SQL> alter session set nls_date_format = 'DD-Mon-YYYY HH24:Mi:SS';

Session altered.

SQL> drop table so_test;

Table dropped.

SQL> create table so_test (
2 n varchar2(32)
3 , v varchar2(32)
4 , t date );

Table created.

SQL>
SQL> insert into so_test values ( 'X' , 'Test1', to_date('01-Jan-2011 12:00:00','DD-Mon-YYYY HH24:Mi:SS') );

1 row created.

SQL> insert into so_test values ( 'X' , 'Test2', to_date('01-Jan-2011 13:00:00','DD-Mon-YYYY HH24:Mi:SS') );

1 row created.

SQL> insert into so_test values ( 'X' , 'Test3', to_date('01-Jan-2011 14:00:00','DD-Mon-YYYY HH24:Mi:SS') );

1 row created.

SQL> insert into so_test values ( 'Y' , 'Test5', to_date('02-Jan-2011 12:00:00','DD-Mon-YYYY HH24:Mi:SS') );

1 row created.

SQL> insert into so_test values ( 'Y' , 'Test6', to_date('03-Jan-2011 12:00:00','DD-Mon-YYYY HH24:Mi:SS') );

1 row created.

SQL> insert into so_test values ( 'Y' , 'Test7', to_date('04-Jan-2011 12:00:00','DD-Mon-YYYY HH24:Mi:SS') );

1 row created.

SQL>

Here is the query:

SQL> select n,v,t from (
2 select n, v , t , rank() over ( partition by n order by t desc) r
3 from so_test
4 ) where r <= 2;

N V T
-------------------------------- -------------------------------- --------------------
X Test3 01-Jan-2011 14:00:00
X Test2 01-Jan-2011 13:00:00
Y Test7 04-Jan-2011 12:00:00
Y Test6 03-Jan-2011 12:00:00

SQL>

Oracle SQL: How to SELECT N records for each group / cluster

I recommend an analytical function such as rank() or row_number(). You could do this with hard-coded unions, but the analytical function does all the hard work for you.

select *
from
(
select
bt.col_a,
bt.col_b,
bt.process_type_cod,
row_number() over ( partition by process_type_cod order by col_a nulls last ) rank
from small_table st
inner join big_table bt
on st.process_type_cod = bt.process_type_cod
)
where rank < 11
;

You may not even need that join since big_table has all of the types you care about. In that case, just change the 'from clause' to use big_table and drop the join.

What this does is performs the query and then sorts the records using the 'order by' operator in the partition statement. For a given group (here we grouped by col_a), a numerical row number (i.e. 1, 2, 3, 4, 5, n+1...) is applied to each record consecutively. In the outer where clause, just filter by the records with a number lower than N.

Return the first n records per group Oracle SQL

Consider this non-Windows function approach using a count correlated aggregate query. The idea is to run a department rank subquery and then use that in a derived table that filters outer query by this department rank. Please note your desired results do not return by ordered START_DATE but simply query's row number.

SELECT main.EMPLOYEE, t.START_DATE, t.DEPARTMENT
FROM
(SELECT t.EMPLOYEE, t.START_DATE, t.DEPARTMENT,
(SELECT Count(*) FROM Employees sub
WHERE sub.START_DATE <= t.START_DATE
AND sub.Department = t.Department) AS DeptRank
FROM Employees t) main
WHERE main.DeptRank <= 3
ORDER BY main.DEPARTMENT, main.START_DATE;

-- EMPLOYEE START_DATE DEPARTMENT
-- Tawnee 1/2/1904 Legal
-- Jacinta 1/2/1924 Legal
-- Kirsten 1/2/1933 Legal
-- Edwina 1/2/1902 Mergers
-- Louise 1/2/1912 Mergers
-- Kelly 1/2/1954 Mergers
-- Jane 1/2/1900 Sales
-- Amy 1/2/1901 Sales
-- Cara 1/2/1955 Sales

For the Windows function counterpart:

SELECT main.EMPLOYEE, t.START_DATE, t.DEPARTMENT
FROM
(SELECT t.EMPLOYEE, t.START_DATE, t.DEPARTMENT,
RANK() OVER (PARTITION BY Department
ORDER BY START_DATE) AS DeptRank
FROM Employees t) main
WHERE main.DeptRank <= 3
ORDER BY main.DEPARTMENT, main.START_DATE;

And as @Matt comments, you may want to handle ties (i.e., employees who started on same day). Both above solutions will output all such employees depending on rank filter. To take one of the ties in correlated subquery, use Employee name as tiebreaker (or better yet a unique ID if available):

SELECT main.EMPLOYEE, t.START_DATE, t.DEPARTMENT
FROM
(SELECT t.EMPLOYEE, t.START_DATE, t.DEPARTMENT,
(SELECT Count(*) FROM Employees sub
WHERE sub.Department = t.Department
AND (sub.START_DATE <= t.START_DATE
OR sub.START_DATE = t.START_DATE
AND sub.EMPLOYEE < t.EMPLOYEE) AS DeptRank
FROM Employees t) main
WHERE main.DeptRank <= 3
ORDER BY main.DEPARTMENT, main.START_DATE;

And for window-function query use ROW_NUMBER() in place of RANK().

Oracle SELECT TOP 10 records

You'll need to put your current query in subquery as below :

SELECT * FROM (
SELECT DISTINCT
APP_ID,
NAME,
STORAGE_GB,
HISTORY_CREATED,
TO_CHAR(HISTORY_DATE, 'DD.MM.YYYY') AS HISTORY_DATE
FROM HISTORY WHERE
STORAGE_GB IS NOT NULL AND
APP_ID NOT IN (SELECT APP_ID FROM HISTORY WHERE TO_CHAR(HISTORY_DATE, 'DD.MM.YYYY') ='06.02.2009')
ORDER BY STORAGE_GB DESC )
WHERE ROWNUM <= 10

Oracle applies rownum to the result after it has been returned.
You need to filter the result after it has been returned, so a subquery is required. You can also use RANK() function to get Top-N results.

For performance try using NOT EXISTS in place of NOT IN. See this for more.

How to select top N rows for each group in a Entity Framework GroupBy with EF 3.1

Update (EF Core 6.0):

EF Core 6.0 added support for translating GroupBy result set projection, so the original code for taking (key, items) now works as it should, i.e.

var query = context.Set<DbDocument>()
.Where(e => partnerIds.Contains(e.SenderId))
.GroupBy(e => e.SenderId)
.Select(g => new
{
g.Key,
Documents = g.OrderByDescending(e => e.InsertedDateTime).Take(10)
});

However flattening (via SelectMany) is still unsupported, so you have to use the below workaround if you need such query shape.

Original (EF Core 3.0/3.1/5.0):

This is a common problem, unfortunately not supported by EF Core 3.0/3.1/5.0 query translator specifically for GroupBy.

The workaround is to do the groping manually by correlating 2 subqueries - one for keys and one for corresponding data.

Applying it to your examples would be something like this.

If you need (key, items) pairs:

var query = context.Set<DbDocument>()
.Where(t => partnerIds.Contains(t.SenderId))
.Select(t => t.SenderId).Distinct() // <--
.Select(key => new
{
Key = key,
Documents =
context.Set<DbDocument>().Where(t => t.SenderId == key) // <--
.OrderByDescending(t => t.InsertedDateTime).Take(10)
.ToList() // <--
});

If you need just flat result set containing top N items per key:

var query = context.Set<DbDocument>()
.Where(t => partnerIds.Contains(t.SenderId))
.Select(t => t.SenderId).Distinct() // <--
.SelectMany(key => context.Set<DbDocument>().Where(t => t.SenderId == key) // <--
.OrderByDescending(t => t.InsertedDateTime).Take(10)
);

Select only n rows in an SQL for an element

You can use row_number()

select * from
(
select *, row_number() over(partition by table_2_id order by id desc) as rn
from tablename
)A where rn<=5


Related Topics



Leave a reply



Submit