Selecting the first N rows of each group ordered by date
As well as the row_number
solution, another option is CROSS APPLY(SELECT TOP
:
SELECT m.masterid,
d.detailid,
m.numbers,
d.date_time,
d.value
FROM masters AS m
CROSS APPLY (
SELECT TOP (3) *
FROM details AS d
WHERE d.date_time >= '2020-01-01'
AND m.masterid = d.masterid
) AS d
WHERE m.tags LIKE '%Tag2%'
ORDER BY m.masterid DESC,
d.date_time;
This may be faster or slower than row_number
, mostly depending on cardinalities (quantity of rows) and indexing.
If indexing is good and it's a small number of rows it will usually be faster. If the inner table needs sorting or you are anyway selecting most rows then use row_number
.
How to select more than 1 record per day?
I want to select at most 3 records per day from a specific date range.
SELECT date_time, other_column
FROM (
SELECT *, row_number() OVER (PARTITION BY date_time::date) AS rn
FROM tbl
WHERE date_time >= '2012-11-01 0:0'
AND date_time < '2012-12-01 0:0'
) x
WHERE rn < 4;
Major points
Use the window function
row_number()
.rank()
ordense_rank()
would be wrong according to the question - more than 3 records might be selected with timestamp duplicates.Since you do not define which rows you want per day, the correct answer is not to include an
ORDER BY
clause in the window function. Gives you an arbitrary selection, which matches the question.I changed your
WHERE
clause fromWHERE date_time >= '20121101 00:00:00'
AND date_time <= '20121130 23:59:59'to
WHERE date_time >= '2012-11-01 0:0'
AND date_time < '2012-12-01 0:0'Your syntax would fail for corner cases like
'20121130 23:59:59.123'
.What @Craig suggested:
date_time::date BETWEEN '2012-11-02' AND '2012-11-05'
.. would work correctly, but is an anti-pattern regarding performance. If you apply a cast or a function to your database column in the expression, plain indexes cannot be used.
Solution for PostgreSQL 8.3
Best solution: Upgrade to a more recent version, preferably to the current version 9.2.
Other solutions:
For only few days you could employ UNION ALL
:
SELECT date_time, other_column
FROM tbl t1
WHERE date_time >= '2012-11-01 0:0'
AND date_time < '2012-11-02 0:0'
LIMIT 3
)
UNION ALL
(
SELECT date_time, other_column
FROM tbl t1
WHERE date_time >= '2012-11-02 0:0'
AND date_time < '2012-11-03 0:0'
LIMIT 3
)
...
Parenthesis are not optional here.
For more days there are workarounds with generate_series()
- something like I posted here (including a link to more).
I might have solved it with a plpgsql function back in the old days before we had window functions:
CREATE OR REPLACE FUNCTION x.f_foo (date, date, integer
, OUT date_time timestamp, OUT other_column text)
RETURNS SETOF record AS
$BODY$
DECLARE
_last_day date; -- remember last day
_ct integer := 1; -- count
BEGIN
FOR date_time, other_column IN
SELECT t.date_time, t.other_column
FROM tbl t
WHERE t.date_time >= $1::timestamp
AND t.date_time < ($2 + 1)::timestamp
ORDER BY t.date_time::date
LOOP
IF date_time::date = _last_day THEN
_ct := _ct + 1;
ELSE
_ct := 1;
END IF;
IF _ct <= $3 THEN
RETURN NEXT;
END IF;
_last_day := date_time::date;
END LOOP;
END;
$BODY$ LANGUAGE plpgsql STABLE STRICT;
COMMENT ON FUNCTION f_foo(date3, date, integer) IS 'Return n rows per day
$1 .. date_from (incl.)
$2 .. date_to (incl.)
$3 .. maximim rows per day';
Call:
SELECT * FROM f_foo('2012-11-01', '2012-11-05', 3);
Get top 1 row of each group
;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1
If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead
As for normalised or not, it depends if you want to:
- maintain status in 2 places
- preserve status history
- ...
As it stands, you preserve status history. If you want latest status in the parent table too (which is denormalisation) you'd need a trigger to maintain "status" in the parent. or drop this status history table.
SQL - Select first 10 rows only?
In SQL server, use:
select top 10 ...
e.g.
select top 100 * from myTable
select top 100 colA, colB from myTable
In MySQL, use:
select ... order by num desc limit 10
MySQl - SQL - Top 5 records per day
If this is going to be used daily, then you should consider to create a separate table and fill the data in it using procedure. There is still better way to do this(using merge). This is just for your reference.
create table daily_results
(`VISIT_DATE` date, `PAGE_ID` varchar(20), `SERVER_NAME` varchar(50), `NUM_VISITS` int);
CREATE PROCEDURE proc_loop_test( IN startdate date, in enddate date)
BEGIN
WHILE(startdate < enddate) DO
insert into daily_results (select * from PAGEVISITS where VISIT_DATE=startdate order by NUM_VISITS desc limit 5);
SET startdate = date_add(startdate, INTERVAL 1 DAY);
end WHILE;
END;
call it using
call proc_loop_test(`2015-01-01`,`2015-03-15`);
select * from daily_results;
How to enforce a max limit no. of rows per day per date in SQL?
Since 6 rows over 2 days means exactly 3 rows per day, let's expand it to a week.
In a subquery use row_number
to assign a number to each row for each date. Then only select those with a row number of 3 or less.
select *
from (
select
*,
row_number() over (partition by sales_date order by rand()) as row
from mytable
where sales_date between '20211002' and '20211009'
)
where row <= 3
order by rand()
limit 6
Related Topics
Why Is My Left Join Not Returning Nulls
When Should I Nest Pl/SQL Begin...End Blocks
Does Introducing Foreign Keys to MySQL Reduce Performance
How to Retrieve The Primary Key When Saving a New Object in Anorm
Using a View with No Primary Key with Entity
In SQL Server, How to Create While Loop in Select
Sql Query for Insert in Grails
Nosql VS. SQL When Scalability Is Irrelevant
Using Sqldf and Rpostgresql Together
How to Select MySQL Rows in The Order of in Clause
Multiple Inner Join from The Same Table
Parsing Nested Xml into SQL Table
Sql: How to Get All The Distinct Characters in a Column, Across All Rows
Computed Column Cannot Be Persisted