Sql: How to Select Earliest Row

SQL: How To Select Earliest Row

Simply use min()

SELECT company, workflow, MIN(date) 
FROM workflowTable
GROUP BY company, workflow

Select row with the earliest date

You can use window functions for this, either ROW_NUMBER() or MIN(). The idea is to partition the rows by the ID - OVER (PARTITION BY id) - and then either assign row numbers (ordered by the datetime) or find the minimum datetime per ID.

Solution with ROW_NUMBER():

; WITH cte AS
( SELECT id, datetime, string,
rn = ROW_NUMBER() OVER (PARTITION BY id ORDER BY datetime)
FROM tableX
)
SELECT id, datetime, string
FROM cte
WHERE rn = 1 ;

and with MIN():

; WITH cte AS
( SELECT id, datetime, string,
min_datetime = MIN(datetime) OVER (PARTITION BY id)
FROM tableX
)
SELECT id, datetime, string
FROM cte
WHERE datetime = min_datetime ;

The second version has slightly different behaviour. If there are two (or more) rows with exactly same datetime for an ID, they will be both in the result.

Select earliest date and count rows in table with duplicate IDs

You would use aggregation:

select id, min(create_date), count(*)
from table1
group by id;

I find it amusing that you want to use window functions -- which are considered more advanced -- when lowly aggregation suffices.

Get top 1 row of each group

;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1

If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead

As for normalised or not, it depends if you want to:

  • maintain status in 2 places
  • preserve status history
  • ...

As it stands, you preserve status history. If you want latest status in the parent table too (which is denormalisation) you'd need a trigger to maintain "status" in the parent. or drop this status history table.

Selecting the first N rows of each group ordered by date

As well as the row_number solution, another option is CROSS APPLY(SELECT TOP:

SELECT m.masterid,
d.detailid,
m.numbers,
d.date_time,
d.value
FROM masters AS m
CROSS APPLY (
SELECT TOP (3) *
FROM details AS d
WHERE d.date_time >= '2020-01-01'
AND m.masterid = d.masterid
) AS d
WHERE m.tags LIKE '%Tag2%'
ORDER BY m.masterid DESC,
d.date_time;

This may be faster or slower than row_number, mostly depending on cardinalities (quantity of rows) and indexing.

If indexing is good and it's a small number of rows it will usually be faster. If the inner table needs sorting or you are anyway selecting most rows then use row_number.



Related Topics



Leave a reply



Submit