How to Select Rows with Most Recent Timestamp for Each Key Value

select the rows that have more that X occurrences of the same foreign key and ignore the X most recent (timestamp)

Use COUNT() and ROW_NUMBER() window functions:

SELECT id, fk, timestamp 
FROM (
SELECT *,
COUNT(*) OVER (PARTITION BY fk) counter,
ROW_NUMBER() OVER (PARTITION BY fk ORDER BY timestamp DESC) rn
FROM tablename
) t
WHERE counter > ? AND rn > ?;

Replace ? with the values that you want.

But, if the same number X is applied for both the number of total rows for each fk and the number of rows to be dismissed, the query can be simplified:

SELECT id, fk, timestamp 
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY fk ORDER BY timestamp DESC) rn
FROM tablename
) t
WHERE rn > ?;

See the demo.

most efficient way to select duplicate rows with max timestamp

Use DISTINCT ON:

SELECT DISTINCT ON (id) id, content, time
FROM yourTable
ORDER BY id, time DESC;

On Postgres, this is usually the most performant way to write your query, and it should outperform ROW_NUMBER and other approaches.

The following index might speed up this query:

CREATE INDEX idx ON yourTable (id, time DESC, content);

This index, if used, would let Postgres rapidly find, for each id, the record having the latest time. This index also covers the content column.

How to select rows with a timestamp in a time period of a row that meets a condition

Invert the problem and look for error code 100 in the 3 seconds before each error. For this, use window functions and filtering. For the rows with the errors:

select t.*
from (select t.*,
countif(err_code = 100) over (
partition by machine_no
order by unix_seconds(timestamp)
range between 3 preceding and current row
) as cnt_100
from t
) t
where cnt_100 > 0 and err_code <> 100;

Note that any row with err_code = 100 is going to meet the countif() condition. However, those are filtered out by the outer where clause.

SQL selecting rows by most recent date with one unique column (selecting 3+ columns)

The classic way to do this and still very useful in this case:

select 
ean,
price,
dtmod
from my_table t0
where
t0.dtmod = (
select max(dtmod)
from my_table t1
where t1.ean = t0.ean
)
order by ean
;

Also, super helpful for everyone is to know how to use the window functions, any SQL programmer should spend some time to understand all of the window functions.

with
w_o as (
select
ean,
price,
dtmod,
row_number() OVER (PARTITION BY ean ORDER BY dtmod DESC) rn
from my_table
)
select
ean,
price,
dtmod
from w_o
where rn = 1
order by ean
;

Note:

For testing, I added a duplicate record with same EAN, and same timestamp to the test data. The first query gives both the second gives only one. You can determine which one it returns by tuning the ordering.

Edit 1.
I forgot the order by. :)

Edit 2.
Added window function information.

http://sqlfiddle.com/#!17/8278a/1/1

Find first 3 orders for each customer

You can use the window function ROW_NUMBER() to numerate the columns of a given PARTITION (customer_id in your case) within a CTE and in the outer query just filter the n records from this generated column:

WITH j AS (
SELECT customer_id, timestamp,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS n
FROM customer_orders
)
SELECT customer_id, timestamp FROM j
WHERE n <= 3
ORDER BY customer_id, timestamp

Demo: db<>fiddle



Related Topics



Leave a reply



Submit