How to select the previous date with a condition
In this case, perhaps a subquery would be simpler than window functions:
SELECT *,(
SELECT max(date) FROM topups q2
WHERE id_user = q.id_user AND
q2.promo_id = 'Y' AND
q.date > q2.date)
FROM topups q ORDER BY date DESC;
id_user | date | promo_id | max
---------+------------+----------+------------
1 | 2017-06-20 | N | 2017-04-20
1 | 2017-05-20 | N | 2017-04-20
1 | 2017-04-20 | Y | 2017-03-20
1 | 2017-03-20 | Y |
1 | 2017-02-20 | N |
(5 rows)
Demo: db<>fiddle
Postgres : lag and lead with special conditions
No need for analytic functions, just sub-selects. Something like the following (untested) should work:
select
id_data,
val_no3,
id_prev,
id_next,
(select val_no2 from b where id_data = x.id_prev) as val_prev,
(select val_no2 from b where id_data = x.id_next) as val_next
from
b x
order by
id_data;
PostgreSQL combine LAG and LEAD to query n previous and following rows
I am not quiet sure if I got your use case correctly: You want to check if one given annotation is in one of the 5 rows (2 preceding, current, 2 following). Correct?
- It is possible to define a window like
BETWEEN 2 PRECEDING AND 2 FOLLOWING
LEAD
orLAG
only give one value, in this case the one value after or before the current row - if a window supports it; no matter how many rows your window contains. But you want to check in any of these five rows.
One way to achieve this:
demo: db<>fiddle
SELECT *
FROM (
SELECT token, annotation, lemma,
unnest(array_agg(annotation) OVER w) as surrounded_annos -- 2
FROM tokens
WINDOW w AS ( -- 1
ORDER BY line, tno ASC
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
)
ORDER BY line, tno ASC
) AS "window"
WHERE
lemma LIKE '...'
AND "window".surrounded_annos LIKE '...'
;
- defining the window as explained above
- aggregate all annotations in these five rows (if possible) with
array_agg
which gives an array unnest
expands this array into one row for every element since IMHO there is no way to search array elements withLIKE
. This gives you this result (which can be filtered in the next step):
- aggregate all annotations in these five rows (if possible) with
Result subquery:
token annotation lemma surrounded_annos
This DEM.PROX this DEM.PROX
This DEM.PROX this VB.COP.3SG.PRES
This DEM.PROX this ART.INDEF
is VB.COP.3SG.PRES be DEM.PROX
is VB.COP.3SG.PRES be VB.COP.3SG.PRES
is VB.COP.3SG.PRES be ART.INDEF
is VB.COP.3SG.PRES be NN.INAN
an ART.INDEF a DEM.PROX
an ART.INDEF a VB.COP.3SG.PRES
an ART.INDEF a ART.INDEF
an ART.INDEF a NN.INAN
example NN.INAN example VB.COP.3SG.PRES
example NN.INAN example ART.INDEF
example NN.INAN example NN.
Using lag and lead function for multiple partitions
You are pretty close. A few small issues may be all that's holding you up. For one, you can't reference the alias last_month_amount
within the same query where you assign it. My preference is to pull the first part into a CTE and then do the calculations in the main query.
Secondly, you may also be seeing unusual behavior due to not handling nulls consistently. Since lag
may return null if you are looking at the first row in the partition, delta
should address this possibility. It appears you're aware delta
may be null (from the logic in your event
case statement), but delta
is currently coming up null for the first row, not the last, so I think you have the 'stop'
case backwards.
You also are missing a FROM
; perhaps you were abbreviating or simplifying your code.
Finally, the way "month" is used in this post is confusing, in part because it's an odd name for a DATE
column. When you say "the same month", it's unclear whether you mean the same date or the same month of the year. If you potentially have distinct rows with the same month (date), seller, and product, then you will need a secondary column in your order by
or you will get unpredictable ordering. Without knowing any of this I'm going to assume you've thought this through and that order by month
is good enough.
It seems like you may also want event to be 'new' if last month's amount was zero, since the previous row would be 'stop'. I've made that change, but adjust accordingly if my assumption is wrong.
Here's the same code with those changes:
with rows_in_context AS (
select month
, seller
, product
, amount
, lag(amount) over (partition by seller,product order by month) as last_month_amount
from some_table
)
select *
, coalesce(amount,0) - coalesce(last_month_amount,0) as delta
, case when COALESCE(last_month_amount,0) = 0 then 'new'
when COALESCE(amount,0) = 0 then 'stop'
when delta > 0 then 'increase'
when delta < 0 then 'decrease'
end as event
from rows_in_context
order by month, seller, product;
Using LAG function with a specific condition
I think that you want a conditional lead()
with option ignore nulls
:
select
id,
date from_date,
case when status = 'ON'
then lead(case when status = 'OFF' then date end ignore nulls)
over(partition by id order by date)
end to_date,
status
from mytable
To better match the expected results, we apply the function to any status other than 'OFF'
, and withdraw 1 day from the result:
select
id,
date from_date,
case when status <> 'OFF'
then lead(case when status = 'OFF' then date end ignore nulls)
over(partition by id order by date) - 1
end to_date,
status
from mytable
If you also want the next 'ON'
date for 'OFF'
rows:
select
id,
date from_date,
case when status <> 'OFF'
then lead(case when status = 'OFF' then date end ignore nulls)
over(partition by id order by date) - 1
else lead(case when status = 'ON' then date end ignore nulls)
over(partition by id order by date) - 1
end to_date,
status
from mytable
Demo on DB Fiddlde
First query:
ID | FROM_DATE | TO_DATE | STATUS
----: | :-------- | :-------- | :-----
15643 | 10-MAR-20 | 20-MAR-20 | ON
15643 | 15-MAR-20 | null | test
15643 | 20-MAR-20 | null | OFF
Second query:
ID | FROM_DATE | TO_DATE | STATUS
----: | :-------- | :-------- | :-----
15643 | 10-MAR-20 | 19-MAR-20 | ON
15643 | 15-MAR-20 | 19-MAR-20 | test
15643 | 20-MAR-20 | null | OFF
Third query (you can't see the difference with the second query since there is no 'OFF'
row with a next 'ON'
):
ID | FROM_DATE | TO_DATE | STATUS
----: | :-------- | :-------- | :-----
15643 | 10-MAR-20 | 19-MAR-20 | ON
15643 | 15-MAR-20 | 19-MAR-20 | test
15643 | 20-MAR-20 | null | OFF
Conditional LEAD/LAG with no sequence guarantee
You have the right expression for prior_path
already. You just need to wrap it in a conditional expression.
As for prior_event
, it is a bit more complicated indeed. I would suggest the following approach:
for events, we can just use
lag()
for pages, an option is to use some gaps-and-islands technique: first define groups with a conditional sum that increments every time an event is met, and then use
first_value()
:
This should do what you want:
select
t.*,
case when type = 'page'
then lag(path) over(partition by sessionid, type order by hit)
end prior_path,
case type
when 'page'
then first_value(event) over(partition by sessionid, grp order by hit)
when 'event'
then lag(event) over(partition by sessionid order by hit)
end prior_event
from (
select
t.*,
sum(case when type = 'event' then 1 else 0 end)
over(partition by sessionid order by hit) grp
from mytable t
) t
Demo on DB Fiddle (due to the lack of hive fiddle in the wild, I used Postgres - but this would work on Hive as well):
sessionid | hit | type | path | event | grp | prior_path | prior_event
--------: | --: | :---- | :------------------------------------ | :---- | --: | :--------------------------------- | :----------
1001 | 1 | event | www.stackoverflow.com | hover | 1 | null | null
1001 | 2 | page | www.stackoverflow.com | null | 1 | null | hover
1001 | 3 | event | www.stackoverflow.com | load | 2 | null | null
1001 | 4 | event | www.stackoverflow.com | blur | 3 | null | load
1001 | 5 | event | www.stackoverflow.com | click | 4 | null | blur
1001 | 6 | page | www.stackoverflow.com/post/10 | null | 4 | www.stackoverflow.com | click
1001 | 7 | event | www.stackoverflow.com/post/10#details | offer | 5 | null | null
1001 | 8 | page | www.stackoverflow.com/post/confirm | null | 5 | www.stackoverflow.com/post/10 | offer
1001 | 9 | page | www.stackoverflow.com/questions/10 | null | 5 | www.stackoverflow.com/post/confirm | offer
1001 | 10 | event | www.stackoverflow.com/questions/10 | exit | 6 | null | null
Using LAG() and PARTITION BY to return rows IF within 10 days of date
This answer is based on Gordon Linoff's idea,
but with a few tweaks:
FILTER is not implemented for pure window functions like lead() or lag() in Postgresql 11 (yet). So use
WHERE fruit_bought='orange'
as a condition on the entire inner SELECT.To guarantee selection of the row with the last date, use
LEAD(date, 1, '-infinity')
. This makes the default value fornext_date
equal to a-infinity
timestamp. Thusdate >= next_date - interval '10 day'
will be TRUE for the last date.Let's call rows within 10 days of each other a cluster. To select only rows from the last cluster,
compute a cumulative sum which counts how many timescond
is FALSE (since FALSE values separate clusters):SUM(CASE WHEN cond IS TRUE THEN 0 ELSE 1 END) OVER (ORDER BY date DESC) AS cluster_num
and select only rows for which cluster_num equals 0. Since we
ORDER BY date DESC
, the 0th cluster is the last cluster.
SELECT *
FROM (
SELECT *, SUM(CASE WHEN cond IS TRUE THEN 0 ELSE 1 END) OVER (ORDER BY date DESC) AS cluster_num
FROM (
SELECT *, date >= next_date - interval '10 day' AS cond
FROM (
SELECT id, fruit_bought, date,
LEAD(date, 1, '-infinity')
OVER (PARTITION BY fruit_bought ORDER BY date) AS next_date
FROM fruits
WHERE fruit_bought='orange'
-- restrict date here to specify an "initial date"
AND date <= '2018-04-01'
) t1
) t2
) t3
WHERE cond AND cluster_num = 0
ORDER BY date ASC
yields
| id | fruit_bought | date | next_date | cond | cluster_num |
|----+--------------+------------+------------+------+-------------|
| 3 | orange | 2018-03-07 | 2018-03-15 | t | 0 |
| 4 | orange | 2018-03-15 | 2018-03-20 | t | 0 |
| 6 | orange | 2018-03-20 | -infinity | t | 0 |
Setup:
CREATE TABLE fruits (
fruitid INT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
id INT,
fruit_bought TEXT,
quantity INT,
date DATE);
INSERT INTO fruits (id, fruit_bought, quantity, date)
VALUES (1,'orange',100,'2018-01-10')
, (2,'apple',50,'2018-02-05')
, (3,'orange',75,'2018-03-07')
, (4,'orange',200,'2018-03-15')
, (5,'apple',10,'2018-03-17')
, (6,'orange',20,'2018-03-20')
, (7,'orange',20,'2018-01-09');
LAG with condition
Here is one approach using analytic functions:
WITH cte AS (
SELECT *, COUNT(CASE WHEN event = 1 THEN 1 END) OVER
(PARTITION BY customer_id ORDER BY ts) cnt
FROM yourTable
)
SELECT ts, customer_id, event,
MAX(CASE WHEN event = 1 THEN ts END) OVER
(PARTITION BY customer_id, cnt) AS desired_result
FROM cte
ORDER BY customer_id, ts;
Demo
We can articulate your problem by saying that your want the desired_result
column to contain the most recent timestamp value when the event was 1. The count (cnt
) in the CTE above computes a pseudo group of records for each time the event is 1. Then we simply do a conditional aggregation over customer and pseudo group to find the timestamp value.
Jump SQL gap over specific condition & proper lead() usage
Query with window functions
SELECT *
FROM (
SELECT *
,lag(val, 1, 0) OVER (PARTITION BY status ORDER BY id) AS last_val
,lag(status, 1, 0) OVER w2 AS last_status
,lag(next_id) OVER w2 AS next_id_of_last_status
FROM (
SELECT *, lead(id) OVER (PARTITION BY status ORDER BY id) AS next_id
FROM t1
) AS t
WINDOW w2 AS (PARTITION BY val ORDER BY id)
) x
WHERE (last_val <> val OR last_status <> status)
AND (status = 1
OR last_status = 1
AND ((next_id_of_last_status > id) OR next_id_of_last_status IS NULL)
)
ORDER BY id
In addition to what we already had, we need valid OFF switches.
An OFF
switch if valid if the device was switched ON
before (last_status = 1
) and the next ON
operation after that comes after the OFF
switch in question (next_id_of_last_status > id
).
We have to provide for the special case that there is was the last ON
operation, so we check for NULL
in addition (OR next_id_of_last_status IS NULL
).
The next_id_of_last_status
comes from the same window that we take last_status
from. Therefore I introduced additional syntax for explicit window declaration, so I don't have to repeat myself:
WINDOW w2 AS (PARTITION BY val ORDER BY id)
And we need to get the next id for the last status in a subquery earlier (subquery t
).
If you've understood all that, you shouldn't have a problem slapping lead()
on top of this query to get to your final destination. :)
PL/pgSQL function
Once it gets this complex, it's time to switch to procedural processing.
This comparatively simple plpgsql function nukes the performance of the complex window function query, for the simple reason that it has to scan the whole table only once.
CREATE OR REPLACE FUNCTION valid_t1 (OUT t t1) -- row variable of table type
RETURNS SETOF t1 LANGUAGE plpgsql AS
$func$
DECLARE
_last_on int := -1; -- init with impossible value
BEGIN
FOR t IN
SELECT * FROM t1 ORDER BY id
LOOP
IF t.status = 1 THEN
IF _last_on <> t.val THEN
RETURN NEXT;
_last_on := t.val;
END IF;
ELSE
IF _last_on = t.val THEN
RETURN NEXT;
_last_on := -1;
END IF;
END IF;
END LOOP;
END
$func$;
Call:
SELECT * FROM valid_t1();
Related Topics
Oracle: Updating a Table Column Using Rownum in Conjunction with Order by Clause
Looping Through Column Names with Dynamic SQL
Ssis Best Practice to Load N Tables from Source to Target Server
Left Join Query Not Returning All Rows in First Table
What Is Your Naming Convention for Stored Procedures
Convert a String to Int Using SQL Query
T-SQL Subquery Max(Date) and Joins
Difference Between "On .. And" and "On .. Where" in SQL Left Join
MySQL Convert Latin1 Data to Utf8
Perform Regex (Replace) in an SQL Query
Find Last Row in Group by Query-SQL Server
SQL Server Select into @Variable
Where Clause to Find All Records in a Specific Month