How to Find Consecutive Rows Based on the Value of a Column


Try this

WITH cte AS
(
    SELECT *, COUNT(1) OVER (PARTITION BY cnt) AS pt
    FROM
    (
        SELECT tt.*,
               (SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS cnt
        FROM t tt
        WHERE data > 10
    ) t1
)
SELECT id, [when], data FROM cte WHERE pt >= 3


OUTPUT

id  when                     data
2   2013-08-02 00:00:00.000  121
3   2013-08-03 00:00:00.000  132
4   2013-08-04 00:00:00.000  15
6   2013-08-06 00:00:00.000  1435
7   2013-08-07 00:00:00.000  143
8   2013-08-08 00:00:00.000  18
9   2013-08-09 00:00:00.000  19

EDIT

First, for each row, the inner query counts the number of records with data <= 10 that have a smaller ID:

SELECT tt.*
,(SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS cnt
FROM t tt

Output:

id  when                     data  cnt
1   2013-08-01 00:00:00.000  1     1
2   2013-08-02 00:00:00.000  121   1
3   2013-08-03 00:00:00.000  132   1
4   2013-08-04 00:00:00.000  15    1
5   2013-08-05 00:00:00.000  9     2
6   2013-08-06 00:00:00.000  1435  2
7   2013-08-07 00:00:00.000  143   2
8   2013-08-08 00:00:00.000  18    2
9   2013-08-09 00:00:00.000  19    2
10  2013-08-10 00:00:00.000  1     3
11  2013-08-11 00:00:00.000  1234  3
12  2013-08-12 00:00:00.000  124   3
13  2013-08-13 00:00:00.000  6     4

Then we filter the records with data > 10

WHERE data > 10

Now we count the records in each group by partitioning on the cnt column:

SELECT *, COUNT(1) OVER (PARTITION BY cnt) AS pt
FROM
(
    SELECT tt.*,
           (SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS cnt
    FROM t tt
    WHERE data > 10
) t1

Output:

id  when                     data  cnt  pt
2   2013-08-02 00:00:00.000  121   1    3
3   2013-08-03 00:00:00.000  132   1    3
4   2013-08-04 00:00:00.000  15    1    3
6   2013-08-06 00:00:00.000  1435  2    4
7   2013-08-07 00:00:00.000  143   2    4
8   2013-08-08 00:00:00.000  18    2    4
9   2013-08-09 00:00:00.000  19    2    4
11  2013-08-11 00:00:00.000  1234  3    2
12  2013-08-12 00:00:00.000  124   3    2

The above query is placed in a CTE, which works much like a temporary table.

Now select the records whose consecutive count is >= 3:

SELECT id, [when], data FROM cte WHERE pt >= 3

ANOTHER SOLUTION

Since id - ROW_NUMBER() OVER (ORDER BY id) stays constant within each unbroken run of ids with data > 10 and jumps wherever an id was filtered out, it can be used directly as the group key:

;WITH partitioned AS (
    SELECT *, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM t
    WHERE data > 10
),
counted AS (
    SELECT *, COUNT(*) OVER (PARTITION BY grp) AS cnt
    FROM partitioned
)
SELECT id, [when], data
FROM counted
WHERE cnt >= 3


Count consecutive rows for each customer and value

For the gaps-and-islands solution, the first row_number() also needs to be partitioned by customer:

SELECT customer, status, COUNT(*)
FROM (
    SELECT t.*,
           (ROW_NUMBER() OVER (PARTITION BY customer ORDER BY id) -
            ROW_NUMBER() OVER (PARTITION BY customer, status ORDER BY id)
           ) AS grp
    FROM tickets t
) x
GROUP BY customer, status, grp
ORDER BY customer, MAX(id)


Result:

customer  status  count
--------  ------  -----
A         0       4
A         1       3
B         0       2
B         1       1
B         0       2
C         0       1

Get the rows of dataframe based on the consecutive values of one column

Not the most Pythonic way, but it gets the job done:

keep = []
for i in range(len(df) - 2):
    # look for the pattern 'a', 'p', 'p' across three consecutive rows
    if (df.View.iloc[i] == 'a') and (df.View.iloc[i+1] == 'p') and (df.View.iloc[i+2] == 'p'):
        keep.append(df.iloc[i])
        keep.append(df.iloc[i+1])
        keep.append(df.iloc[i+2])

result = pd.DataFrame(keep)  # assemble the kept rows back into a frame
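
For comparison, here is a vectorized sketch of the same check. This is an addition rather than part of the original answer, and the sample View values are made up:

import pandas as pd

# Hypothetical frame; the View column is assumed to hold single letters.
df = pd.DataFrame({'View': ['x', 'a', 'p', 'p', 'b', 'a', 'p', 'x']})

# Flag the rows where the pattern 'a', 'p', 'p' starts ...
start = (df.View == 'a') & (df.View.shift(-1) == 'p') & (df.View.shift(-2) == 'p')

# ... then extend each flag to the two rows that follow it.
mask = start | start.shift(1, fill_value=False) | start.shift(2, fill_value=False)

print(df[mask])  # rows 1-3: the 'a', 'p', 'p' run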


How can I find 5 consecutive rows in pandas Dataframe where a value of a certain column is at least 0.5

Here is a non-iterative pandas approach, which makes it quite efficient.

Steps:

  • Create a rolling window of 5 points and determine the minimum value.
  • If the minimum value is >= 0.5, store True, else store False.
  • All booleans are stored in a numpy.array, called idx.
  • The idx array is used as a filter on the main dataset, with 4 subtracted from the matching positions so the result points at the first row of each run of 5.
  • The filtered DataFrame is presented.

Sample code:

idx = (df['residual'].rolling(window=5).min() >= 0.5).to_numpy()
df.iloc[df.index[idx]-4]

Output:

Index  Time                       real_generation  predicted_generation  residual
1      2019-01-01 11:00:00+00:00  0.126            0.627                 0.501
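
To see those two lines in action, here is a small self-contained sketch with made-up residual values (the real frame also carries the Time and generation columns) and the default integer index assumed by the answer:

import pandas as pd

# Hypothetical residuals: positions 3-7 form the only run of 5 values >= 0.5.
df = pd.DataFrame({'residual': [0.1, 0.2, 0.4, 0.6, 0.7, 0.9, 0.8, 0.5,
                                0.3, 0.2, 0.6, 0.4]})

# True where the minimum of the current row and the 4 preceding rows is >= 0.5.
idx = (df['residual'].rolling(window=5).min() >= 0.5).to_numpy()

# The flag sits on the last row of each run, so step back 4 rows to its start.
print(df.iloc[df.index[idx] - 4])   # row 3, residual 0.6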

Find 5 consecutive row values in Pandas Dataframe that are equal

One classic way is to use boolean indexing with a custom mask. Breaking it down, it relies on forming groups of consecutive Match values and counting each group's size to slice out the matching rows.

m = df.groupby(df['Col3'].ne('Match').cumsum())['Col3'].transform('size').ge(5)
df[m&m.shift()]

Alternatively:

m = df['Col3'].ne('Match')
m2 = df.groupby((m|m.shift()).cumsum())['Col3'].transform('size').ge(5)
df[m2]

Output:

   Col1   Col2  Col3
2  Time3  c     Match
3  Time4  d     Match
4  Time5  e     Match
5  Time6  f     Match
6  Time7  g     Match
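
As a quick check, the first approach can be run against a made-up frame shaped like the one in the question (the input rows are an assumption; only the matched rows appear in the output above):

import pandas as pd

# Hypothetical input: five consecutive 'Match' rows surrounded by non-matches.
df = pd.DataFrame({
    'Col1': ['Time1', 'Time2', 'Time3', 'Time4', 'Time5', 'Time6', 'Time7', 'Time8'],
    'Col2': list('abcdefgh'),
    'Col3': ['No', 'No', 'Match', 'Match', 'Match', 'Match', 'Match', 'No'],
})

# Every non-Match row starts a new group; transform('size') is each group's length.
m = df.groupby(df['Col3'].ne('Match').cumsum())['Col3'].transform('size').ge(5)

# m still flags the non-Match row that opens a qualifying group, so also require
# the previous row to qualify (fill_value keeps the shifted mask boolean).
print(df[m & m.shift(fill_value=False)])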

Generate Identifier for consecutive rows with same value

This is a gaps-and-islands problem that can be solved in the traditional way: flag each row where the value changes, then take a running sum of those flags as the group identifier.

For example:

select *,
       sum(inc) over (order by date desc, type) as grp
from (
    select *,
           case when type <> lag(type) over (order by date desc, type)
                then 1 else 0
           end as inc
    from test
) x
order by date desc, type

Result:

CustomerId  Type  date                  inc  grp
----------  ----  --------------------  ---  ---
aaaa        1     2015-10-24T22:52:47Z  0    0
bbbb        1     2015-10-23T22:56:47Z  0    0
cccc        2     2015-10-22T21:52:47Z  1    1
dddd        2     2015-10-20T22:12:47Z  0    1
aaaa        1     2015-10-19T20:52:47Z  1    2
dddd        2     2015-10-18T12:52:47Z  1    3
aaaa        3     2015-10-18T12:52:47Z  1    4


Deleting consecutive rows in a pandas dataframe with the same value

First start a new group each time the value changes, then use GroupBy.head to keep the first two rows of each group:

new_df = df.groupby(df['rating'].ne(df['rating'].shift()).cumsum()).head(2)
print(new_df)

   rating
0     4.0
1     4.0
2     3.5
3    15.0
4     5.0
5     4.0
6     4.0
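
A runnable sketch of the same call, assuming an input in which the trailing 4.0 appears three times so the third copy gets dropped:

import pandas as pd

# Hypothetical input: the final 4.0 occurs three times in a row.
df = pd.DataFrame({'rating': [4.0, 4.0, 3.5, 15.0, 5.0, 4.0, 4.0, 4.0]})

# ne(...).cumsum() starts a new group whenever the rating changes;
# head(2) keeps at most the first two rows of each run.
new_df = df.groupby(df['rating'].ne(df['rating'].shift()).cumsum()).head(2)
print(new_df)   # the third trailing 4.0 (index 7) is gone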

Group consecutive rows based on one column

This is a gaps-and-islands problem. Use the difference of row_number():

select injourney, min(timestamp), max(timestamp)
from (
    select t.*,
           row_number() over (order by timestamp) as seqnum,
           row_number() over (partition by injourney order by timestamp) as seqnum_i
    from t
) t
group by injourney, (seqnum - seqnum_i)
order by min(timestamp);

