How to Find Consecutive Rows Based on the Value of a Column


Try this

WITH cte AS
(
    SELECT *, COUNT(1) OVER (PARTITION BY cnt) AS pt
    FROM
    (
        SELECT tt.*,
               (SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS cnt
        FROM t tt
        WHERE data > 10
    ) t1
)
SELECT id, [when], data FROM cte WHERE pt >= 3


OUTPUT

id  when                     data
2   2013-08-02 00:00:00.000  121
3   2013-08-03 00:00:00.000  132
4   2013-08-04 00:00:00.000  15
6   2013-08-06 00:00:00.000  1435
7   2013-08-07 00:00:00.000  143
8   2013-08-08 00:00:00.000  18
9   2013-08-09 00:00:00.000  19

EDIT

First, for each row, the inner query counts the number of records with data <= 10 that have a smaller ID:

SELECT tt.*
,(SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS cnt
FROM t tt

Output:

id  when                     data  cnt
1   2013-08-01 00:00:00.000  1     1
2   2013-08-02 00:00:00.000  121   1
3   2013-08-03 00:00:00.000  132   1
4   2013-08-04 00:00:00.000  15    1
5   2013-08-05 00:00:00.000  9     2
6   2013-08-06 00:00:00.000  1435  2
7   2013-08-07 00:00:00.000  143   2
8   2013-08-08 00:00:00.000  18    2
9   2013-08-09 00:00:00.000  19    2
10  2013-08-10 00:00:00.000  1     3
11  2013-08-11 00:00:00.000  1234  3
12  2013-08-12 00:00:00.000  124   3
13  2013-08-13 00:00:00.000  6     4

Then we filter the records with data > 10

WHERE data > 10

Now we count the records in each group by partitioning on the cnt column:

SELECT *, COUNT(1) OVER (PARTITION BY cnt) AS pt
FROM
(
    SELECT tt.*,
           (SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS cnt
    FROM t tt
    WHERE data > 10
) t1

Output:

id  when                     data  cnt  pt
2   2013-08-02 00:00:00.000  121   1    3
3   2013-08-03 00:00:00.000  132   1    3
4   2013-08-04 00:00:00.000  15    1    3
6   2013-08-06 00:00:00.000  1435  2    4
7   2013-08-07 00:00:00.000  143   2    4
8   2013-08-08 00:00:00.000  18    2    4
9   2013-08-09 00:00:00.000  19    2    4
11  2013-08-11 00:00:00.000  1234  3    2
12  2013-08-12 00:00:00.000  124   3    2

The above query is placed in a CTE, which works much like a temporary table.

Now select the records whose consecutive count is >= 3:

SELECT id, [when], data FROM cte WHERE pt >= 3

ANOTHER SOLUTION

Since id - ROW_NUMBER() OVER (ORDER BY id) stays constant within each unbroken run of ids with data > 10 and jumps wherever an id was filtered out, it can be used directly as the group key:

;WITH partitioned AS (
    SELECT *, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
    FROM t
    WHERE data > 10
),
counted AS (
    SELECT *, COUNT(*) OVER (PARTITION BY grp) AS cnt
    FROM partitioned
)
SELECT id, [when], data
FROM counted
WHERE cnt >= 3


Count consecutive rows for each customer and value

For the gaps-and-islands solution, the first row_number() also needs to be partitioned by customer:

SELECT customer, status, COUNT(*)
FROM (
    SELECT t.*,
           (ROW_NUMBER() OVER (PARTITION BY customer ORDER BY id) -
            ROW_NUMBER() OVER (PARTITION BY customer, status ORDER BY id)
           ) AS grp
    FROM tickets t
) x
GROUP BY customer, status, grp
ORDER BY customer, MAX(id)


Result:

customer  status  count
--------  ------  -----
A         0       4
A         1       3
B         0       2
B         1       1
B         0       2
C         0       1

Get the rows of dataframe based on the consecutive values of one column

Not the most Pythonic way, but it gets the job done:

keep = []
for i in range(len(df) - 2):
    # look for the pattern 'a', 'p', 'p' across three consecutive rows
    if (df.View.iloc[i] == 'a') and (df.View.iloc[i+1] == 'p') and (df.View.iloc[i+2] == 'p'):
        keep.append(df.iloc[i])
        keep.append(df.iloc[i+1])
        keep.append(df.iloc[i+2])

result = pd.DataFrame(keep)  # assemble the kept rows back into a frame
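
For comparison, here is a vectorized sketch of the same check. This is an addition rather than part of the original answer, and the sample View values are made up:

import pandas as pd

# Hypothetical frame; the View column is assumed to hold single letters.
df = pd.DataFrame({'View': ['x', 'a', 'p', 'p', 'b', 'a', 'p', 'x']})

# Flag the rows where the pattern 'a', 'p', 'p' starts ...
start = (df.View == 'a') & (df.View.shift(-1) == 'p') & (df.View.shift(-2) == 'p')

# ... then extend each flag to the two rows that follow it.
mask = start | start.shift(1, fill_value=False) | start.shift(2, fill_value=False)

print(df[mask])  # rows 1-3: the 'a', 'p', 'p' run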


How can I find 5 consecutive rows in pandas Dataframe where a value of a certain column is at least 0.5

Here is a non-iterative pandas approach, which makes it quite efficient.

Steps:

  • Create a rolling window of 5 points and determine the minimum value.
  • If the minimum value is >= 0.5, store True, else store False.
  • All booleans are stored in a numpy.array, called idx.
  • The idx array is used as a filter on the main dataset, with 4 subtracted from the matching positions so the result points at the first row of each run of 5.
  • The filtered DataFrame is presented.

Sample code:

idx = (df['residual'].rolling(window=5).min() >= 0.5).to_numpy()
df.iloc[df.index[idx]-4]

Output:

Index  Time                       real_generation  predicted_generation  residual
1      2019-01-01 11:00:00+00:00  0.126            0.627                 0.501
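
To see those two lines in action, here is a small self-contained sketch with made-up residual values (the real frame also carries the Time and generation columns) and the default integer index assumed by the answer:

import pandas as pd

# Hypothetical residuals: positions 3-7 form the only run of 5 values >= 0.5.
df = pd.DataFrame({'residual': [0.1, 0.2, 0.4, 0.6, 0.7, 0.9, 0.8, 0.5,
                                0.3, 0.2, 0.6, 0.4]})

# True where the minimum of the current row and the 4 preceding rows is >= 0.5.
idx = (df['residual'].rolling(window=5).min() >= 0.5).to_numpy()

# The flag sits on the last row of each run, so step back 4 rows to its start.
print(df.iloc[df.index[idx] - 4])   # row 3, residual 0.6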

Find 5 consecutive row values in Pandas Dataframe that are equal

One classic way is to use boolean indexing with a custom mask. Breaking it down, it relies on forming groups of consecutive Match values and counting each group's size to slice out the matching rows.

m = df.groupby(df['Col3'].ne('Match').cumsum())['Col3'].transform('size').ge(5)
df[m&m.shift()]

Alternatively:

m = df['Col3'].ne('Match')
m2 = df.groupby((m|m.shift()).cumsum())['Col3'].transform('size').ge(5)
df[m2]

Output:

   Col1   Col2  Col3
2  Time3  c     Match
3  Time4  d     Match
4  Time5  e     Match
5  Time6  f     Match
6  Time7  g     Match
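
As a quick check, the first approach can be run against a made-up frame shaped like the one in the question (the input rows are an assumption; only the matched rows appear in the output above):

import pandas as pd

# Hypothetical input: five consecutive 'Match' rows surrounded by non-matches.
df = pd.DataFrame({
    'Col1': ['Time1', 'Time2', 'Time3', 'Time4', 'Time5', 'Time6', 'Time7', 'Time8'],
    'Col2': list('abcdefgh'),
    'Col3': ['No', 'No', 'Match', 'Match', 'Match', 'Match', 'Match', 'No'],
})

# Every non-Match row starts a new group; transform('size') is each group's length.
m = df.groupby(df['Col3'].ne('Match').cumsum())['Col3'].transform('size').ge(5)

# m still flags the non-Match row that opens a qualifying group, so also require
# the previous row to qualify (fill_value keeps the shifted mask boolean).
print(df[m & m.shift(fill_value=False)])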

Generate Identifier for consecutive rows with same value

This is a gaps-and-islands problem that can be solved in the traditional way: flag each row where the value changes, then take a running sum of those flags as the group identifier.

For example:

select *,
       sum(inc) over (order by date desc, type) as grp
from (
    select *,
           case when type <> lag(type) over (order by date desc, type)
                then 1 else 0
           end as inc
    from test
) x
order by date desc, type

Result:

CustomerId  Type  date                  inc  grp
----------  ----  --------------------  ---  ---
aaaa        1     2015-10-24T22:52:47Z  0    0
bbbb        1     2015-10-23T22:56:47Z  0    0
cccc        2     2015-10-22T21:52:47Z  1    1
dddd        2     2015-10-20T22:12:47Z  0    1
aaaa        1     2015-10-19T20:52:47Z  1    2
dddd        2     2015-10-18T12:52:47Z  1    3
aaaa        3     2015-10-18T12:52:47Z  1    4


Deleting consecutive rows in a pandas dataframe with the same value

First start a new group each time the value changes, then use GroupBy.head to keep the first two rows of each group:

new_df = df.groupby(df['rating'].ne(df['rating'].shift()).cumsum()).head(2)
print(new_df)

   rating
0     4.0
1     4.0
2     3.5
3    15.0
4     5.0
5     4.0
6     4.0
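
A runnable sketch of the same call, assuming an input in which the trailing 4.0 appears three times so the third copy gets dropped:

import pandas as pd

# Hypothetical input: the final 4.0 occurs three times in a row.
df = pd.DataFrame({'rating': [4.0, 4.0, 3.5, 15.0, 5.0, 4.0, 4.0, 4.0]})

# ne(...).cumsum() starts a new group whenever the rating changes;
# head(2) keeps at most the first two rows of each run.
new_df = df.groupby(df['rating'].ne(df['rating'].shift()).cumsum()).head(2)
print(new_df)   # the third trailing 4.0 (index 7) is gone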

Group consecutive rows based on one column

This is a gaps-and-islands problem. Use the difference of row_number():

select injourney, min(timestamp), max(timestamp)
from (
    select t.*,
           row_number() over (order by timestamp) as seqnum,
           row_number() over (partition by injourney order by timestamp) as seqnum_i
    from t
) t
group by injourney, (seqnum - seqnum_i)
order by min(timestamp);

