How to find consecutive rows based on the value of a column?
Try this
WITH cte
AS
(
SELECT *,COUNT(1) OVER(PARTITION BY cnt) pt FROM
(
SELECT tt.*
,(SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS cnt
FROM t tt
WHERE data > 10
) t1
)
SELECT id, [when], data FROM cte WHERE pt >= 3
OUTPUT
id when data
2 2013-08-02 00:00:00.000 121
3 2013-08-03 00:00:00.000 132
4 2013-08-04 00:00:00.000 15
6 2013-08-06 00:00:00.000 1435
7 2013-08-07 00:00:00.000 143
8 2013-08-08 00:00:00.000 18
9 2013-08-09 00:00:00.000 19
EDIT
First, for each row, the inner query counts the number of earlier records (lower ID) where data <= 10:
SELECT tt.*
,(SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS cnt
FROM t tt
Output
id when data cnt
1 2013-08-01 00:00:00.000 1 0
2 2013-08-02 00:00:00.000 121 1
3 2013-08-03 00:00:00.000 132 1
4 2013-08-04 00:00:00.000 15 1
5 2013-08-05 00:00:00.000 9 1
6 2013-08-06 00:00:00.000 1435 2
7 2013-08-07 00:00:00.000 143 2
8 2013-08-08 00:00:00.000 18 2
9 2013-08-09 00:00:00.000 19 2
10 2013-08-10 00:00:00.000 1 2
11 2013-08-11 00:00:00.000 1234 3
12 2013-08-12 00:00:00.000 124 3
13 2013-08-13 00:00:00.000 6 3
Then we filter the records with data > 10
WHERE data > 10
Now we count the records in each group by partitioning on the cnt column
SELECT *,COUNT(1) OVER(PARTITION BY cnt) pt FROM
(
SELECT tt.*
,(SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS cnt
FROM t tt
WHERE data > 10
) t1
Output
id when data cnt pt
2 2013-08-02 00:00:00.000 121 1 3
3 2013-08-03 00:00:00.000 132 1 3
4 2013-08-04 00:00:00.000 15 1 3
6 2013-08-06 00:00:00.000 1435 2 4
7 2013-08-07 00:00:00.000 143 2 4
8 2013-08-08 00:00:00.000 18 2 4
9 2013-08-09 00:00:00.000 19 2 4
11 2013-08-11 00:00:00.000 1234 3 2
12 2013-08-12 00:00:00.000 124 3 2
The query above is wrapped in a CTE, which behaves like a temporary table.
Now select the records whose consecutive count is >= 3
SELECT id, [when], data FROM cte WHERE pt >= 3
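To make the logic of the steps above easier to follow, here is a minimal Python sketch of the same idea without a database. The (id, data) pairs are copied from the output tables above; the function name runs_above and its parameters are hypothetical, chosen only for this illustration.

```python
from collections import Counter

# (id, data) pairs copied from the sample output above.
rows = [(1, 1), (2, 121), (3, 132), (4, 15), (5, 9), (6, 1435), (7, 143),
        (8, 18), (9, 19), (10, 1), (11, 1234), (12, 124), (13, 6)]

def runs_above(rows, threshold=10, min_len=3):
    """Return ids of rows that sit in a run of at least min_len rows with data > threshold."""
    breaks = 0      # earlier rows with data <= threshold (plays the role of cnt)
    labelled = []   # (id, cnt) for rows that pass the data > threshold filter
    for row_id, data in sorted(rows):
        if data > threshold:
            labelled.append((row_id, breaks))
        else:
            breaks += 1
    # Equivalent of COUNT(1) OVER (PARTITION BY cnt).
    sizes = Counter(cnt for _, cnt in labelled)
    return [row_id for row_id, cnt in labelled if sizes[cnt] >= min_len]

result_ids = runs_above(rows)
print(result_ids)
```

Rows that share the same cnt value have no "breaking" row between them, which is why the group size equals the length of the consecutive run.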
ANOTHER SOLUTION
;WITH partitioned AS (
SELECT *, id - ROW_NUMBER() OVER (ORDER BY id) AS grp
FROM t
WHERE data > 10
),
counted AS (
SELECT *, COUNT(*) OVER (PARTITION BY grp) AS cnt
FROM partitioned
)
SELECT id, [when], data
FROM counted
WHERE cnt >= 3
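The reason this works may not be obvious: within a run of consecutive ids, id minus the row number stays constant, so that difference can serve as a group key. The following Python sketch illustrates this on the ids that survive the data > 10 filter in the sample data; consecutive_runs is a hypothetical helper name, and the approach assumes that ids increase by exactly 1 between neighbouring rows of the original table.

```python
from itertools import groupby

# ids remaining after WHERE data > 10 in the sample data.
ids = [2, 3, 4, 6, 7, 8, 9, 11, 12]

def consecutive_runs(sorted_ids, min_len=3):
    """Group ids by (id - row_number) and keep groups with at least min_len members."""
    keyed = [(value - position, value)
             for position, value in enumerate(sorted_ids, start=1)]
    runs = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        run = [value for _, value in group]
        if len(run) >= min_len:
            runs.append(run)
    return runs

long_runs = consecutive_runs(ids)
print(long_runs)
```

Here the keys come out as 1, 1, 1, 2, 2, 2, 2, 3, 3, so the last two ids form a group that is too short and is dropped.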
Count consecutive rows for each customer and value
For a gaps-and-islands solution, the first row_number() must also be partitioned by customer.
SELECT customer, status, COUNT(*) FROM (
select t.*,
(row_number() over (partition by customer order by id) -
row_number() over (partition by customer, status order by id)
) as grp
from tickets t
) X
GROUP BY customer, status, grp
ORDER BY customer, max(id)
Result:
customer status count
-------- ------ -----
A 0 4
A 1 3
B 0 2
B 1 1
B 0 2
C 0 1
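As a rough illustration of why the difference of the two row numbers identifies each run, here is a Python sketch of the same calculation. The tickets list is made-up sample data (the original table is not shown); it was chosen so that the counts match the result above. The name status_runs is hypothetical.

```python
from collections import defaultdict

# Hypothetical (id, customer, status) rows, invented for this example.
statuses = {"A": [0, 0, 0, 0, 1, 1, 1], "B": [0, 0, 1, 0, 0], "C": [0]}
tickets = []
for customer, values in statuses.items():
    for status in values:
        tickets.append((len(tickets) + 1, customer, status))

def status_runs(tickets):
    """Count consecutive rows per customer and status via the row-number difference."""
    rn_customer = defaultdict(int)   # row_number() over (partition by customer)
    rn_status = defaultdict(int)     # row_number() over (partition by customer, status)
    groups = {}                      # (customer, status, grp) -> [count, max_id]
    for ticket_id, customer, status in sorted(tickets):
        rn_customer[customer] += 1
        rn_status[(customer, status)] += 1
        grp = rn_customer[customer] - rn_status[(customer, status)]
        entry = groups.setdefault((customer, status, grp), [0, ticket_id])
        entry[0] += 1
        entry[1] = ticket_id
    # ORDER BY customer, max(id)
    ordered = sorted(groups.items(), key=lambda item: (item[0][0], item[1][1]))
    return [(customer, status, count)
            for (customer, status, _), (count, _) in ordered]

run_counts = status_runs(tickets)
print(run_counts)
```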
Get the rows of dataframe based on the consecutive values of one column
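Not the most Pythonic way, but it does the job:
keep = []
for i in range(len(df) - 2):
    # Look for the pattern 'a', 'p', 'p' in three consecutive rows of the View column.
    if df.View.iloc[i] == 'a' and df.View.iloc[i + 1] == 'p' and df.View.iloc[i + 2] == 'p':
        keep.extend([i, i + 1, i + 2])
# Select the matching rows by position.
result = df.iloc[keep]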
How can I find 5 consecutive rows in pandas Dataframe where a value of a certain column is at least 0.5
Here is a pandas, non-iterative approach, and therefore quite efficient.
Steps:
- Create a rolling window of 5 points and determine the minimum value.
- If the minimum value is >= 0.5, store True, else store False.
- All booleans are stored in a numpy.array, called idx.
- The idx array is used as a filter on the main dataset with a value of 4 subtracted to determine the first index of the run of 5.
- The filtered DataFrame is presented.
Sample code:
idx = (df['residual'].rolling(window=5).min() >= 0.5).to_numpy()
df.iloc[df.index[idx]-4]
Output:
Index Time real_generation predicted_generation residual
1 2019-01-01 11:00:00+00:00 0.126 0.627 0.501
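To show what the rolling-minimum test is doing, here is a library-free Python sketch of the same logic. The residuals list is invented for this example (only the value 0.501 at position 1 mirrors the output above), and run_starts is a hypothetical helper name.

```python
# Hypothetical residual values; position 1 starts a run of five values >= 0.5.
residuals = [0.1, 0.501, 0.6, 0.7, 0.55, 0.9, 0.2, 0.8]

def run_starts(values, window=5, threshold=0.5):
    """Return the start positions of every window whose minimum is >= threshold."""
    starts = []
    for end in range(window - 1, len(values)):
        start = end - window + 1          # same as subtracting 4 when window is 5
        if min(values[start:end + 1]) >= threshold:
            starts.append(start)
    return starts

starts = run_starts(residuals)
print(starts)
```

A window passes only when all of its values are at least the threshold, which is exactly what a minimum of at least 0.5 means.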
Find 5 consecutive row values in Pandas Dataframe that are equal
One classical way is to use boolean indexing with a custom mask. Breaking it down, it relies on making groups of consecutive Match values, and counting the group size to slice the matching rows.
m = df.groupby(df['Col3'].ne('Match').cumsum())['Col3'].transform('size').ge(5)
df[m&m.shift()]
Alternatively:
m = df['Col3'].ne('Match')
m2 = df.groupby((m|m.shift()).cumsum())['Col3'].transform('size').ge(5)
df[m2]
output:
Col1 Col2 Col3
2 Time3 c Match
3 Time4 d Match
4 Time5 e Match
5 Time6 f Match
6 Time7 g Match
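The grouping idea can also be sketched without pandas, which may help when reading the masks above. The col3 values below are invented for this illustration, and long_match_rows is a hypothetical name; itertools.groupby groups adjacent equal values, which plays the role of the cumsum-based group key.

```python
from itertools import groupby

# Hypothetical Col3 values; positions 2 to 6 hold five consecutive matches.
col3 = ["No", "No", "Match", "Match", "Match", "Match", "Match",
        "No", "Match", "Match"]

def long_match_rows(values, target="Match", min_len=5):
    """Return positions of rows that belong to a run of at least min_len target values."""
    keep = []
    position = 0
    for value, group in groupby(values):
        length = len(list(group))
        if value == target and length >= min_len:
            keep.extend(range(position, position + length))
        position += length
    return keep

match_positions = long_match_rows(col3)
print(match_positions)
```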
Generate Identifier for consecutive rows with same value
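This is a gaps-and-islands problem that can be solved with the traditional approach: flag each row where the type differs from the previous row, then take a running sum of the flags.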
For example:
select
*,
sum(inc) over(order by date desc, type) as grp
from (
select *,
case when type <> lag(type) over(order by date desc, type)
then 1 else 0 end as inc
from test
) x
order by date desc, type
Result:
CustomerId Type date inc grp
----------- ----- --------------------- ---- ---
aaaa 1 2015-10-24T22:52:47Z 0 0
bbbb 1 2015-10-23T22:56:47Z 0 0
cccc 2 2015-10-22T21:52:47Z 1 1
dddd 2 2015-10-20T22:12:47Z 0 1
aaaa 1 2015-10-19T20:52:47Z 1 2
dddd 2 2015-10-18T12:52:47Z 1 3
aaaa 3 2015-10-18T12:52:47Z 1 4
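The inc and grp columns can be reproduced with a few lines of Python, which may make the flag-and-running-sum idea more concrete. The types list is the Type column from the result above, in the same order; the variable names are chosen only for this sketch.

```python
from itertools import accumulate

# Type column in the order shown in the result (date descending, then type).
types = [1, 1, 2, 2, 1, 2, 3]

# inc is 1 when the type differs from the previous row; the first row has no
# predecessor, so it gets 0 (the SQL comparison with NULL falls to the ELSE branch).
inc = [0] + [int(cur != prev) for prev, cur in zip(types, types[1:])]

# grp is the running sum of inc, i.e. sum(inc) over (order by ...).
grp = list(accumulate(inc))

print(inc)
print(grp)
```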
Deleting consecutive rows in a pandas dataframe with the same value
First create a new group each time the value changes from the previous row, then use GroupBy.head to keep at most the first two rows of each group:
new_df = df.groupby(df['rating'].ne(df['rating'].shift()).cumsum()).head(2)
print(new_df)
rating
0 4.0
1 4.0
2 3.5
3 15.0
4 5.0
5 4.0
6 4.0
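For readers who want to see the run-capping logic on its own, here is a small sketch in plain Python. The ratings list is hypothetical input invented for this example, and cap_runs is a made-up helper name; itertools.groupby starts a new group whenever the value changes, much like the ne/shift/cumsum key above.

```python
from itertools import groupby, islice

# Hypothetical input with two runs of three identical ratings.
ratings = [4.0, 4.0, 4.0, 3.5, 15.0, 5.0, 4.0, 4.0, 4.0]

def cap_runs(values, max_run=2):
    """Keep at most max_run items from each run of identical consecutive values."""
    capped = []
    for _, run in groupby(values):
        capped.extend(islice(run, max_run))
    return capped

capped_ratings = cap_runs(ratings)
print(capped_ratings)
```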
Group consecutive rows based on one column
This is a gaps-and-islands problem. Use the difference of row_number():
select injourney, min(timestamp), max(timestamp)
from (select t.*,
row_number() over (order by timestamp) as seqnum,
row_number() over (partition by injourney order by timestamp) as seqnum_i
from t
) t
group by injourney, (seqnum - seqnum_i)
order by min(timestamp);
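A short Python sketch of the same grouping, under stated assumptions: the events list is invented sample data (the original table is not shown), with integer timestamps standing in for real ones, and journey_spans is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical (timestamp, injourney) rows, already unique by timestamp.
events = [(1, True), (2, True), (3, False), (4, False), (5, True), (6, True), (7, True)]

def journey_spans(events):
    """Return (injourney, min_timestamp, max_timestamp) for each run of equal injourney."""
    seqnum_i = defaultdict(int)   # row number within each injourney value
    spans = {}                    # (injourney, seqnum - seqnum_i) -> [min_ts, max_ts]
    for seqnum, (timestamp, injourney) in enumerate(sorted(events), start=1):
        seqnum_i[injourney] += 1
        key = (injourney, seqnum - seqnum_i[injourney])
        span = spans.setdefault(key, [timestamp, timestamp])
        span[1] = timestamp       # rows arrive in timestamp order, so this is the max
    ordered = sorted(spans.items(), key=lambda item: item[1][0])
    return [(injourney, first, last) for (injourney, _), (first, last) in ordered]

spans = journey_spans(events)
print(spans)
```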