Count Consecutive Duplicate Values in SQL

Count consecutive duplicate values in SQL

I am going to presume that id is unique and increasing. You can get counts of consecutive values by using the different of row numbers. The following counts all sequences:

select grp, value, min(id), max(id), count(*) as cnt
from (select t.*,
(row_number() over (order by id) - row_number() over (partition by value order by id)
) as grp
from table t
) t
group by grp, value;

If you want the longest sequence of 0s:

select top 1 grp, value, min(id), max(id), count(*) as cnt
from (select t.*,
(row_number() over (order by id) - row_number() over (partition by value order by id)
) as grp
from table t
) t
group by grp, value
having value = 0
order by count(*) desc

Counting consecutive duplicate field with SQL

This is a 'Gaps and Islands' problem, you can try like following.

;with cte 
as (select *,
dateadd(day, -row_number()
over (partition by status, personid
order by [date] ), [date]) AS grp
FROM @table
)
,cte1
AS (select *,row_number() over(partition by personid, grp,status order by [date]) rn,
count(*) over(partition by personid, grp) ct
from cte
)

select ct as count, personid
from cte1
where rn=1

Online Demo

Note: You might not get the rows in same sequence as you don't have any column which can be used for ordering the way you showed in the desired output.

Count consecutive duplicate values by group

This is a gap-and-islands problem. One method is the difference of row_number()s to identify the groups.

select t.*,
dense_rank() over (partition by id order by (seqnum - seqnum_value), value) as grp,
row_number() over (partition by id, (seqnum - seqnum_value), value order by date) as grp_seqnum
from (select t.*,
row_number() over (partition by id order by date) as seqnum,
row_number() over (partition by id, value order by date) as seqnum_v
from t
) t;

This is a bit tricky to understand the first time you see it. If you run the subquery and stare at the results long enough, you'll get why the difference is constant for adjacent values.

EDIT:

I think Jorge is right. Your data doesn't have the same value repeated, so you can just do:

select t.*,
row_number() over (partition by id, value order by date) as grp_seqnum
from t;

Counting consecutive duplicate records with SQL

(Edited after comment)

You can do that by assigning a "head" number to each group of consecutive values. After that you select the head number for each row, and do an aggregate per head.

Here's an example, with CTE's for readability:

WITH
OrderedTable as (
select value, rownr = row_number() over (order by userid, id)
from YourTable
where userid = 2287
),
Heads as (
select cur.rownr, CurValue = cur.value
, headnr = row_number() over (order by cur.rownr)
from OrderedTable cur
left join OrderedTable prev on cur.rownr = prev.rownr+1
where IsNull(prev.value,-1) != cur.value
),
ValuesWithHead as (
select value
, HeadNr = (select max(headnr)
from Heads
where Heads.rownr <= data.rownr)
from OrderedTable data
)
select Value, [Count] = count(*)
from ValuesWithHead
group by HeadNr, value
order by count(*) desc

This will output:

Value   Count
2 4
3 3
1 2
2 1
2 1
7 1

Use "top 1" to select the first row only.

Here's my query to create the test data:

create table YourTable (
id int primary key,
userid int,
variable varchar(25),
value int
)
insert into YourTable (id, userid, variable, value) values (3115, 2287, 'votech05', 2)
insert into YourTable (id, userid, variable, value) values (3116, 2287, 'comcol05', 1)
insert into YourTable (id, userid, variable, value) values (3117, 2287, 'fouryr05', 1)
insert into YourTable (id, userid, variable, value) values (3118, 2287, 'none05', 2)
insert into YourTable (id, userid, variable, value) values (3119, 2287, 'ocol1_05', 2)
insert into YourTable (id, userid, variable, value) values (3120, 2287, 'disnone', 2)
insert into YourTable (id, userid, variable, value) values (3121, 2287, 'dissense', 2)
insert into YourTable (id, userid, variable, value) values (3122, 2287, 'dismobil', 3)
insert into YourTable (id, userid, variable, value) values (3123, 2287, 'dislearn', 3)
insert into YourTable (id, userid, variable, value) values (3124, 2287, 'disment', 3)
insert into YourTable (id, userid, variable, value) values (3125, 2287, 'disother', 2)
insert into YourTable (id, userid, variable, value) values (3126, 2287, 'disrefus', 7)

Count length of consecutive duplicate values for each id

First of all, we need to have a way to defined how the rows are ordered. For example, in your sample data there is not way to be sure that 'first' row (1, 1) will be always displayed before the 'second' row (1,0).

That's why in my sample data I have added an identity column. In your real case, the details can be order by row ID, date column or something else, but you need to ensure the rows can be sorted via unique criteria.

So, the task is pretty simple:

  1. calculate trigger switch - when value is changed
  2. calculate groups
  3. calculate rows

That's it. I have used common table expression and leave all columns in order to be easy for you to understand the logic. You are free to break this in separate statements and remove some of the columns.

DECLARE @DataSource TABLE
(
[RowID] INT IDENTITY(1, 1)
,[ID]INT
,[value] INT
);

INSERT INTO @DataSource ([ID], [value])
VALUES (1, 1)
,(1, 0)
,(1, 0)
,(1, 1)
,(1, 1)
,(1, 1)
--
,(2, 0)
,(2, 1)
,(2, 0)
,(2, 0);

WITH DataSourceWithSwitch AS
(
SELECT *
,IIF(LAG([value]) OVER (PARTITION BY [ID] ORDER BY [RowID]) = [value], 0, 1) AS [Switch]
FROM @DataSource
), DataSourceWithGroup AS
(
SELECT *
,SUM([Switch]) OVER (PARTITION BY [ID] ORDER BY [RowID]) AS [Group]
FROM DataSourceWithSwitch
)
SELECT *
,ROW_NUMBER() OVER (PARTITION BY [ID], [Group] ORDER BY [RowID]) AS [GroupRowID]
FROM DataSourceWithGroup
ORDER BY [RowID];

Sample Image

Count consecutive duplicates in a column

Use rle

x = c(0L, 1L, 0L, -1L, -1L, 0L, 1L, -1L, -1L, -1L, 1L)    
with(rle(x), lengths[values == -1])
#[1] 2 3

For all unique elements of x

with(rle(x), setNames(sapply(unique(values), function(x)
lengths[values == x]), nm = unique(values)))
#$`0`
#[1] 1 1 1

#$`1`
#[1] 1 1 1

#$`-1`
#[1] 2 3

Count consecutive rows for each customer and value

For gaps and islands solution, for first row_number you need to partition by customer.

SELECT customer,    status, COUNT(*) FROM (
select t.*,
(row_number() over (partition by customer order by id) -
row_number() over (partition by customer, status order by id)
) as grp
from tickets t
) X
GROUP BY customer, status, grp
ORDER BY customer, max(id)

dbfiddle

Result:

customer    status  count
-------- ------ -----
A 0 4
A 1 3
B 0 2
B 1 1
B 0 2
C 0 1


Related Topics



Leave a reply



Submit