How to Find Gaps in Sequential Numbering in MySQL

How to find gaps in sequential numbering in MySQL?

Update

ConfexianMJS provided a much better answer in terms of performance.

The (not as fast as possible) answer

Here's a version that works on a table of any size (not just on 100 rows):

SELECT (t1.id + 1) as gap_starts_at, 
(SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
  • gap_starts_at - first id in current gap
  • gap_ends_at - last id in current gap
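The query can be tried without a MySQL server. The sketch below runs it against an in-memory SQLite database through Python's standard sqlite3 module; that is an assumption of this sketch, not part of the answer. Using HAVING to filter on a column alias without GROUP BY is a MySQL extension, so the sketch moves that filter into an outer WHERE. The table name arrc_vouchers is kept from the answer.

```python
import sqlite3


def find_gaps(ids):
    """Return (gap_starts_at, gap_ends_at) pairs for the missing runs in ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE arrc_vouchers (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO arrc_vouchers (id) VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(
        """
        SELECT gap_starts_at, gap_ends_at
        FROM (
            -- every id whose successor is missing starts a gap at id + 1
            SELECT t1.id + 1 AS gap_starts_at,
                   -- the gap ends just before the next id that does exist
                   (SELECT MIN(t3.id) - 1 FROM arrc_vouchers t3
                     WHERE t3.id > t1.id) AS gap_ends_at
            FROM arrc_vouchers t1
            WHERE NOT EXISTS (SELECT 1 FROM arrc_vouchers t2
                               WHERE t2.id = t1.id + 1)
        ) AS gaps
        -- the largest id has no later id at all, so it does not open a gap
        WHERE gap_ends_at IS NOT NULL
        ORDER BY gap_starts_at
        """
    ).fetchall()
    conn.close()
    return rows


print(find_gaps([1, 2, 3, 6, 7, 10]))  # [(4, 5), (8, 9)]
```

The correlated MIN subquery still runs once per gap. An index on id (here the INTEGER PRIMARY KEY) keeps each of those lookups cheap.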

How to find gaps in sequential numbering in HSQLDB?

The problem is the use of the variable @rownum. This is not supported by HSQLDB.

With HSQLDB you can do it in a simple manner.

Suppose the table is called CUSTOMER and the sequence column is called ID. The queries below show how SEQUENCE_ARRAY works and is used for finding the missing values.

-- this returns consecutive numbers within a fixed range
SELECT * FROM UNNEST (SEQUENCE_ARRAY(1, 1000, 1))
-- this returns all the possible consecutive numbers for an existing table
SELECT * FROM UNNEST (SEQUENCE_ARRAY((SELECT MIN(ID) FROM CUSTOMER), (SELECT MAX(ID) FROM CUSTOMER), 1))

-- this returns the list of unused IDs.
SELECT * FROM UNNEST (SEQUENCE_ARRAY((SELECT MIN(ID) FROM CUSTOMER), (SELECT MAX(ID) FROM CUSTOMER), 1)) SEQ(IDCOL)
LEFT OUTER JOIN CUSTOMER ON CUSTOMER.ID = SEQ.IDCOL WHERE CUSTOMER.ID IS NULL
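SEQUENCE_ARRAY and UNNEST are HSQLDB features. Other databases can follow the same "generate every number, then anti-join" idea with a recursive CTE that produces the series; that is a swapped-in technique, not HSQLDB's. The sketch below assumes SQLite through Python's sqlite3 module and keeps the CUSTOMER/ID naming from the answer.

```python
import sqlite3


def missing_ids(ids):
    """List ids between MIN(id) and MAX(id) that are absent from the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO customer (id) VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(
        """
        WITH RECURSIVE
            bounds(lo, hi) AS (SELECT MIN(id), MAX(id) FROM customer),
            -- stands in for UNNEST(SEQUENCE_ARRAY(min, max, 1))
            seq(idcol) AS (
                SELECT lo FROM bounds
                UNION ALL
                SELECT idcol + 1 FROM seq, bounds WHERE idcol < hi
            )
        SELECT seq.idcol
        FROM seq
        LEFT OUTER JOIN customer ON customer.id = seq.idcol
        WHERE customer.id IS NULL
          AND seq.idcol IS NOT NULL  -- on an empty table MIN(id) is NULL
        ORDER BY seq.idcol
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(missing_ids([3, 4, 7, 9]))  # [5, 6, 8]
```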

MySQL: check for gaps in a numeric sequence

This outputs a list of missing ranges, as in the link provided, but restricted to a specified range (not extensively tested). Note that the DECLARE @variable syntax is SQL Server (T-SQL), not MySQL.

You'll need to iterate through the ranges to get the actual values.

CREATE TABLE tempTable AS ...

DECLARE @StartID INT ...
DECLARE @EndID INT ...

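-- Gap that begins at @StartID itself (only if @StartID is missing)
SELECT @StartID as gap_starts_at,
COALESCE((SELECT MIN(t3.id) -1 FROM tempTable t3
WHERE t3.id > @StartID AND t3.id <= @EndID), @EndID) as gap_ends_at
FROM tempTable t1
WHERE NOT EXISTS (SELECT t2.id FROM tempTable t2 WHERE t2.id = @StartID)
UNION
-- Gaps that begin right after an existing id inside the range;
-- the next-id lookup is capped at @EndID so no gap runs past the range
SELECT (t1.id + 1) as gap_starts_at,
COALESCE((SELECT MIN(t3.id) -1 FROM tempTable t3
WHERE t3.id > t1.id AND t3.id <= @EndID), @EndID) as gap_ends_at
FROM tempTable t1
WHERE NOT EXISTS (SELECT t2.id FROM tempTable t2 WHERE t2.id = t1.id + 1)
AND t1.id >= @StartID AND t1.id < @EndID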

EDIT: Here's a link with a few ways to find missing values. I don't think any of them work with ranges, but some may be easier to extend than others.
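Here is a sketch of the same two-branch UNION idea, limited to a range. It runs on SQLite through Python's sqlite3 module, with named parameters in place of the T-SQL variables. The table name temp_table is hypothetical. The first branch drops the FROM clause, so it also reports a gap when the table is empty, and both branches cap the next-id lookup at the end of the range.

```python
import sqlite3


def gaps_in_range(ids, start_id, end_id):
    """Return (gap_starts_at, gap_ends_at) for missing runs within [start_id, end_id]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE temp_table (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO temp_table (id) VALUES (?)", [(i,) for i in ids])
    rows = conn.execute(
        """
        -- gap that begins at start_id itself (only if start_id is missing)
        SELECT :start_id AS gap_starts_at,
               COALESCE((SELECT MIN(t3.id) - 1 FROM temp_table t3
                          WHERE t3.id > :start_id AND t3.id <= :end_id),
                        :end_id) AS gap_ends_at
        WHERE NOT EXISTS (SELECT 1 FROM temp_table t2 WHERE t2.id = :start_id)
        UNION
        -- gaps that begin right after an existing id inside the range
        SELECT t1.id + 1,
               COALESCE((SELECT MIN(t3.id) - 1 FROM temp_table t3
                          WHERE t3.id > t1.id AND t3.id <= :end_id),
                        :end_id)
        FROM temp_table t1
        WHERE NOT EXISTS (SELECT 1 FROM temp_table t2 WHERE t2.id = t1.id + 1)
          AND t1.id >= :start_id AND t1.id < :end_id
        ORDER BY gap_starts_at
        """,
        {"start_id": start_id, "end_id": end_id},
    ).fetchall()
    conn.close()
    return rows


print(gaps_in_range([1, 2, 5, 6, 9, 12], 3, 10))  # [(3, 4), (7, 8), (10, 10)]
```

To get the individual missing values, expand each pair, for example with `range(start, end + 1)` in the calling code.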

Next group of sequential numbers in MySQL

create table nums
(
num int not null
);

-- truncate table nums;
insert nums (num) values (1),(2),(14),(15),(16),(17),(20),(21),(22),(23),(24),(30),(81),(120),(121),(122),(123),(124);

select min(t2.num)
from
(
select t1.num
from nums t1
where 5 in (select count(*) from nums where num in (t1.num,t1.num+1,t1.num+2,t1.num+3,t1.num+4))
) t2;

Answer:
20
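The query above probes five explicit offsets per row, so the run length is hard-coded. A different technique, swapped in here rather than taken from the answer, groups consecutive values into "islands" with num - ROW_NUMBER() and keeps the earliest island that is long enough. The sketch below assumes SQLite 3.25+ (for window functions) through Python's sqlite3 module and reuses the answer's data.

```python
import sqlite3


def first_run_start(nums, run_length=5):
    """Smallest num that begins run_length consecutive values, or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nums (num INTEGER NOT NULL)")
    conn.executemany("INSERT INTO nums (num) VALUES (?)", [(n,) for n in nums])
    row = conn.execute(
        """
        SELECT MIN(start_num)
        FROM (
            -- one row per island of consecutive values
            SELECT MIN(num) AS start_num, COUNT(*) AS island_len
            FROM (
                -- num - ROW_NUMBER() is constant within a consecutive island
                SELECT num, num - ROW_NUMBER() OVER (ORDER BY num) AS grp
                FROM (SELECT DISTINCT num FROM nums) AS d
            ) AS numbered
            GROUP BY grp
        ) AS islands
        WHERE island_len >= ?
        """,
        (run_length,),
    ).fetchone()
    conn.close()
    return row[0]


data = [1, 2, 14, 15, 16, 17, 20, 21, 22, 23, 24, 30, 81, 120, 121, 122, 123, 124]
print(first_run_start(data))  # 20
```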

MySQL finding gaps in column with multiple ID

You can do this with not exists:

select s.*
from sequence s
where not exists (select 1 from sequence s2 where s2.id = s.id and s2.value = s.value + 1) and
exists (select 1 from sequence s2 where s2.id = s.id and s2.value > s.value);

The exists clause is important so you don't report the final value for each id.

EDIT:

Here is a better approach:

select s.value + 1 as startgap,
(select min(s2.value) - 1 from sequence s2 where s2.id = s.id and s2.value > s.value) as endgap
from sequence s
where not exists (select 1 from sequence s2 where s2.id = s.id and s2.value = s.value + 1) and
exists (select 1 from sequence s2 where s2.id = s.id and s2.value > s.value);
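Here is a runnable sketch of the "better approach", assuming SQLite through Python's sqlite3 module and a sequence(id, value) table as in the answer. It also selects s.id, which the answer's query omits, so each gap can be matched to its id.

```python
import sqlite3


def gaps_per_id(rows):
    """Return (id, startgap, endgap) for every gap inside each id's values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sequence (id INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO sequence (id, value) VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT s.id,
               s.value + 1 AS startgap,
               (SELECT MIN(s2.value) - 1 FROM sequence s2
                 WHERE s2.id = s.id AND s2.value > s.value) AS endgap
        FROM sequence s
        -- the next value for the same id is missing ...
        WHERE NOT EXISTS (SELECT 1 FROM sequence s2
                           WHERE s2.id = s.id AND s2.value = s.value + 1)
          -- ... but this is not the last value for that id
          AND EXISTS (SELECT 1 FROM sequence s2
                       WHERE s2.id = s.id AND s2.value > s.value)
        ORDER BY s.id, startgap
        """
    ).fetchall()
    conn.close()
    return result


sample = [(1, 1), (1, 2), (1, 5), (2, 1), (2, 3), (2, 4), (2, 8)]
print(gaps_per_id(sample))  # [(1, 3, 4), (2, 2, 2), (2, 5, 7)]
```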

How do I find a gap in running counter with SQL?

In MySQL and PostgreSQL:

SELECT  id + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
LIMIT 1

In SQL Server:

SELECT  TOP 1
id + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id

In Oracle:

SELECT  *
FROM (
SELECT id + 1 AS gap
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
)
WHERE rownum = 1

ANSI (works everywhere, least efficient):

SELECT  MIN(id) + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)

Systems supporting sliding window functions:

SELECT  -- TOP 1
-- Uncomment above for SQL Server 2012+
previd + 1 AS gap
FROM (
SELECT id,
LAG(id) OVER (ORDER BY id) previd
FROM mytable
) q
WHERE previd <> id - 1
ORDER BY
id
-- LIMIT 1
-- Uncomment above for PostgreSQL or MySQL 8.0+
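The two families of queries differ when there is no gap inside the data. The sketch below runs the NOT EXISTS form and the LAG() form side by side on SQLite 3.25+ through Python's sqlite3 module, assuming a mytable(id) table as in the answer. The NOT EXISTS form falls back to MAX(id) + 1, while the LAG() form returns no row.

```python
import sqlite3


def first_gap(ids):
    """Compare the NOT EXISTS and LAG() formulations of 'first missing id'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO mytable (id) VALUES (?)", [(i,) for i in ids])
    not_exists = conn.execute(
        """
        SELECT id + 1
        FROM mytable mo
        WHERE NOT EXISTS (SELECT NULL FROM mytable mi WHERE mi.id = mo.id + 1)
        ORDER BY id
        LIMIT 1
        """
    ).fetchone()
    with_lag = conn.execute(
        """
        SELECT previd + 1
        FROM (SELECT id, LAG(id) OVER (ORDER BY id) AS previd FROM mytable) AS q
        WHERE previd <> id - 1
        ORDER BY id
        LIMIT 1
        """
    ).fetchone()
    conn.close()
    # fetchone() returns None when the query produced no row
    return (not_exists[0] if not_exists else None,
            with_lag[0] if with_lag else None)


print(first_gap([1, 2, 3, 5, 6, 9]))  # (4, 4)
print(first_gap([1, 2, 3]))           # (4, None)
```

If you want "the next free id" even when the numbering is dense, the NOT EXISTS form gives it directly. The LAG() form only reports holes between existing rows.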

Sequential occurrence (advanced gaps and islands problem)

Here is one solution (SQL Server).

DECLARE @max_in_row TABLE(
hit_finish_dttm VARCHAR(255),
hid VARCHAR(255),
agent_login VARCHAR(255),
flg_no_talk int
);
INSERT INTO @max_in_row(hit_finish_dttm, hid, agent_login, flg_no_talk)
VALUES('2020-03-01', 'EQERR13', 'Dmitrii', 0),
('2020-03-02', 'EQERR13', 'Dmitrii', 1),
('2020-03-03', 'EQERR13', 'Dmitrii', 1),
('2020-03-01', 'RR13EQE', 'Dmitrii', 0),
('2020-03-02', 'RR13EQE', 'Dmitrii', 1),
('2020-03-03', 'RR13EQE', 'Dmitrii', 0),
('2020-03-04', 'RR13EQE', 'Dmitrii', 0),
('2020-03-05', 'RR13EQE', 'Dmitrii', 1),
('2020-03-06', 'RR13EQE', 'Dmitrii', 1),
('2020-03-07', 'RR13EQE', 'Dmitrii', 0),
('2020-03-01', 'EQERR13', 'Alex', 1),
('2020-03-02', 'EQERR13', 'Alex', 1),
('2020-03-03', 'EQERR13', 'Alex', 0),
('2020-03-04', 'EQERR13', 'Alex', 1),
('2020-03-05', 'EQERR13', 'Alex', 1),
('2020-03-06', 'EQERR13', 'Alex', 1),
('2020-03-02', 'RR13EQE', 'Alex', 1),
('2020-03-03', 'RR13EQE', 'Alex', 0),
('2020-03-04', 'RR13EQE', 'Alex', 1)
;
WITH OrderNormalized AS
(
--Since the 0 and 1 can come out of sequence in the data, build up clusters of distinct groups with a chronological order flag
--to use as a virtual grouped timetable
SELECT *,
GroupNumber = DENSE_RANK() OVER(ORDER BY hid, agent_login ),
OrderInGroup = RANK() OVER(PARTITION BY hid, agent_login ORDER BY hit_finish_dttm)
FROM
@max_in_row
)
,GapsMarked AS
(
--Order the Gaps so they can be joined with connected islands
--This is needed because the value can go from 0 to 1 multiple times per partition.
--That condition needs to be accounted for to reset the count.
SELECT *,
NoTalkGroupNumber = RANK() OVER(PARTITION BY GroupNumber ORDER BY OrderInGroup)
FROM
OrderNormalized
WHERE
flg_no_talk = 0

)
,IslandsGrouped AS
(
--Data is partitioned and the gaps serialized above. Now join the islands with the closest
--gap looking backwards and take the min NOTE: There is a cleaner solution here, I just don't have the time to think it up right now
SELECT
D.*,
NoTalkGroupNumber=CASE WHEN MIN(G.NoTalkGroupNumber) IS NULL THEN 0 ELSE MIN(G.NoTalkGroupNumber) END
FROM
OrderNormalized D
LEFT JOIN GapsMarked G ON G.GroupNumber = D.GroupNumber AND G.OrderInGroup > D.OrderInGroup
GROUP BY
D.agent_login,D.flg_no_talk,D.GroupNumber,D.hid,D.hit_finish_dttm,D.OrderInGroup
)
,SeralizedItemsInIslandGroups AS
(
SELECT
*,
--This serializes by summing sequential flg_no_talk within each respective islands
ItemOrder = SUM(flg_no_talk) OVER (PARTITION BY GroupNumber,NoTalkGroupNumber ORDER BY OrderInGroup ROWS UNBOUNDED PRECEDING)
FROM
IslandsGrouped
)

SELECT
agent_login, hid, MAX(ItemOrder) FROM SeralizedItemsInIslandGroups
GROUP BY
agent_login, hid

And here is the same approach in PostgreSQL (SQL Fiddle):

PostgreSQL 9.6 Schema Setup:

CREATE TABLE max_in_row (
hit_finish_dttm VARCHAR(255),
hid VARCHAR(255),
agent_login VARCHAR(255),
flg_no_talk int
);

Query 1:

INSERT INTO max_in_row(hit_finish_dttm, hid, agent_login, flg_no_talk)
VALUES('2020-03-01', 'EQERR13', 'Dmitrii', 0),
('2020-03-02', 'EQERR13', 'Dmitrii', 1),
('2020-03-03', 'EQERR13', 'Dmitrii', 1),
('2020-03-01', 'RR13EQE', 'Dmitrii', 0),
('2020-03-02', 'RR13EQE', 'Dmitrii', 1),
('2020-03-03', 'RR13EQE', 'Dmitrii', 0),
('2020-03-04', 'RR13EQE', 'Dmitrii', 0),
('2020-03-05', 'RR13EQE', 'Dmitrii', 1),
('2020-03-06', 'RR13EQE', 'Dmitrii', 1),
('2020-03-07', 'RR13EQE', 'Dmitrii', 0),
('2020-03-01', 'EQERR13', 'Alex', 1),
('2020-03-02', 'EQERR13', 'Alex', 1),
('2020-03-03', 'EQERR13', 'Alex', 0),
('2020-03-04', 'EQERR13', 'Alex', 1),
('2020-03-05', 'EQERR13', 'Alex', 1),
('2020-03-06', 'EQERR13', 'Alex', 1),
('2020-03-02', 'RR13EQE', 'Alex', 1),
('2020-03-03', 'RR13EQE', 'Alex', 0),
('2020-03-04', 'RR13EQE', 'Alex', 1)


Query 2:

WITH OrderNormalized AS
(
SELECT *,
DENSE_RANK() OVER(ORDER BY hid, agent_login ) GroupNumber,
RANK() OVER(PARTITION BY hid, agent_login ORDER BY hit_finish_dttm) OrderInGroup
FROM
max_in_row
)
,GapsMarked AS
(
SELECT *,
RANK() OVER(PARTITION BY GroupNumber ORDER BY OrderInGroup) NoTalkGroupNumber
FROM
OrderNormalized
WHERE
flg_no_talk = 0

)
,IslandsGrouped AS
(
SELECT
D.*,
CASE WHEN MIN(G.NoTalkGroupNumber) IS NULL THEN 0 ELSE MIN(G.NoTalkGroupNumber) END NoTalkGroupNumber
FROM
OrderNormalized D
LEFT JOIN GapsMarked G ON G.GroupNumber = D.GroupNumber AND G.OrderInGroup > D.OrderInGroup
GROUP BY
D.agent_login,D.flg_no_talk,D.GroupNumber,D.hid,D.hit_finish_dttm,D.OrderInGroup
)
,SeralizedItemsInIslandGroups AS
(
SELECT
*,
SUM(flg_no_talk) OVER (PARTITION BY GroupNumber,NoTalkGroupNumber ORDER BY OrderInGroup ROWS UNBOUNDED PRECEDING) ItemOrder FROM
IslandsGrouped
)

SELECT
agent_login, hid, MAX(ItemOrder) FROM SeralizedItemsInIslandGroups
GROUP BY
agent_login, hid

Results:

| agent_login |     hid | max |
|-------------|---------|-----|
| Dmitrii | RR13EQE | 2 |
| Alex | RR13EQE | 1 |
| Alex | EQERR13 | 3 |
| Dmitrii | EQERR13 | 2 |
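A possibly shorter route to the same numbers uses the difference-of-row-numbers trick instead of the join on gaps. This is a swapped-in technique, not the answer's method. The sketch assumes SQLite 3.25+ through Python's sqlite3 module and reuses the sample rows and column names from above. Runs of 0s sum to 0, so a pair with no "no talk" rows still reports 0, as the original query does.

```python
import sqlite3

SAMPLE_ROWS = [
    ("2020-03-01", "EQERR13", "Dmitrii", 0),
    ("2020-03-02", "EQERR13", "Dmitrii", 1),
    ("2020-03-03", "EQERR13", "Dmitrii", 1),
    ("2020-03-01", "RR13EQE", "Dmitrii", 0),
    ("2020-03-02", "RR13EQE", "Dmitrii", 1),
    ("2020-03-03", "RR13EQE", "Dmitrii", 0),
    ("2020-03-04", "RR13EQE", "Dmitrii", 0),
    ("2020-03-05", "RR13EQE", "Dmitrii", 1),
    ("2020-03-06", "RR13EQE", "Dmitrii", 1),
    ("2020-03-07", "RR13EQE", "Dmitrii", 0),
    ("2020-03-01", "EQERR13", "Alex", 1),
    ("2020-03-02", "EQERR13", "Alex", 1),
    ("2020-03-03", "EQERR13", "Alex", 0),
    ("2020-03-04", "EQERR13", "Alex", 1),
    ("2020-03-05", "EQERR13", "Alex", 1),
    ("2020-03-06", "EQERR13", "Alex", 1),
    ("2020-03-02", "RR13EQE", "Alex", 1),
    ("2020-03-03", "RR13EQE", "Alex", 0),
    ("2020-03-04", "RR13EQE", "Alex", 1),
]


def longest_no_talk_runs(rows):
    """Longest run of consecutive flg_no_talk = 1 per (agent_login, hid)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE max_in_row (hit_finish_dttm TEXT, hid TEXT, "
        "agent_login TEXT, flg_no_talk INTEGER)"
    )
    conn.executemany("INSERT INTO max_in_row VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH numbered AS (
            -- constant within each run of equal flags for one agent/hid
            SELECT agent_login, hid, flg_no_talk,
                   ROW_NUMBER() OVER (PARTITION BY agent_login, hid
                                      ORDER BY hit_finish_dttm)
                 - ROW_NUMBER() OVER (PARTITION BY agent_login, hid, flg_no_talk
                                      ORDER BY hit_finish_dttm) AS grp
            FROM max_in_row
        ),
        islands AS (
            -- one row per run; runs of 0s sum to 0
            SELECT agent_login, hid, SUM(flg_no_talk) AS run_len
            FROM numbered
            GROUP BY agent_login, hid, flg_no_talk, grp
        )
        SELECT agent_login, hid, MAX(run_len) AS max_run
        FROM islands
        GROUP BY agent_login, hid
        ORDER BY agent_login, hid
        """
    ).fetchall()
    conn.close()
    return result


print(longest_no_talk_runs(SAMPLE_ROWS))
```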

How to find ranges of sequential numbers without gaps in a table

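You need to identify groups of adjacent rows (in id order) that share the same badvalue. The trick for this is a difference of row numbers: within each type, the row number over all rows minus the row number within the same badvalue stays constant across a run of identical badvalue rows.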

select min(id) as fromid, max(id) as toid, type
from (select t.*,
(row_number() over (partition by type order by id) -
row_number() over (partition by type, badvalue order by id)
) as grp
from mytable t
) grp
where badvalue = 0
group by grp, type;

There is a nuance here, because you only seem to want rows where "bad value" is 0. Note that this condition goes in the outer select, so it doesn't interfere with the row_number() calculations.
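Here is a small runnable check of the row-number difference. It assumes SQLite 3.25+ through Python's sqlite3 module and a hypothetical mytable(id, type, badvalue) with made-up sample rows. The filter sits after the numbering, so rows with badvalue = 1 still split the runs.

```python
import sqlite3


def clean_ranges(rows):
    """Return (fromid, toid, type) for each run of consecutive badvalue = 0 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, type TEXT, badvalue INTEGER)")
    conn.executemany("INSERT INTO mytable (id, type, badvalue) VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT MIN(id) AS fromid, MAX(id) AS toid, type
        FROM (SELECT t.*,
                     (ROW_NUMBER() OVER (PARTITION BY type ORDER BY id) -
                      ROW_NUMBER() OVER (PARTITION BY type, badvalue ORDER BY id)
                     ) AS grp
              FROM mytable t) AS numbered
        -- filter after numbering so badvalue = 1 rows still break the runs
        WHERE badvalue = 0
        GROUP BY grp, type
        ORDER BY type, fromid
        """
    ).fetchall()
    conn.close()
    return result


sample = [(1, "A", 0), (2, "A", 0), (3, "A", 1), (4, "A", 0), (5, "A", 0),
          (6, "A", 0), (7, "A", 1), (1, "B", 0), (2, "B", 0), (3, "B", 0)]
print(clean_ranges(sample))  # [(1, 2, 'A'), (4, 6, 'A'), (1, 3, 'B')]
```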


