Connect by Clause in Regex_Substr

connect by clause in regex_substr

The "abuse" (as Colin 't Hart put it) of connected by has a good purpose here:
by using REGEXP_SUBSTR you can extract only one of the 4 matches (23,34,45,56): the regex [^,]+ matches any character sequence in the string which does not contain a comma.

If you'll try running:

SELECT REGEXP_SUBSTR ('23,34,45,56','[^,]+') as "token" 
FROM DUAL

you'll get 23.

and if you'll try running:

SELECT REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,1) as "token"
FROM DUAL

you'll also get 23 only that now we also set two additional parameters: start looking in position 1 (which is the default), and return the 1st occurrence.

Now lets run:

SELECT REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,2) as "token"
FROM DUAL

this time we'll get 34 (2nd occurrence) and using 3 as the last parameter will return 45 and so on.

The use of recursive connected by along with level makes sure you'll receive all the relevant results (not necessarily in the original order though!):

SELECT DISTINCT REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,LEVEL) as "token"
FROM DUAL
CONNECT BY REGEXP_SUBSTR ('23,34,45,56','[^,]+',1,LEVEL) IS NOT NULL
order by 1

will return:

TOKEN
23
34
45
56

which not only contains all 4 results, but also breaks it into separate rows in the resultset!

If you'll fiddle with it - it might give you a clearer view of the subject.

Oracle: Connect by Level & regexp_substr

It splits comma-separated values string into rows. For example (slightly modified so that it works in SQL*Plus):

SQL> with tableName AS
2 ( SELECT regexp_substr(fieldName,'[^,]+',1,level) as tableName
3 FROM (SELECT '&tableName' as fieldName From DUAL)
4 CONNECT BY LEVEL <= REGEXP_COUNT(fieldName ,'[,]')+1 )
5 select * From tablename;
Enter value for tablename: little,foot,overflow

TABLENAME
----------------------------------------------------------------------
little
foot
overflow

SQL>

A simpler/shorter version:

SQL> select level, regexp_substr('&&fieldName','[^,]+',1,level) as tableName
2 from dual
3 connect by LEVEL <= REGEXP_COUNT('&&fieldName' ,'[,]') + 1;
Enter value for fieldname: emp,dept,bonus,salgrade

LEVEL TABLENAME
---------- --------------------
1 emp
2 dept
3 bonus
4 salgrade

SQL>

So, what does it do? REGEXP_COUNT counts number of separators (comma in this case) and it is used in CONNECT BY which is related to hierarchical query whose LEVEL pseudocolumn (I included it into the output of the 2nd example) is used in REGEXP_SUBSTR as its 4th parameter, occurrence.

Connect by with regular expression. How does this query work?

The first portion simply groups the string into 4 sets and returns each set. By adding a connect by, the OP is saying I want each set returned as it's own row limiting the number of rows returned to be equal to the sets of data in the original.

Breaking it down:

  • '1,2,3' is the set we're playing with which contains subsets.
  • '[^,]+ says ignore the comma's in the sets use it as a deliniator.
  • ,1 says start at postion 1 in the substring
  • level says return this numbered
    set from the sub set if a 2 were entered it would return the 2nd set.

Now by adding connect by, the author is breaking each set into its own row matching the row where it is in the original set.

I'd look up how REGEXP_SUBSTR works.

REGEXP_SUBSTR extends the functionality of the SUBSTR function by letting you search a string for a regular expression pattern. It is also similar to REGEXP_INSTR, but instead of returning the position of the substring, it returns the substring itself. This function is useful if you need the contents of a match string but not its position in the source string. The function returns the string as VARCHAR2 or CLOB data in the same character set as source_char.

Then understand how connect by works

Oracle then uses the information from these evaluations to form the hierarchy using the following steps:

Oracle selects the root row(s) of the hierarchy--those rows that satisfy the START WITH condition.

Oracle selects the child rows of each root row. Each child row must satisfy the condition of the CONNECT BY condition with respect to one of the root rows.

Oracle selects successive generations of child rows. Oracle first selects the children of the rows returned in step 2, and then the children of those children, and so on. Oracle always selects children by evaluating the CONNECT BY condition with respect to a current parent row.

If the query contains a WHERE clause without a join, then Oracle eliminates all rows from the hierarchy that do not satisfy the condition of the WHERE clause. Oracle evaluates this condition for each row individually, rather than removing all the children of a row that does not satisfy the condition.

Oracle returns the rows in the order shown in Figure 9-1. In the diagram, children appear below their parents. For an explanation of hierarchical trees,

Sample Image

Long and short play around with it, you'll learn a lot more that way.

oracle SQL: improve connect by clause

Normalization issues not withstanding, your original query actually does not work. When run on your test data:

SQL> with Table_1(id, column1) as (
2 select 10, 'abc, kdm' from dual
3 union
4 select 20, 'xyz, lop, nkk' from dual
5 )
6 select id, regexp_substr(column1,'[^,]+', 1, level) from Table_1
7 connect by regexp_substr(column1, '[^,]+', 1, level) is not null;

ID REGEXP_SUBSTR
---------- -------------
10 abc
10 kdm
20 nkk
20 lop
20 nkk
20 xyz
10 kdm
20 nkk
20 lop
20 nkk

10 rows selected.

SQL>

So, when you add more values, the problem grows exponentially, thus your performance degrades. Test by adding a third row. Also, your delimiter is comma-space, not just a comma. And the regex form used to parse the list ('[^,]+') does not work with nulls and should be avoided. Unfortunately, it's the most common regex you'll see given as an answer for parsing a list.

Try this instead, it should handle the larger lists with ease:

SQL> with Table_1(id, column1) as (
select 10, 'abc, kdm' from dual
union
select 20, 'xyz, lop, nkk' from dual
)
SELECT id, -- column1,
-- COLUMN_VALUE AS match_nbr,
REGEXP_SUBSTR( column1 ,'(.*?)(, |$)', 1, COLUMN_VALUE, NULL, 1 ) AS match_value
FROM
Table_1,
TABLE(
CAST(
MULTISET(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(column1 , ',' )+1
) AS SYS.ODCINUMBERLIST
)
);

ID MATCH_VALUE
---------- -------------
10 abc
10 kdm
20 xyz
20 lop
20 nkk

SQL>

Basically this makes a table, one row per line, of the number of elements in the list (containing the element number) and joins it with the main table.

Uncomment the COLUMN_VALUE value line to see the numbering of the elements.
It assumes the data will not contain a comma.

Oracle SQL: select regexp_substr, into clause is expected

In PL/SQL you need to output the SQl query INTO a variable.

However, since this query is going to generate multiple rows you probably want to use BULK COLLECT INTO rather than just INTO and to put the output into a user-defined collection or a VARRAY (which SYS.ODCIVARCHAR2LIST is an example of. Note: you cannot use the MEMBER OF operator with a VARRAY):

DECLARE
list_of_improvs SYS.ODCIVARCHAR2LIST;
BEGIN
SELECT regexp_substr(improv,'[^,]+', 1, level)
BULK COLLECT INTO list_of_improvs
FROM DUAL
CONNECT BY regexp_substr(improv, '[^,]+', 1, level) is not null;
END;
/

Update:

In response to your comment - you can use it like this (although it is unclear what you are trying to achieve so I have just put your code into a snippet without trying to work out what you intend it to do):

DECLARE
list_of_improvs SYS.ODCIVARCHAR2LIST;
BEGIN
SELECT regexp_substr(improv,'[^,]+', 1, level)
BULK COLLECT INTO list_of_improvs
FROM DUAL
CONNECT BY regexp_substr(improv, '[^,]+', 1, level) is not null;

FOR i IN 1 .. list_of_improvs.COUNT LOOP
DBMS_OUTPUT.PUT_LINE( improvs(i) );
END LOOP;

update line_cap
set cap_up = list_of_improvs(1)
where id IN ( SELECT Column_Value
FROM TABLE( list_of_improvs ) );
END;
/

You can't use IN directly with a collection or VARRAY and need to use a TABLE() collection expression in a nested query to get the values out.

If you are using a user-defined SQL collection - i.e. defined with a statement like this:

CREATE TYPE StringList IS TABLE OF VARCHAR2(4000);

Then you can use the MEMBER OF operator:

DECLARE
list_of_improvs StringList;
BEGIN
-- as above

update line_cap
set cap_up = list_of_improvs(1)
where id MEMBER OF list_of_improvs;
END;
/

But you cannot use the MEMBER OF operator with VARRAYs (like SYS.ODCIVARCHAR2LIST).

However, you don't need PL/SQL for that (and eliminate costly context switches between the PL/SQL and SQL execution scopes) and could just use a MERGE statement something like:

MERGE INTO line_cap dst
USING (
SELECT MIN( value ) KEEP ( DENSE_RANK FIRST ORDER BY ROWNUM ) OVER () AS first_value,
value
FROM (
SELECT regexp_substr(improv,'[^,]+', 1, level) AS value
FROM DUAL
CONNECT BY regexp_substr(improv, '[^,]+', 1, level) is not null
)
) src
ON ( src.value = dst.id )
WHEN MATCHED THEN
UPDATE
SET cap_up = first_value;

How to select multiple columns from dual when I use regexp_substr with connect by clause and level

Your connect-by condition is only going to be true if the ID and name values match - which is not the case for your examples.

You can count the number of elements instead; and in case the counts are different, use greates() to pick the higher count:

SELECT REGEXP_SUBSTR ('id1:id2','[^:]+',1,LEVEL) R ,
REGEXP_SUBSTR ('name1:name2','[^:]+',1,LEVEL) D
FROM DUAL
CONNECT BY LEVEL <= GREATEST (REGEXP_COUNT ('id1:id2',':'),
REGEXP_COUNT ('name1:name2',':')) + 1
order by 1;

R D
------- -----------
id1 name1
id2 name2

Or a safer pattern if you could have empty elements:

SELECT REGEXP_SUBSTR ('id1:id2:id3', '(.*?)(:|$)', 1, LEVEL, NULL, 1) R ,
REGEXP_SUBSTR ('name1::name3', '(.*?)(:|$)', 1, LEVEL, NULL, 1) D
FROM DUAL
CONNECT BY LEVEL <= GREATEST (REGEXP_COUNT ('id1:id2:id3',':'),
REGEXP_COUNT ('name1::name3',':')) + 1
order by 1;

R D
----------- ------------
id1 name1
id2
id3 name3

Oracle: Trying to split a string using REGEXP_SUBSTR, CONNECT BY LEVEL functions and getting duplicates

You need to include the ID in the connect-by clause; but for this to wotk you also need to include a non-deterministic function call; e.g.:

AND PRIOR id = id
AND PRIOR dbms_random.value IS NOT NULL

So

with qry as (
select 1 as id, '<div>What would you like to do today?<a href="#" data-jump="111">Client Service</a><a href="#" data-jump="112">Customer Service</a><a href="#" data-jump="113">Tecnical Service</a></div>' as html_string from dual
union
select 2 as id, '<div>What would you like to do today?<a href="#" data-jump="111">Client Service</a><a href="#" data-jump="112">Customer Service</a><a href="#" data-jump="113">Tecnical Service</a></div><a href="#" data-jump="114">Other Service</a></div>' as html_string from dual
)
SELECT ID,
REGEXP_SUBSTR(html_string, '<a.*?>(.*?).</a>', 1, LEVEL, NULL, 1) as contents,
REGEXP_SUBSTR(html_string, 'data-jump="(.*?)"', 1, LEVEL, NULL, 1) as data_jump
FROM qry
CONNECT BY LEVEL <= REGEXP_COUNT(html_string, '<a.*?>(.*?).</a>')
AND PRIOR id = id
AND PRIOR dbms_random.value IS NOT NULL


Leave a reply



Submit