Oracle Sql: Extracting Text Between Two Characters

Oracle SQL: Extracting text between two characters

We can use REGEXP_SUBSTR here:

SELECT regexp_substr('{ABCDE}{F1}', '\{([^}]+)\}', 1,1,NULL,1) AS output
FROM dual

Demo

This is a less commonly seen use of REGEXP_SUBSTR, which uses a capture group, per this pattern:

\{([^}]+)\}

The sixth parameter says to return the first capture group.

Oracle Query to select only the text between two '-' characters

Since INSTR function allows to set what occurrence of any character you want, you can use this to identify the pieces to erase based on the "-" character:

select  trim(
replace(
replace(
description,
SUBSTR(description, 1, INSTR(description, '-',1,1)), -- <-- first occurrence of "-"
null),
SUBSTR(description, INSTR(description, '-',1,2), -- <-- second occurrence of "-"
length(description)),
null)
) as result
from all_cases;

Assuming that all your rows follow the same pattern, update all in one call would look something like this:

update all_cases
set description = trim(
replace(
replace(description,
SUBSTR(description, 1, INSTR(description, '-',1,1)),
null),
SUBSTR(description, INSTR(description, '-',1,2), length(description)),
null)
);

Which acts just identifying the left and right pieces based in the "-" character and replacing them with null.

Oracle SQL: Extracting multiple text between two characters

You can use the 4th argument of REGEXP_SUBSTR to specify an occurrence for matching.

To get a row for the 1st, 2nd, and 3rd occurrence, you can cross-join with a sub-query from dual.

WITH test_data AS (
SELECT 1 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.<A third sub-title>' AS content_data FROM dual UNION
SELECT 2 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.' AS content_data FROM dual
)

SELECT t.content_id,
REGEXP_SUBSTR(t.content_data, '<(.*?)>', 1, s.match_occurrence, 'i', 1) AS content_match
FROM test_data t
CROSS JOIN (
SELECT 1 AS match_occurrence FROM dual UNION
SELECT 2 AS match_occurrence FROM dual UNION
SELECT 3 AS match_occurrence FROM dual UNION
SELECT 4 AS match_occurrence FROM dual
/* ... etc, with the number of rows equal to the maximum number of matches that can appear */
) s
WHERE REGEXP_SUBSTR(t.content_data, '<.*?>', 1, s.match_occurrence) IS NOT NULL /* Only return records that have a match for the given occurrence */
ORDER BY t.content_id, s.match_occurrence

Borrowing the CONNECT_BY_LEVEL from Barbaros' excellent answer, you could do it more concisely as:

WITH test_data AS (
SELECT 1 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.<A third sub-title>' AS content_data FROM dual UNION
SELECT 2 AS content_id, '<TITLE> <SUB-TITLE-1> Content<SUB-TITLE-2>Content.' AS content_data FROM dual
)

SELECT t.content_id,
REGEXP_SUBSTR(t.content_data, '<(.*?)>', 1, LEVEL, 'i', 1) AS content_match
FROM test_data t
CONNECT BY
LEVEL <= REGEXP_COUNT(t.content_data, '<.*?>')
AND PRIOR sys_guid() IS NOT NULL
AND PRIOR content_id = content_id
ORDER BY t.content_id, LEVEL

Note that the CONNECT_BY_LEVEL method might be slower on large datasets, so I would avoid that if performance is a concern.

how to extract string between 2 strings

How about this? I added another value into the table, that doesn't contain the "Reasons removed" string.

Basically, it selects everything between "Reasons added" and "Reasons removed" if it exists; if not, takes everything from "Reasons added" to the end. CASE takes care about that.

SQL> select * from test;

COL
----------------------------------------------------------------------------------------------
"Change step forward
Note added
Step changed from [OFF] to [ON]
Reasons added: test1 (Type), some test 2 (Type), test3 , sometest4& and5(Type)
Reasons removed: test6- test7(Type)"

"This is text without "Reas. remov."
Note added
Step changed from [OFF] to [ON]
Reasons added: test1 (Type), some test 2 (Type), test3 , and this would be the end"


SQL> select trim(substr(col,
2 instr(col, 'Reasons added') + 14,
3 --
4 case when instr(col, 'Reasons removed') > 0 then
5 instr(col, 'Reasons removed') - instr(col, 'Reasons added') - 14
6 else
7 length(col)
8 end)) result
9 from test;

RESULT
--------------------------------------------------------------------------------
test1 (Type), some test 2 (Type), test3 , sometest4& and5(Type)
test1 (Type), some test 2 (Type), test3 , and this would be the end"

SQL>

How do I extract data between two strings based on a pattern in Oracle SQL

Here is how I would do this. Note a couple of things:

  1. The output preserves newlines that existed in the input. You didn't
    say anything about removing them; however, your output doesn't show
    them. In any case - they can be removed, if needed, but that is an
    unrelated process.
  2. You say "word" but obviously you are using that in a sense different
    from the common usage in regular expressions. In regexp, "word
    characters" are only letters, digits and underscore; yet your
    "words" include brackets, equal sign, and who knows what else. I interpreted the term "word" to mean any
    sequence of consecutive non-whitespace characters.

Here is how we can recreate your data. When you ask a question here, this is how you should provide sample data - not as an image that we can't copy and paste in an SQL editor.

CREATE TABLE sample_data( col_a varchar2(20), col_b CLOB );

INSERT INTO sample_data VALUES
('12345', to_clob(
'Created:2/28/2019
Updated:1/19/2021
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[Location=BLAH].[City=BLAH]'));

INSERT INTO sample_data VALUES
('12346', to_clob(
'Created:2/28/2019
Updated:1/19/2021
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[SOC].[RAW]'));

commit;

Then here is the query and the output. Note that, depending on your interface (in my case: SQL Developer, which uses a SQL*Plus-like interface), you may need to change some settings so that the output is not truncated. In particular, in SQL*Plus, CLOB columns are truncated to 80 characters by default; I had to

set long 100

So - query and output:

select col_a, col_b,
regexp_substr(col_b, '(\s|^)(LIST:[^.]*?)\s+\S+\.', 1, 1, null, 2)
as result
from sample_data
;

COL_A COL_B RESULT
----- ------------------------------ ------------------------------
12345 Created:2/28/2019 LIST:[ABC][DEF][GHI]
Updated:1/19/2021 [LMNO][PQRST]
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[Location=BLAH].[City=BLAH]

12346 Created:2/28/2019 LIST:[ABC][DEF][GHI]
Updated:1/19/2021 [LMNO][PQRST]
LIST:[ABC][DEF][GHI]
[LMNO][PQRST]
[SOC].[RAW]

The regular expression matches a single whitespace character or the beginning of the string ((\s|^)), then the characters LIST: followed by as few consecutive, non-period characters (this will match spaces and newline characters, in particular) as needed to allow a match - which continues with one or more whitespace characters, followed by a single word (string of 1 or more non-whitespace characters) and a literal period (\.).

The expression we must return is enclosed in parentheses, so that we can return it from regexp_substr. Such an expression is called a "capture group". The regexp includes another capture group, (\s|^), out of necessity (alternation), so the capture group we must return is the second in the regexp. This is what the last argument to regexp_substr does: it instructs the function to return the second capture group.

Note a peculiar thing about the period (related to the much more general concept of escaping within bracket expressions): the period must be escaped to represent a literal period, rather than "any character", at the end of the regular expression; however, within the (negated) bracket expression [^.]*?, the period - representing a literal period, not "any character" - is not escaped. Oracle follows the ERE (extended regular expressions) dialect of the POSIX standard, and that standard says that escape sequences are invalid within bracket expressions. This is different from other regular expression dialect, and confuses a lot of users.

Oracle SQL: Getting text between two sentences

A straight solution is to use substr function in conjunction with instr function. Or DBMS_LOB.instr and substr if the datatype of the column is a LOB.

documentation for dbms_lob.instr

for example:

select 
substr(
mycolumn,
instr(mycolumn,'Pd to LOOK:'), --starting position
instr(mycolumn,'Pd to SEE (swap):') - instr(mycolumn,'Pd to LOOK:'), --amount to get
) as "Pd to LOOK:"
--, other expressions/columns
from
my_table;


Related Topics



Leave a reply



Submit