Regex to Select Nth Value from a List, Allowing for Nulls

REGEX to select nth value from a list, allowing for nulls

Thanks to those who replied. After perusing your answers and the answers in the link supplied, I arrived at this solution:

SQL> select REGEXP_SUBSTR('1,,3,4,5', '(.*?)(,|$)', 1, 2, NULL, 1) data
  2  from dual;

Data
----

Which can be described as "look at the 2nd occurrence of an optional set of zero or more characters that are followed by a comma or the end of the line, and return the 1st subgroup" (which is the data less the comma or end of the line).

I forgot to mention I tested with the null in various positions, multiple nulls, selecting various positions, etc.

The only caveat I could find is that if the element number you ask for is greater than the number of elements in the list, it just returns NULL, so you cannot tell a missing element from an empty one. Not a problem for my case.

EDIT: I am updating the accepted answer for the benefit of future searchers that may stumble upon this.

The next step is to encapsulate the code so it can be made into a simpler, reusable function. Here is the function source:

  FUNCTION GET_LIST_ELEMENT(string_in VARCHAR2, element_in NUMBER, delimiter_in VARCHAR2 DEFAULT ',') RETURN VARCHAR2 IS
  BEGIN
    -- Group 1 is the element text; the backslash escapes a single-character delimiter such as '|'.
    RETURN REGEXP_SUBSTR(string_in, '(.*?)(\'||delimiter_in||'|$)', 1, element_in, NULL, 1);
  END GET_LIST_ELEMENT;

This hides the regex complexities from developers who may not be so comfortable with it and makes the code cleaner anyway when in use. Call it like this to get the 4th element:

select get_list_element('123,222,,432,555', 4) from dual;
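
To check the element-picking logic outside the database, here is a minimal JavaScript sketch of the same idea. The name getListElement and the regex-escaping step are assumptions made for illustration, not part of the Oracle function; empty elements and out-of-range positions both come back as null, mirroring the NULL behaviour described above.

```javascript
// Hypothetical helper mirroring GET_LIST_ELEMENT; not an Oracle API.
function getListElement(str, n, delimiter = ",") {
  // Escape regex metacharacters so delimiters such as "|" are taken literally.
  const delim = delimiter.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Lazy group 1 = element text, group 2 = delimiter or end of string.
  const re = new RegExp("(.*?)(" + delim + "|$)", "g");
  const matches = Array.from(str.matchAll(re));
  const m = matches[n - 1];
  // Treat an empty element like NULL, as Oracle does for empty strings.
  return m && m[1] !== "" ? m[1] : null;
}

console.log(getListElement("123,222,,432,555", 4));
console.log(getListElement("123,222,,432,555", 3));
```

The first call prints the 4th element and the second prints null for the empty 3rd element.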

Oracle REGEX_SUBSTR Not Honoring null values

Thanks for pointing me in the right direction, I have used this to solve the issue.

SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt,
       REGEXP_SUBSTR (val || ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr
FROM   (SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual);

Oracle Version:- Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

Regex to extract nth token of a string separated by pipes

For DB2 please try this to get the 6th element in the list. This works on Oracle and allows for NULL list elements. The syntax for the REGEXP_SUBSTR call is the same so I suspect it will work:

regexp_substr('AA||CCCCCCCC|||FFFFFFFFFFF', '(.*?)(\||$)', 1, 6, 'c', 1)

EDIT: 'c' for case-sensitive

Replacing the nth instance of a regex match in Javascript

Here's something that works:

"23||45||45||56||67".replace(/^((?:[0-9]+\|\|){n})([0-9]+)\|\|/,"$1$2&&")

where n is one less than the position of the pipe pair you want to replace, so n = 1 replaces the 2nd || (of course you don't need that first subexpression if n = 0).

And if you'd like a function to do this:

function pipe_replace(str,n) {
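   // Backslashes must be doubled inside a string literal; a bare "\|\|" would become an unescaped "||".
   var RE = new RegExp("^((?:[0-9]+\\|\\|){" + (n-1) + "})([0-9]+)\\|\\|");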
   return str.replace(RE,"$1$2&&");
}
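
A different way to get the same effect is to count matches inside a replacer callback instead of building a quantified prefix. This is only a sketch; replaceNth is a hypothetical name, and it assumes the pattern is passed with the g flag so that every match is visited.

```javascript
// Hypothetical helper: replace only the nth match of a global pattern.
function replaceNth(str, pattern, n, replacement) {
  let seen = 0;
  // The replacer runs once per match; every match except the nth is returned unchanged.
  return str.replace(pattern, (match) => (++seen === n ? replacement : match));
}

console.log(replaceNth("23||45||45||56||67", /\|\|/g, 2, "&&"));
```

Because nothing about the digits is hard-coded, this also works when the fields are not numeric.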

Find substring after nth occurrence of substring in a string in oracle

This will return everything after the second occurrence of ##:

substr(string, instr(string, '##', 1, 2)+2)

instr returns the position of the first character of the match, so add 2 (the length of ##) to start after the delimiter. If you need a substring of a specific length, just add a third parameter to the substr function:

substr(string, instr(string, '##', 1, 2)+2, 2)

You can also use it in a query:

select 
  substr(some_value, instr(some_value, '##', 1, 2)+2, 2) 
from some_table
where...
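
The offset arithmetic is easy to get wrong with a multi-character delimiter, so here is a small JavaScript sketch of the same lookup. afterNth is a hypothetical helper; it searches for non-overlapping occurrences and returns null when there are fewer than n of them, which is an assumption of this sketch rather than a description of how instr behaves.

```javascript
// Hypothetical helper: text after the nth occurrence of sep, or null if absent.
function afterNth(str, sep, n, length) {
  if (n < 1) return null;
  let pos = -1;
  let from = 0;
  for (let i = 0; i < n; i++) {
    pos = str.indexOf(sep, from);
    if (pos === -1) return null;
    from = pos + sep.length; // resume after the delimiter (non-overlapping)
  }
  const start = pos + sep.length; // skip the whole delimiter, not just one character
  return length === undefined ? str.slice(start) : str.slice(start, start + length);
}

console.log(afterNth("aa##bb##cc##dd", "##", 2));
console.log(afterNth("aa##bb##cc##dd", "##", 2, 2));
```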

How to split the given string as per requirement using oracle

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE Names ( Name ) AS
          SELECT 'Covey, Stephen J, Mr' FROM DUAL
UNION ALL SELECT 'Clinton, Hilary B,' FROM DUAL
UNION ALL SELECT 'Obama, Barack, Mr' FROM DUAL

Query 1:

SELECT REGEXP_SUBSTR( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', 1, 1, NULL, 1 ) AS Last_Name,
       REGEXP_SUBSTR( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', 1, 1, NULL, 2 ) AS First_Name,
       REGEXP_SUBSTR( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', 1, 1, NULL, 4 ) AS Middle_Initial,
       REGEXP_SUBSTR( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', 1, 1, NULL, 5 ) AS Title
FROM   Names

Results:

| LAST_NAME | FIRST_NAME | MIDDLE_INITIAL |  TITLE |
|-----------|------------|----------------|--------|
|     Covey |    Stephen |              J |     Mr |
|   Clinton |     Hilary |              B | (null) |
|     Obama |     Barack |         (null) |     Mr |
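
To experiment with the Query 1 pattern outside SQL Fiddle, the same expression can be tried in JavaScript. This is a sketch only: splitName and the property names are made up for illustration, and empty or unmatched groups are mapped to null to mimic the (null) cells above.

```javascript
// Same pattern as Query 1: last name, first name, optional middle initial, title.
const NAME_RE = /^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$/;

function splitName(name) {
  const m = NAME_RE.exec(name);
  if (!m) return null;
  // Treat unmatched or empty groups as null, as Oracle does for empty strings.
  const orNull = (s) => (s === undefined || s === "" ? null : s);
  return {
    lastName: orNull(m[1]),
    firstName: orNull(m[2]),
    middleInitial: orNull(m[4]), // group 3 is the initial plus its leading spaces
    title: orNull(m[5]),
  };
}

console.log(splitName("Covey, Stephen J, Mr"));
console.log(splitName("Obama, Barack, Mr"));
```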

Query 2:

SELECT REGEXP_REPLACE( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', '\1' ) AS Last_Name,
       REGEXP_REPLACE( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', '\2' ) AS First_Name,
       REGEXP_REPLACE( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', '\4' ) AS Middle_Initial,
       REGEXP_REPLACE( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', '\5' ) AS Title
FROM   Names

Results:

| LAST_NAME | FIRST_NAME | MIDDLE_INITIAL |  TITLE |
|-----------|------------|----------------|--------|
|     Covey |    Stephen |              J |     Mr |
|   Clinton |     Hilary |              B | (null) |
|     Obama |     Barack |         (null) |     Mr |

Query 3:

WITH Split_Names AS (
  SELECT REGEXP_SUBSTR( Name, '^[^,]+' ) AS Last_Name,
         REGEXP_REPLACE( Name, '^.*?,\s*|\s*,.*?$' ) AS Given_Names,
         REGEXP_SUBSTR( Name, '[^\s,]+$' ) AS Title
  FROM   Names
)
SELECT Last_Name,
       REGEXP_REPLACE( Given_Names, '\s+\w$' ) AS First_Name,
       TRIM( REGEXP_SUBSTR( Given_Names, '\s+\w$' ) ) AS Middle_Initial,
       Title
FROM   Split_Names

Results:

| LAST_NAME | FIRST_NAME | MIDDLE_INITIAL |  TITLE |
|-----------|------------|----------------|--------|
|     Covey |    Stephen |              J |     Mr |
|   Clinton |     Hilary |              B | (null) |
|     Obama |     Barack |         (null) |     Mr |

How to get character or string after nth occurrence of pipeline '|' symbol in ORACLE using REGULAR_EXPRESSION?

Here ya go. Replace the 4th argument to regexp_substr() with the number of the field you want.

with tbl(str) as (
  select 'Jack|Sparrow|17-09-16|DY7009|Address at some where|details ' from dual
)
select regexp_substr(str, '(.*?)(\||$)', 1, 4, NULL, 1) field_4
from tbl;

FIELD_4
--------

DY7009

SQL>

To list all the fields:

with tbl(str) as (
  select 'Jack|Sparrow|17-09-16|DY7009|Address at some where|details ' from dual
)
select regexp_substr(str, '(.*?)(\||$)', 1, level, NULL, 1) split
from tbl
connect by level <= regexp_count(str, '\|')+1;

SPLIT
-------------------------

Jack
Sparrow
17-09-16
DY7009
Address at some where
details

6 rows selected.

SQL>

So if you want only selected fields, you could use:

with tbl(str) as (
  select 'Jack|Sparrow|17-09-16|DY7009|Address at some where|details ' from dual
)
select 
  regexp_substr(str, '(.*?)(\||$)', 1, 1, NULL, 1) first,
  regexp_substr(str, '(.*?)(\||$)', 1, 2, NULL, 1) second,
  regexp_substr(str, '(.*?)(\||$)', 1, 3, NULL, 1) third,
  regexp_substr(str, '(.*?)(\||$)', 1, 4, NULL, 1) fourth
from tbl;

Note this regex handles NULL elements and will still return the correct value. Some of the other answers use the form '[^|]+' for parsing the string but this fails when there is a NULL element and should be avoided. See here for proof: https://stackoverflow.com/a/31464699/2543416
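
The difference between the two forms is easy to reproduce in any regex engine. The JavaScript sketch below is illustrative only (nthNaive and nthNullSafe are hypothetical names): with an empty 2nd element, the '[^|]+' form silently shifts every later field up by one position, while the lazy form keeps the positions intact.

```javascript
// Naive form: [^|]+ needs at least one character, so empty elements never match.
function nthNaive(str, n) {
  const matches = str.match(/[^|]+/g) || [];
  return matches[n - 1] === undefined ? null : matches[n - 1];
}

// Lazy form: (.*?)(\||$) also matches an empty element in front of a delimiter.
function nthNullSafe(str, n) {
  const matches = Array.from(str.matchAll(/(.*?)(\||$)/g));
  const m = matches[n - 1];
  return m && m[1] !== "" ? m[1] : null;
}

const sample = "Jack||17-09-16|DY7009"; // 2nd element is empty
console.log(nthNaive(sample, 2));    // wrong field: the date has moved into position 2
console.log(nthNullSafe(sample, 2)); // null, as expected for the empty element
```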

regexp_substr skips over empty positions

OK. This should be the best solution for you.

SELECT
      REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                    '^([^|]*\|){2}([^|]*).*$',
                    '\2' )
          TEXT
FROM
      DUAL;

So for your problem

SELECT
      REGEXP_REPLACE ( INCOMINGSTREAMOFSTRINGS,
                    '^([^|]*\|){N-1}([^|]*).*$',
                    '\2' )
          TEXT
FROM
      DUAL;

--INCOMINGSTREAMOFSTRINGS is your complete string with delimiter

--You should pass n-1 to obtain nth position
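
Since {N-1} has to be a literal number by the time the pattern is compiled, the pattern is normally built by string concatenation. Here is a rough JavaScript sketch of that step; nthByPrefix is a hypothetical name. As far as I know, REGEXP_REPLACE returns the input unchanged when the pattern does not match, so asking for a position past the end of the list gives back the whole string in Oracle, whereas this sketch returns null in that case.

```javascript
// Hypothetical helper: skip n-1 delimited fields, then capture the nth.
function nthByPrefix(str, n) {
  // (?:[^|]*\|){n-1} consumes the first n-1 fields together with their trailing pipes.
  const re = new RegExp("^(?:[^|]*\\|){" + (n - 1) + "}([^|]*).*$");
  const m = re.exec(str);
  return m ? m[1] : null; // "" means the field exists but is empty
}

console.log(nthByPrefix("Mike|Male||20000|Yes", 4));
```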

ALTERNATE 2:

WITH T AS (SELECT 'Mike|Male||20000|Yes' X FROM DUAL)
SELECT
      X,
      REGEXP_REPLACE ( X,
                    '^([^|]*).*$',
                    '\1' )
          Y1,
      REGEXP_REPLACE ( X,
                    '^[^|]*\|([^|]*).*$',
                    '\1' )
          Y2,
      REGEXP_REPLACE ( X,
                    '^([^|]*\|){2}([^|]*).*$',
                    '\2' )
          Y3,
      REGEXP_REPLACE ( X,
                    '^([^|]*\|){3}([^|]*).*$',
                    '\2' )
          Y4,
      REGEXP_REPLACE ( X,
                    '^([^|]*\|){4}([^|]*).*$',
                    '\2' )
          Y5
FROM
      T;

ALTERNATE 3:

SELECT
      REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                                '\|',
                                ';' ),
                   '(^|;)([^;]*)',
                   1,
                   1,
                   NULL,
                   2 )
          AS FIRST,
      REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                                '\|',
                                ';' ),
                   '(^|;)([^;]*)',
                   1,
                   2,
                   NULL,
                   2 )
          AS SECOND,
      REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                                '\|',
                                ';' ),
                   '(^|;)([^;]*)',
                   1,
                   3,
                   NULL,
                   2 )
          AS THIRD,
      REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                                '\|',
                                ';' ),
                   '(^|;)([^;]*)',
                   1,
                   4,
                   NULL,
                   2 )
          AS FOURTH,
      REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
                                '\|',
                                ';' ),
                   '(^|;)([^;]*)',
                   1,
                   5,
                   NULL,
                   2 )
          AS FIFTH
FROM
      DUAL;

Split and Sequence String - Oracle SQL

Sure. Here's the simple version, which gets you col1. This works a lot better if you use the splitting regexp substr from this question.

with t as (select 'PPID|1||123456789^^^^VV||PIZZA^KEVIN^^^^^L||98765432||' as str from dual)
select regexp_substr(t.str,'(.*?)(\||\^|$)', 1, level, null, 1) col1
from t
connect by level <= regexp_count(t.str, '(.*?)(\||\^|$)');

Adding your second column creates some significant complexity. There's probably a graceful way to do it by joining two hierarchical queries, but I can't do that well, so I just used some analytic functions.

with t as (select 'PPID|1||123456789^^^^VV||PIZZA^KEVIN^^^^^L||98765432||' as str from dual)
select col1,
    'PID'
      -- count pipes seen so far
    || trim(to_char(nvl(sum(case when sep = '|' then 1 else 0 end) 
                         over (order by lev rows between unbounded preceding and 1 preceding)
                     ,0)
              ,'00')) 
    -- count hats (within a partition defined by the number of pipes seen so far)
    || CASE when sep = '^' or lag(sep) over (order by lev) = '^' THEN
        '-' || trim(to_char(row_number() over (partition by regexp_count(seen, '\|') 
                                       order by lev) - 1, '00'))
        ELSE null end as col2
from (        
    select regexp_substr(t.str,'(.*?)(\||\^|$)', 1, level, null, 1) col1,
        regexp_substr(t.str,'(.*?)(\||\^|$)', 1, level, null, 2) sep,
        level as lev,
        substr(t.str,1,regexp_instr(t.str,'(.*?)(\||\^|$)', 1, level, 0)) as seen
    from t
    connect by level <= regexp_count(t.str, '(.*?)(\||\^|$)')
    ) s
;

Output:

col1      col2
PPID      PID00
1         PID01
          PID02
123456789 PID03-01
          PID03-02
          PID03-03
          PID03-04
VV        PID03-05
          PID04
PIZZA     PID05-01
KEVIN     PID05-02
          PID05-03
          PID05-04
          PID05-05
          PID05-06
L         PID05-07
          PID06
98765432  PID07
          PID08
          PID09

Let me know if you have any questions.

EDIT: Well, regexp_substr and hierarchical queries are both pretty slow. I rewrote it using MT0's recursive CTE no-regex answer on this question. It's still pretty sloppy, I'm sure it could be cleaned up.

WITH ex as (select 'PPID|1||123456789^^^^VV||PIZZA^KEVIN^^^^^L||98765432||' as str from dual),
  t ( str, start_pos, end_pos ) AS
  ( SELECT str, 1, LEAST(INSTR(str, '|'),INSTR(str, '^')) FROM ex
  UNION ALL
  SELECT str,
    end_pos + 1,
    CASE WHEN INSTR(str, '|', end_pos + 1) > 0 and INSTR(str, '^', end_pos + 1) > 0 THEN
        LEAST(INSTR(str, '|', end_pos + 1),INSTR(str, '^', end_pos + 1))
        ELSE GREATEST(INSTR(str, '|', end_pos + 1),INSTR(str, '^', end_pos + 1)) END
  FROM t
  WHERE end_pos > 0
  )
select col1,
    'PID' 
    -- count pipes
    || trim(to_char(nvl(sum(case when rsep = '|' then 1 else 0 end) 
                         over (order by start_pos rows between unbounded preceding and 1 preceding)
                     ,0)
              ,'00'))
    -- count hats 
    || CASE when '^' in (lsep,rsep) THEN
        '-' || trim(to_char(row_number() over (partition by (length(seen)-length(replace(seen, '|')))
                                       order by start_pos), '00'))
        ELSE null end
              as col_seq
from (              
    SELECT str, start_pos, end_pos, 
      SUBSTR( str, start_pos, DECODE( end_pos, 0, LENGTH(str) + 1, end_pos ) - start_pos ) AS col1,
      SUBSTR( str, start_pos-1, 1) as lsep, SUBSTR(str, DECODE( end_pos, 0, LENGTH(str) + 1, end_pos ), 1) as rsep,
      SUBSTR( str, 1, DECODE( end_pos, 0, LENGTH(str) + 1, end_pos )-1 ) as seen
    FROM t) s
order by start_pos;
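
The numbering rule itself (count pipes seen so far, then number the caret-separated parts within each pipe segment) is easier to follow in procedural form. The JavaScript sketch below is my own simplification for checking the expected labels, not a translation of either query; sequenceFields and its output shape are assumptions.

```javascript
// Hypothetical helper: split on | and ^, labelling each field as in the output above.
function sequenceFields(str) {
  // With a capturing group, split keeps the separators: even indexes are fields, odd are separators.
  const parts = str.split(/([|^])/);
  const pad = (num) => String(num).padStart(2, "0");
  const rows = [];
  let pipes = 0; // number of pipes to the left of the current field
  let sub = 0;   // position within the current caret-separated group
  for (let i = 0; i < parts.length; i += 2) {
    const lsep = i > 0 ? parts[i - 1] : null;
    const rsep = i + 1 < parts.length ? parts[i + 1] : null;
    if (lsep === "|") {
      pipes += 1;
      sub = 0; // a new pipe segment restarts the caret numbering
    }
    let label = "PID" + pad(pipes);
    if (lsep === "^" || rsep === "^") {
      sub += 1;
      label += "-" + pad(sub);
    }
    rows.push({ col1: parts[i], col2: label });
  }
  return rows;
}

const rows = sequenceFields("PPID|1||123456789^^^^VV||PIZZA^KEVIN^^^^^L||98765432||");
rows.forEach((r) => console.log(r.col1.padEnd(10) + r.col2));
```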
