How to Remove Duplicates from Space Separated List by Oracle Regexp_Replace

How to remove duplicates from space separated list by Oracle regexp_replace?

If I understand correctly, you need not only to replace ',' with a space but also to remove duplicates in a smarter way.

If I modify that expression to work with space instead of ',', I get

select regexp_replace('A B A A C D' ,'([^ ]+)( [ ]*\1)+', '\1') from dual

which gives 'A B A C D': only adjacent duplicates are collapsed, so this is not what you need.

A way to get your needed result could be the following, a bit more complicated:

with string(s) as ( select 'A B A A C D' from dual)    
select listagg(case when rn = 1 then str end, ' ') within group (order by lev)
from (
select str, row_number() over (partition by str order by lev) rn, lev
from (
SELECT trim(regexp_substr(s, '[^ ]+', 1, level)) str,
level as lev
FROM string
CONNECT BY instr(s, ' ', 1, level - 1) > 0
)
)

My main problem here is that I'm not able to build a regexp that checks for non-adjacent duplicates, so I need to split the string, check for duplicates, and then aggregate the non-duplicated values again, keeping the order.
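As a variation on the split, de-duplicate and re-aggregate steps described above, the first position of each token can be found with MIN and GROUP BY instead of ROW_NUMBER. This is only a sketch: the CTE name tokens and the column first_pos are illustrative names that do not come from the original answer, and regexp_count assumes Oracle 11g or later.

```sql
-- Sketch: order-preserving de-duplication of a space-separated list.
-- "tokens" and "first_pos" are illustrative names.
with string(s) as (select 'A B A A C D' from dual),
tokens as (
  -- one row per token, together with its position in the list
  select regexp_substr(s, '[^ ]+', 1, level) as str,
         level as lev
  from string
  connect by level <= regexp_count(s, '[^ ]+')
)
select listagg(str, ' ') within group (order by first_pos) as deduped
from (
  -- keep each distinct token once, remembering where it first appeared
  select str, min(lev) as first_pos
  from tokens
  group by str
);
```

For the sample string this should return 'A B C D'.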

If you don't mind the order of the tokens in the result string, this can be simplified:

with string(s) as ( select 'A B A A C D' from dual)
select listagg(str, ' ') within group (order by 1)
from (
SELECT distinct trim(regexp_substr(s, '[^ ]+', 1, level)) as str
FROM string
CONNECT BY instr(s, ' ', 1, level - 1) > 0
)

How to remove duplicates from comma separated list by regex in Oracle regexp_replace?

([^,]+)(,[ ]*\1)+

Try this. It works. See the demo:

http://regex101.com/r/yG7zB9/8

The issue was the space after the comma in 'VA - HRD 1, VA - HRD 1'. You were not taking it into account: the first occurrence has no space in front of it, whereas the repeated one does. So include [ ]* or \s* after the comma to make the pattern accept it.
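To make the pattern concrete, here is a minimal sketch that applies it with regexp_replace. The third list item (VA - NRD 2) is made up for illustration and is not part of the original question.

```sql
-- Sketch: collapse adjacent duplicates in a comma-separated list.
-- [ ]* after the comma absorbs the optional space before a repeated value.
select regexp_replace('VA - HRD 1, VA - HRD 1, VA - NRD 2',
                      '([^,]+)(,[ ]*\1)+',
                      '\1') as deduped
from dual;
```

This should return 'VA - HRD 1, VA - NRD 2'. Like its space-separated counterpart, the pattern only collapses adjacent duplicates. Because \1 is not anchored to a token boundary, it may also match the beginning of a longer token, so it is worth testing against your real data.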

Remove duplicates from comma separated list with regexp

Based on a technique for splitting a comma-separated value into rows, I split the string into rows, kept the position of the first occurrence of each value, selected the distinct values, and re-aggregated them:

with test_string as ( 
select 1 as id,
'contract, clause 1, Subsection 1.1, contract, clause 1, Subsection 1.2, paragraph (a), contract, clause 1, Subsection 1.2, paragraph (b), contract, clause 2' val
from dual)
select id, listagg(word,', ') WITHIN GROUP (order by position) FROM (
select distinct id, first_value(position) over ( partition by word order by position ) position, word from (
select
distinct t.id,
levels.column_value as position,
trim(regexp_substr(t.val, '[^,]+', 1, levels.column_value)) as word
from
test_string t,
table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.val, '[^,]+')) + 1) as sys.OdciNumberList)) levels
)
) GROUP BY id

And if you are not interested in keeping the order:

with test_string as ( 
select 1 as id,
'contract, clause 1, Subsection 1.1, contract, clause 1, Subsection 1.2, paragraph (a), contract, clause 1, Subsection 1.2, paragraph (b), contract, clause 2' val
from dual)
select id, listagg(word,', ') WITHIN GROUP (order by 1) FROM (
select
distinct t.id,
trim(regexp_substr(t.val, '[^,]+', 1, levels.column_value)) as word
from
test_string t,
table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.val, '[^,]+')) + 1) as sys.OdciNumberList)) levels
) GROUP BY id

How to remove duplicates from comma separated list by regex in Oracle but I don't want duplicates values?

Try this, as per article http://www.dba-oracle.com/t_extract_comma_delimited_strings_oracle_sql.html:

select distinct str from
(select regexp_substr ('ABCD1234, XYZ, ABCD1234, ABCD1234C, ABCD1234, abc, abcX, 1234U, 1234', '[^, ]+',1, rownum) str
from dual
connect by level <= regexp_count ('ABCD1234, XYZ, ABCD1234, ABCD1234C, ABCD1234, abc, abcX, 1234U, 1234', '[^, ]+')) v;

Fiddle: http://sqlfiddle.com/#!4/c858d/5

Remove duplicate values from comma separated variable in Oracle

Solution description: use a CTE to first split the list of emails into rows with one email address per row (testd_rows). Then select the distinct rows (testd_rows_unique) from testd_rows, and finally put them back together with LISTAGG. From 19c onwards you can use LISTAGG with the DISTINCT keyword.

set serveroutput on size 999999
clear screen
declare

all_email_list varchar2(4000);
l_unique_email_list varchar2(4000);


begin
all_email_list := 'test@asd.com, test2@asd.com,test@asd.com,test3@asd.com, test4@asd.com,test2@asd.com';

WITH testd_rows(email) AS
(
select regexp_substr (all_email_list, '[^, ]+', 1, level) split
from dual
connect by level <= regexp_count (all_email_list, '[^, ]+')
), testd_rows_unique(email) AS
(
SELECT distinct email FROM testd_rows
)
SELECT listagg(email, ',') WITHIN GROUP (ORDER BY email)
INTO l_unique_email_list
FROM testd_rows_unique;

dbms_output.put_line(l_unique_email_list);
end;
/

test2@asd.com,test3@asd.com,test4@asd.com,test@asd.com
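The 19c remark in the solution description could look like the following sketch. It assumes Oracle 19c or later. The inline emails CTE stands in for the split rows and is only an illustration.

```sql
-- Sketch, assuming Oracle 19c+: LISTAGG(DISTINCT ...) removes duplicates
-- during aggregation, so no separate DISTINCT subquery is needed.
with emails(email) as (
  select 'test@asd.com'  from dual union all
  select 'test2@asd.com' from dual union all
  select 'test@asd.com'  from dual union all
  select 'test3@asd.com' from dual
)
select listagg(distinct email, ',') within group (order by email) as unique_emails
from emails;
```

Each address should appear once in the aggregated string, even though test@asd.com occurs twice in the input rows.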

But why are you converting rows to a comma-separated string and then de-duplicating it? Use UNION to remove the duplicate values in a single SELECT statement and apply LISTAGG to the result. No regexp is needed then. UNION skips duplicates, as opposed to UNION ALL, which returns all the rows.

DECLARE
all_email_list varchar2(4000);
BEGIN
WITH all_email (email) AS
(
select email from UM_USER a left join UM_USERROLLE b on (a.mynetuser=b.NT_NAME) left join UM_RULES c on (c.id=b.RULEID) where RULEID = 902
UNION
select email from table2 where CFT_ID =:P25_CFT_TEAM
UNION
select email from table3 WHERE :P25_ID = ID
)
SELECT listagg(email, ',') WITHIN GROUP (ORDER BY email)
INTO all_email_list
FROM all_email;

dbms_output.put_line(all_email_list);
END;
/
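The tables in the block above are specific to the original question, so here is a self-contained sketch of the same idea using literal rows from dual. The addresses are sample values only.

```sql
-- Sketch: UNION removes the repeated address before LISTAGG runs.
-- Replacing UNION with UNION ALL would keep both copies of test@asd.com.
with all_email(email) as (
  select 'test@asd.com'  from dual
  union
  select 'test2@asd.com' from dual
  union
  select 'test@asd.com'  from dual
)
select listagg(email, ',') within group (order by email) as unique_emails
from all_email;
```

The aggregated string should contain two addresses rather than three.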

remove duplicate values from a oracle sql query's output

Assuming that your table contains strings with values separated by commas, you can try something like this:

select rtrim(xmltype('<r><n>' || 
replace(REGEXP_REPLACE( col, '[A-Za-z]' , '' ), ',', ',</n><n>')||',</n></r>'
).extract('//n[not(preceding::n = .)]/text()').getstringval(), ',')
from tablex;

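What it does: after applying your regexp_replace, it builds an XMLType from the result and then uses XPath to get the desired output. The predicate not(preceding::n = .) keeps a node only if no earlier node has the same value, so each value survives once, at its first position.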
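To see the XPath filter in isolation, here is a sketch that runs the same expression against a literal list instead of a table column. The list '1,2,1,3,2' is an invented sample, and the regexp_replace step is left out because the sample contains no letters.

```sql
-- Sketch: wrap each value (with its trailing comma) in an <n> element,
-- keep only nodes whose text has not appeared in an earlier <n>,
-- then strip the final comma.
select rtrim(xmltype('<r><n>' ||
         replace('1,2,1,3,2', ',', ',</n><n>') || ',</n></r>'
       ).extract('//n[not(preceding::n = .)]/text()').getstringval(), ',') as deduped
from dual;
```

This should yield 1,2,3, with each value kept where it first appeared.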

If you also want to sort the values (and still use the XML approach), then you need XSL:

select rtrim(xmltype('<r><n>' || 
replace(REGEXP_REPLACE( col, '[A-Za-z]' , '' ), ',', '</n><n>')||'</n></r>'
).extract('//n[not(preceding::n = .)]')
.transform(xmltype('<?xml version="1.0" ?><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"><xsl:template match="/"><xsl:for-each select="//n[not(preceding::n = .)]"><xsl:sort select="." data-type="number"/><xsl:value-of select="."/>,</xsl:for-each></xsl:template></xsl:stylesheet>'))
.getstringval(), ',')
from tablex;

But you can also try different approaches, such as splitting the tokens into rows and then recollecting them:

select rtrim(xmlagg(xmlelement(e, n || ',') order by to_number(n))
.extract('//text()'), ',')
from(
SELECT distinct rn, trim(regexp_substr(col, '[^,]+', 1, level)) n
FROM (select row_number() over (order by col) rn ,
REGEXP_REPLACE( col, '[A-Za-z]' , '' ) col
from tablex) t
CONNECT BY instr(col, ',', 1, level - 1) > 0
)
group by rn;

oracle query to remove extra spaces after word and no spaces after a dot in a string

Using regexp_replace:

FSITJA@db01> select regexp_replace('My  name is Pramod. I    am writing   .   a query, Today is AUG 16TH:   2019; X11.  abc', '([,;:. ]){1} +', '\1')
             from dual;

REGEXP_REPLACE('MYNAMEISPRAMOD.IAMWRITING.AQUERY,TODAYISAUG16TH:2019;X
----------------------------------------------------------------------
My name is Pramod.I am writing .a query,Today is AUG 16TH:2019;X11.abc

Splitting comma separated values in Oracle

Works perfectly for me -

SQL> WITH dummy_table AS(
     SELECT '3862,3654,3828' dummy FROM dual
     )
     SELECT trim(regexp_substr(dummy,'[^,]+',1,Level)) AS dummycol
     FROM dummy_table
     CONNECT BY level <= LENGTH(REGEXP_REPLACE(dummy,'[^,]+'))+1
     /

DUMMYCOL
--------------
3862
3654
3828

SQL>

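There are many other ways of achieving it; see "Split single comma delimited string into rows".

Update: regarding the duplicate rows you get when a column is used instead of a single string value, I saw DBMS_RANDOM used in a PRIOR clause to get rid of the cyclic loop. In the query below, PRIOR rn = rn keeps each hierarchy within its own source row, and PRIOR DBMS_RANDOM.VALUE IS NOT NULL makes every row look distinct so that Oracle does not report a CONNECT BY loop.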

Try the following,

SQL> WITH dummy_table AS
     ( SELECT 1 rn, '3862,3654,3828' dummy FROM dual
     UNION ALL
     SELECT 2, '1234,5678' dummy FROM dual
     )
     SELECT trim(regexp_substr(dummy,'[^,]+',1,Level)) AS dummycol
     FROM dummy_table
     CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(dummy,'[^,]+'))+1
     AND prior rn = rn
     AND PRIOR DBMS_RANDOM.VALUE IS NOT NULL
     /

DUMMYCOL
--------------
3862
3654
3828
1234
5678

SQL>

Update 2

Another way,

SQL> WITH dummy_table AS
     ( SELECT 1 rn, '3862,3654,3828' dummy FROM dual
     UNION ALL
     SELECT 2, '1234,5678,xyz' dummy FROM dual
     )
     SELECT trim(regexp_substr(t.dummy, '[^,]+', 1, levels.column_value)) AS dummycol
     FROM dummy_table t,
     TABLE(CAST(MULTISET
     (SELECT LEVEL
     FROM dual
     CONNECT BY LEVEL <= LENGTH (regexp_replace(t.dummy, '[^,]+')) + 1
     ) AS sys.OdciNumberList)) LEVELS
     /

DUMMYCOL
--------------
3862
3654
3828
1234
5678
xyz

6 rows selected.

SQL>

