Performance of Regexp_Replace VS Translate in Oracle

Performance of regexp_replace vs translate in Oracle?

I think you're running into simple optimization. The regexp expression is so expensive to compute that the result is cached in the hope that it will be used again in the future. If you actually use distinct strings to convert, you will see that the modest translate is naturally faster because it is its specialized function.

Here's my example, running on 11.1.0.7.0:

SQL> DECLARE
  2     TYPE t IS TABLE OF VARCHAR2(4000);
  3     l       t;
  4     l_level NUMBER := 1000;
  5     l_time  TIMESTAMP;
  6     l_char  VARCHAR2(4000);
  7  BEGIN
  8     -- init
  9     EXECUTE IMMEDIATE 'ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL=2';
 10     SELECT dbms_random.STRING('p', 2000)
 11       BULK COLLECT
 12       INTO l FROM dual
 13     CONNECT BY LEVEL <= l_level;
 14     -- regex
 15     l_time := systimestamp;
 16     FOR i IN 1 .. l.count LOOP
 17        l_char := regexp_replace(l(i), '[]()[]', '-', 1, 0);
 18     END LOOP;
 19     dbms_output.put_line('regex     :' || (systimestamp - l_time));
 20     -- tranlate
 21     l_time := systimestamp;
 22     FOR i IN 1 .. l.count LOOP
 23        l_char := translate(l(i), '()[]', '----');
 24     END LOOP;
 25     dbms_output.put_line('translate :' || (systimestamp - l_time));
 26  END;
 27  /

regex     :+000000000 00:00:00.979305000
translate :+000000000 00:00:00.238773000

PL/SQL procedure successfully completed

on 11.2.0.3.0 :

regex     :+000000000 00:00:00.617290000
translate :+000000000 00:00:00.138205000

Conclusion: In general I suspect translate will win.

Oracle Regexp_replace multiple occurrence

Try this:

select (case when substr(val, 1, 1) between '0' and '9' then 'C' else '' end) ||
       regexp_replace(val, '([-+.,;:'"!])', '_')

Performing multiple Replace operations on an Oracle database column

You can nest the replace() calls:

UPDATE TABLE1
    SET COLUMN1 = REPLACE(REPLACE(REPLACE(REPLACE(COLUMN1, 'ABC', 'SR1'), '123', 'SR2'), 'XYZ', 'SR3'), '789', 'SR4');

Oracle also offers regexp_replace(). You might find this handy for some of your data transformations.

In Oracle what is the fastest way to limit the characters in a string?

You can try this approach, which looks to be much faster based on some (very) loose tests. Its a function that's compiled natively:

CREATE OR REPLACE function clean_string(
    in_string    in varchar2)
 return varchar2 AS
   out_string varchar2(4000) := '';
   in_length number;
   cnt number := 0;
   in_char char(1);
   out_char char(1);
   dec_char number;
   prev_space boolean := false;
begin
   --dbms_output.put_line('In string: ' || in_string);
   in_length := LENGTH(in_string);
   while cnt < in_length
   LOOP
     cnt := cnt + 1;
     in_char := substr(in_string, cnt, 1);
     dec_char := ascii(in_char);
     -- blank out non alphanumerics
     IF (
       (dec_char >= 48 AND dec_char <= 57) OR
       (dec_char >= 65 AND dec_char <= 90) OR
       (dec_char >= 97 AND dec_char <= 122)
       ) THEN
         --keep it
         out_char := in_char;
     ELSE
       out_char := ' ';
     END IF;

     IF (NOT(prev_space AND out_char = ' ')) THEN
       out_string := out_string || out_char;
     END IF;

     <<endloop>>
     IF (out_char = ' ') THEN
       prev_space := true;
     ELSE
       prev_space := false;
     END IF;

   END LOOP;
   return trim(upper(out_string));
end;

ALTER SESSION SET PLSQL_CODE_TYPE=NATIVE;
ALTER function clean_string COMPILE;

And to test, I pulled 5 million rows from a table and cleaned some strings:

set serveroutput on
declare
    cursor sel_cur1 is
    select name, clean_string(name) as cln_name,
        address1, clean_string(address1) as cln_addr1,
        address2, clean_string(address2) as cln_addr2,
        city, clean_string(city) as cln_city,
        state, clean_string(state) as cln_state,
        postalcode, clean_string(postalcode) as cln_zip
    from my_table
    where rownum <= 5000000;

    cursor sel_cur2 is
    select name,
        address1,
        address2,
        city,
        state,
        postalcode
    from my_table
    where rownum <= 5000000;

    l_cnt integer := 0;
    l_cln_name varchar2(100);
    l_cln_addr1 varchar2(100);
    l_cln_addr2 varchar2(100);
    l_cln_city varchar2(100);
    l_cln_state varchar2(100);
    l_cln_zip varchar2(100);

    l_interval interval day to second(4);
    l_start timestamp;
    l_end timestamp;
begin
   l_start := systimestamp;
   for rec in sel_cur2
   loop
         l_cnt := l_cnt + 1;
         l_cln_name := clean_string(rec.name);
         l_cln_addr1 := clean_string(rec.address1);
         l_cln_addr2 := clean_string(rec.address2);
         l_cln_city := clean_string(rec.city);
         l_cln_state := clean_string(rec.state);
         l_cln_zip := clean_string(rec.postalcode);
   end loop;
    l_end := systimestamp;
    l_interval := l_end - l_start;
    dbms_output.put_line('Procedural approach timing: ' || l_interval);
   -------------------------------------------------
   l_cnt := 0;
   l_start := systimestamp;
   for rec in sel_cur1
   loop
         -- cleaning already done in SQL
         l_cnt := l_cnt + 1;
   end loop;
   l_end := systimestamp;
   l_interval := l_end - l_start;
   dbms_output.put_line('SQL approach timing: ' || l_interval);

   -------------------------------------------------
   l_cnt := 0;
   l_start := systimestamp;
   for rec in sel_cur2
   loop
         l_cnt := l_cnt + 1;
         l_cln_name := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.name, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
         l_cln_addr1 := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.address1, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
         l_cln_addr2 := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.address2, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
         l_cln_city := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.city, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
         l_cln_state := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.state, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
         l_cln_zip := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.postalcode, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
   end loop;
   l_end := systimestamp;
   l_interval := l_end - l_start;
   dbms_output.put_line('Existing approach timing: ' || l_interval);
end;

And the output was:

Procedural approach timing: +00 00:02:04.0320
SQL approach timing: +00 00:02:49.4326
Existing approach timing: +00 00:05:50.1607

Also, the native compilation seems to only help a procedural approach to the processing (rather than calling the function from a SQL query), but appears to be much faster than the regexp_replace solution. Hope that helps.

Multiple REPLACE function in Oracle

Even if this thread is old is the first on Google, so I'll post an Oracle equivalent to the function implemented here, using regular expressions.

Is fairly faster than nested replace(), and much cleaner.

To replace strings 'a','b','c' with 'd' in a string column from a given table

select regexp_replace(string_col,'a|b|c','d') from given_table

It is nothing else than a regular expression for several static patterns with 'or' operator.

Beware of regexp special characters!

Replace characters after a defined pattern with regexp_replace in oracle

Here's a solution using regexp_replace. The 4th argument is the starting position. After some thought, I decided not to skip the '+2'. Don't be lazy and waste cycles testing characters you know are not the target character.

SQL> with tbl(str) as (
     select 'Heyho || HeyheyHo' from dual
   )
   select str before,
          regexp_replace(str, 'y', 'i', instr(str, '||')+2) after
   from tbl;

BEFORE            AFTER
----------------- -----------------
Heyho || HeyheyHo Heyho || HeiheiHo

SQL>

Performance of Regexp_Replace VS Translate in Oracle