Performance of Regexp_Replace VS Translate in Oracle

Performance of regexp_replace vs translate in Oracle?

I think you're running into simple optimization. The regexp expression is so expensive to compute that the result is cached in the hope that it will be used again in the future. If you actually use distinct strings to convert, you will see that the modest translate is naturally faster because it is its specialized function.

Here's my example, running on 11.1.0.7.0:

SQL> DECLARE
2 TYPE t IS TABLE OF VARCHAR2(4000);
3 l t;
4 l_level NUMBER := 1000;
5 l_time TIMESTAMP;
6 l_char VARCHAR2(4000);
7 BEGIN
8 -- init
9 EXECUTE IMMEDIATE 'ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL=2';
10 SELECT dbms_random.STRING('p', 2000)
11 BULK COLLECT
12 INTO l FROM dual
13 CONNECT BY LEVEL <= l_level;
14 -- regex
15 l_time := systimestamp;
16 FOR i IN 1 .. l.count LOOP
17 l_char := regexp_replace(l(i), '[]()[]', '-', 1, 0);
18 END LOOP;
19 dbms_output.put_line('regex :' || (systimestamp - l_time));
20 -- tranlate
21 l_time := systimestamp;
22 FOR i IN 1 .. l.count LOOP
23 l_char := translate(l(i), '()[]', '----');
24 END LOOP;
25 dbms_output.put_line('translate :' || (systimestamp - l_time));
26 END;
27 /

regex :+000000000 00:00:00.979305000
translate :+000000000 00:00:00.238773000

PL/SQL procedure successfully completed

on 11.2.0.3.0 :

regex     :+000000000 00:00:00.617290000
translate :+000000000 00:00:00.138205000

Conclusion: In general I suspect translate will win.

Oracle Regexp_replace multiple occurrence

Try this:

select (case when substr(val, 1, 1) between '0' and '9' then 'C' else '' end) ||
regexp_replace(val, '([-+.,;:'"!])', '_')

Performing multiple Replace operations on an Oracle database column

You can nest the replace() calls:

UPDATE TABLE1
SET COLUMN1 = REPLACE(REPLACE(REPLACE(REPLACE(COLUMN1, 'ABC', 'SR1'), '123', 'SR2'), 'XYZ', 'SR3'), '789', 'SR4');

Oracle also offers regexp_replace(). You might find this handy for some of your data transformations.

In Oracle what is the fastest way to limit the characters in a string?

You can try this approach, which looks to be much faster based on some (very) loose tests. Its a function that's compiled natively:

CREATE OR REPLACE function clean_string(
in_string in varchar2)
return varchar2 AS
out_string varchar2(4000) := '';
in_length number;
cnt number := 0;
in_char char(1);
out_char char(1);
dec_char number;
prev_space boolean := false;
begin
--dbms_output.put_line('In string: ' || in_string);
in_length := LENGTH(in_string);
while cnt < in_length
LOOP
cnt := cnt + 1;
in_char := substr(in_string, cnt, 1);
dec_char := ascii(in_char);
-- blank out non alphanumerics
IF (
(dec_char >= 48 AND dec_char <= 57) OR
(dec_char >= 65 AND dec_char <= 90) OR
(dec_char >= 97 AND dec_char <= 122)
) THEN
--keep it
out_char := in_char;
ELSE
out_char := ' ';
END IF;

IF (NOT(prev_space AND out_char = ' ')) THEN
out_string := out_string || out_char;
END IF;

<<endloop>>
IF (out_char = ' ') THEN
prev_space := true;
ELSE
prev_space := false;
END IF;

END LOOP;
return trim(upper(out_string));
end;

ALTER SESSION SET PLSQL_CODE_TYPE=NATIVE;
ALTER function clean_string COMPILE;

And to test, I pulled 5 million rows from a table and cleaned some strings:

set serveroutput on
declare
cursor sel_cur1 is
select name, clean_string(name) as cln_name,
address1, clean_string(address1) as cln_addr1,
address2, clean_string(address2) as cln_addr2,
city, clean_string(city) as cln_city,
state, clean_string(state) as cln_state,
postalcode, clean_string(postalcode) as cln_zip
from my_table
where rownum <= 5000000;

cursor sel_cur2 is
select name,
address1,
address2,
city,
state,
postalcode
from my_table
where rownum <= 5000000;

l_cnt integer := 0;
l_cln_name varchar2(100);
l_cln_addr1 varchar2(100);
l_cln_addr2 varchar2(100);
l_cln_city varchar2(100);
l_cln_state varchar2(100);
l_cln_zip varchar2(100);

l_interval interval day to second(4);
l_start timestamp;
l_end timestamp;
begin
l_start := systimestamp;
for rec in sel_cur2
loop
l_cnt := l_cnt + 1;
l_cln_name := clean_string(rec.name);
l_cln_addr1 := clean_string(rec.address1);
l_cln_addr2 := clean_string(rec.address2);
l_cln_city := clean_string(rec.city);
l_cln_state := clean_string(rec.state);
l_cln_zip := clean_string(rec.postalcode);
end loop;
l_end := systimestamp;
l_interval := l_end - l_start;
dbms_output.put_line('Procedural approach timing: ' || l_interval);
-------------------------------------------------
l_cnt := 0;
l_start := systimestamp;
for rec in sel_cur1
loop
-- cleaning already done in SQL
l_cnt := l_cnt + 1;
end loop;
l_end := systimestamp;
l_interval := l_end - l_start;
dbms_output.put_line('SQL approach timing: ' || l_interval);

-------------------------------------------------
l_cnt := 0;
l_start := systimestamp;
for rec in sel_cur2
loop
l_cnt := l_cnt + 1;
l_cln_name := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.name, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
l_cln_addr1 := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.address1, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
l_cln_addr2 := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.address2, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
l_cln_city := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.city, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
l_cln_state := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.state, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
l_cln_zip := UPPER(TRIM(REGEXP_REPLACE(REGEXP_REPLACE(rec.postalcode, '[^0-9A-Za-z ]', ' '),'( )* ',' ')));
end loop;
l_end := systimestamp;
l_interval := l_end - l_start;
dbms_output.put_line('Existing approach timing: ' || l_interval);
end;

And the output was:

Procedural approach timing: +00 00:02:04.0320
SQL approach timing: +00 00:02:49.4326
Existing approach timing: +00 00:05:50.1607

Also, the native compilation seems to only help a procedural approach to the processing (rather than calling the function from a SQL query), but appears to be much faster than the regexp_replace solution. Hope that helps.

Multiple REPLACE function in Oracle

Even if this thread is old is the first on Google, so I'll post an Oracle equivalent to the function implemented here, using regular expressions.

Is fairly faster than nested replace(), and much cleaner.

To replace strings 'a','b','c' with 'd' in a string column from a given table

select regexp_replace(string_col,'a|b|c','d') from given_table

It is nothing else than a regular expression for several static patterns with 'or' operator.

Beware of regexp special characters!

Replace characters after a defined pattern with regexp_replace in oracle

Here's a solution using regexp_replace. The 4th argument is the starting position. After some thought, I decided not to skip the '+2'. Don't be lazy and waste cycles testing characters you know are not the target character.

SQL> with tbl(str) as (
select 'Heyho || HeyheyHo' from dual
)
select str before,
regexp_replace(str, 'y', 'i', instr(str, '||')+2) after
from tbl;

BEFORE AFTER
----------------- -----------------
Heyho || HeyheyHo Heyho || HeiheiHo

SQL>


Related Topics



Leave a reply



Submit