Regex Pattern Inside SQL Replace Function

Regex pattern inside SQL Replace function?

You can use PATINDEX
to find the first index of the pattern (string's) occurrence. Then use STUFF to stuff another string into the pattern(string) matched.

Loop through each row. Replace each illegal characters with what you want. In your case replace non numeric with blank. The inner loop is if you have more than one illegal character in a current cell that of the loop.

DECLARE @counter int

SET @counter = 0

WHILE(@counter < (SELECT MAX(ID_COLUMN) FROM Table))
BEGIN

WHILE 1 = 1
BEGIN
DECLARE @RetVal varchar(50)

SET @RetVal = (SELECT Column = STUFF(Column, PATINDEX('%[^0-9.]%', Column),1, '')
FROM Table
WHERE ID_COLUMN = @counter)

IF(@RetVal IS NOT NULL)
UPDATE Table SET
Column = @RetVal
WHERE ID_COLUMN = @counter
ELSE
break
END

SET @counter = @counter + 1
END

Caution: This is slow though! Having a varchar column may impact. So using LTRIM RTRIM may help a bit. Regardless, it is slow.

Credit goes to this StackOverFlow answer.

EDIT
Credit also goes to @srutzky

Edit (by @Tmdean)
Instead of doing one row at a time, this answer can be adapted to a more set-based solution. It still iterates the max of the number of non-numeric characters in a single row, so it's not ideal, but I think it should be acceptable in most situations.

WHILE 1 = 1 BEGIN
WITH q AS
(SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
FROM Table)
UPDATE Table
SET Column = STUFF(Column, q.n, 1, '')
FROM q
WHERE Table.ID_Column = q.ID_Column AND q.n != 0;

IF @@ROWCOUNT = 0 BREAK;
END;

You can also improve efficiency quite a lot if you maintain a bit column in the table that indicates whether the field has been scrubbed yet. (NULL represents "Unknown" in my example and should be the column default.)

DECLARE @done bit = 0;
WHILE @done = 0 BEGIN
WITH q AS
(SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
FROM Table
WHERE COALESCE(Scrubbed_Column, 0) = 0)
UPDATE Table
SET Column = STUFF(Column, q.n, 1, ''),
Scrubbed_Column = 0
FROM q
WHERE Table.ID_Column = q.ID_Column AND q.n != 0;

IF @@ROWCOUNT = 0 SET @done = 1;

-- if Scrubbed_Column is still NULL, then the PATINDEX
-- must have given 0
UPDATE table
SET Scrubbed_Column = CASE
WHEN Scrubbed_Column IS NULL THEN 1
ELSE NULLIF(Scrubbed_Column, 0)
END;
END;

If you don't want to change your schema, this is easy to adapt to store intermediate results in a table valued variable which gets applied to the actual table at the end.

Replace string using RegEx?

It seems from the documentation https://docs.microsoft.com/en-us/sql/t-sql/functions/replace-transact-sql?view=sql-server-ver15 that replace is not to be used with regular expression.

string_pattern

Is the substring to be found. string_pattern can be of a character or binary data type. string_pattern cannot be an empty string (''), and must not exceed the maximum number of bytes that fits on a page.

Alternative solutions here: Regex pattern inside SQL Replace function?

A way to use regex within t-sql's replace function?

If you wont or maybe can't use a CLR due to security reasons, there is a simple way using a CTE inside standard t-sql.

Here is a complete example inclusive demo structure. You can run it on a whole table.

CREATE TABLE #dummyData(id int identity(1,1), teststring nvarchar(255))

INSERT INTO #dummyData(teststring)
VALUES(N'<B99_9>TEST</B99_9><LastDay>TEST</LastDay>, <B99_9>TEST</B99_9>, <B99_9></B99_9>')

DECLARE @starttag nvarchar(10) = N'<B99_9>', @endtag nvarchar(10) = N'</B99_9>'

;WITH cte AS(
SELECT id, STUFF(
teststring,
PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag),
(PATINDEX(N'%[a-z0-9]'+@endtag+N'%',teststring)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag)),
N''
) as teststring, 1 as iteration
FROM #dummyData
-- iterate until everything is replaced
UNION ALL
SELECT id, STUFF(
teststring,
PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag),
(PATINDEX(N'%[a-z0-9]'+@endtag+N'%',teststring)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag)),
N''
) as teststring, iteration+1 as iteration
FROM cte
WHERE PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring) > 0
)
SELECT c.id, c.teststring
FROM cte as c
-- Join to get only the latest iteration
INNER JOIN (
SELECT id, MAX(iteration) as maxIteration
FROM cte
GROUP BY id
) as onlyMax
ON c.id = onlyMax.id
AND c.iteration = onlyMax.maxIteration

-- Cleanup
DROP TABLE #dummyData

If you want to use the result of the CTE in an update. You can just replace the part after the CTE-definition with the following code:

UPDATE dd
SET teststring = c.teststring
FROM #dummyData as dd -- rejoin the base table for later update usage
INNER JOIN cte as c
ON dd.id = c.id
-- Join to get only the latest iteration
INNER JOIN (
SELECT id, MAX(iteration) as maxIteration
FROM cte
GROUP BY id
) as onlyMax
ON c.id = onlyMax.id
AND c.iteration = onlyMax.maxIteration

If you don't want to run it on a complete table set, you can run the following code for a single variable:

DECLARE @string nvarchar(max) = N'<B99_9>TEST</B99_9><LastDay>TEST</LastDay>, <B99_9>TEST</B99_9>, <B99_9></B99_9>'
DECLARE @starttag nvarchar(10) = N'<B99_9>', @endtag nvarchar(10) = N'</B99_9>'

WHILE PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string) > 0 BEGIN
SELECT @string = STUFF(
@string,
PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string)+LEN(@starttag),
(PATINDEX(N'%[a-z0-9]'+@endtag+N'%',@string)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string)+LEN(@starttag)),
N''
)
END

SELECT @string

SQL Server 2014 replace with regex

First of all you need this user defined function to search for replacing a pattern with string:

CREATE FUNCTION dbo.PatternReplace
(
@InputString VARCHAR(4000),
@Pattern VARCHAR(100),
@ReplaceText VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN
DECLARE @Result VARCHAR(4000) SET @Result = ''
-- First character in a match
DECLARE @First INT
-- Next character to start search on
DECLARE @Next INT SET @Next = 1
-- Length of the total string -- 8001 if @InputString is NULL
DECLARE @Len INT SET @Len = COALESCE(LEN(@InputString), 8001)
-- End of a pattern
DECLARE @EndPattern INT

WHILE (@Next <= @Len)
BEGIN
SET @First = PATINDEX('%' + @Pattern + '%', SUBSTRING(@InputString, @Next, @Len))
IF COALESCE(@First, 0) = 0 --no match - return
BEGIN
SET @Result = @Result +
CASE --return NULL, just like REPLACE, if inputs are NULL
WHEN @InputString IS NULL
OR @Pattern IS NULL
OR @ReplaceText IS NULL THEN NULL
ELSE SUBSTRING(@InputString, @Next, @Len)
END
BREAK
END
ELSE
BEGIN
-- Concatenate characters before the match to the result
SET @Result = @Result + SUBSTRING(@InputString, @Next, @First - 1)
SET @Next = @Next + @First - 1

SET @EndPattern = 1
-- Find start of end pattern range
WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) = 0
SET @EndPattern = @EndPattern + 1
-- Find end of pattern range
WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) > 0
AND @Len >= (@Next + @EndPattern - 1)
SET @EndPattern = @EndPattern + 1

--Either at the end of the pattern or @Next + @EndPattern = @Len
SET @Result = @Result + @ReplaceText
SET @Next = @Next + @EndPattern - 1
END
END
RETURN(@Result)
END

Read more here

After creating this function you can try this:

DECLARE @x VARCHAR(max)
SET @x = 'sample text min(my value) continue with sample text with ) and ('

DECLARE @val VARCHAR(max)

SET @val = SUBSTRING(SUBSTRING(@x,CHARINDEX('min(',@x)+4,LEN(@x)-CHARINDEX('min(',@x)),1,CHARINDEX(')',@x)-(CHARINDEX('min(',@x)+4))

SELECT REPLACE(dbo.PatternReplace(@x,'%min(','min(max('),'min(max('+@val+')','min(max('+@val+'))')

And you can see that the output is:

enter image description here

Occurrence of min(xxx) more than once:

enter image description here

Finally you can simply update your table as below:

UPDATE  YourTable
SET YourColumn = REPLACE(dbo.PatternReplace(YourColumn, '%min(', 'min(max('),
'min(max(' + SUBSTRING(SUBSTRING(YourColumn,CHARINDEX('min(', YourColumn)+ 4,LEN(YourColumn)- CHARINDEX('min(',YourColumn)), 1,CHARINDEX(')', YourColumn)- ( CHARINDEX('min(', YourColumn) + 4 ))+ ')',
'min(max(' + SUBSTRING(SUBSTRING(YourColumn,CHARINDEX('min(', YourColumn)+ 4,LEN(YourColumn)- CHARINDEX('min(',YourColumn)), 1,CHARINDEX(')', YourColumn)- ( CHARINDEX('min(', YourColumn) + 4 ))+ '))');

SQL remove characters that aren't in a regex pattern

This:

PatIndex('%[^a-zA-Z0-9+&@#\/%=~_|$?!-:,.']%', YourValue)

will return the character at which the pattern matches. In this case, I've added ^ to the beginning so that the pattern matches everything not in the character set.

You can then remove the character at that position, and continue, or replace all occurrences of the found character in the entire string.

FYI: to simulate the offset parameter of CharIndex in order to search starting at a certain character position, you can use Substring to get a portion of the string (or even one character) and use PatIndex on that.

Regex Replace function: in cases of no match, $1 returns full line instead of null

You may add an alternative that matches any string,

.*(\d{13}).*|.*

The point is that the first alternative is tried first, and if there are 13 consecutive digits on a line, the alternative will "win" and .* won't trigger. $1 will hold the 13 digits then. See the regex demo.

Alternatively, an optional non-capturing group with the obligatory digit capturing group:

(?:.*(\d{13}))?.*

See the regex demo

Here, (?:.*(\d{13}))? will be executed at least once (as ? is a greedy quantifier matching 1 or 0 times) and will find 13 digits and place them into Group 1 after any 0+ chars other than linebreak chars. The .* at the end of the pattern will match the rest of the line.



Related Topics



Leave a reply



Submit