Regex Pattern Inside SQL Replace Function

Regex pattern inside SQL Replace function?

You can use PATINDEX
to find the first index of the pattern (string's) occurrence. Then use STUFF to stuff another string into the pattern(string) matched.

Loop through each row. Replace each illegal characters with what you want. In your case replace non numeric with blank. The inner loop is if you have more than one illegal character in a current cell that of the loop.

DECLARE @counter int

SET @counter = 0

WHILE(@counter < (SELECT MAX(ID_COLUMN) FROM Table))
BEGIN  

    WHILE 1 = 1
    BEGIN
        DECLARE @RetVal varchar(50)

        SET @RetVal =  (SELECT Column = STUFF(Column, PATINDEX('%[^0-9.]%', Column),1, '')
        FROM Table
        WHERE ID_COLUMN = @counter)

        IF(@RetVal IS NOT NULL)       
          UPDATE Table SET
          Column = @RetVal
          WHERE ID_COLUMN = @counter
        ELSE
            break
    END

    SET @counter = @counter + 1
END

Caution: This is slow though! Having a varchar column may impact. So using LTRIM RTRIM may help a bit. Regardless, it is slow.

Credit goes to this StackOverFlow answer.

EDIT
Credit also goes to @srutzky

Edit (by @Tmdean)
Instead of doing one row at a time, this answer can be adapted to a more set-based solution. It still iterates the max of the number of non-numeric characters in a single row, so it's not ideal, but I think it should be acceptable in most situations.

WHILE 1 = 1 BEGIN
    WITH q AS
        (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
        FROM Table)
    UPDATE Table
    SET Column = STUFF(Column, q.n, 1, '')
    FROM q
    WHERE Table.ID_Column = q.ID_Column AND q.n != 0;

    IF @@ROWCOUNT = 0 BREAK;
END;

You can also improve efficiency quite a lot if you maintain a bit column in the table that indicates whether the field has been scrubbed yet. (NULL represents "Unknown" in my example and should be the column default.)

DECLARE @done bit = 0;
WHILE @done = 0 BEGIN
    WITH q AS
        (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
        FROM Table
        WHERE COALESCE(Scrubbed_Column, 0) = 0)
    UPDATE Table
    SET Column = STUFF(Column, q.n, 1, ''),
        Scrubbed_Column = 0
    FROM q
    WHERE Table.ID_Column = q.ID_Column AND q.n != 0;

    IF @@ROWCOUNT = 0 SET @done = 1;

    -- if Scrubbed_Column is still NULL, then the PATINDEX
    -- must have given 0
    UPDATE table
    SET Scrubbed_Column = CASE
        WHEN Scrubbed_Column IS NULL THEN 1
        ELSE NULLIF(Scrubbed_Column, 0)
    END;
END;

If you don't want to change your schema, this is easy to adapt to store intermediate results in a table valued variable which gets applied to the actual table at the end.

Replace string using RegEx?

It seems from the documentation https://docs.microsoft.com/en-us/sql/t-sql/functions/replace-transact-sql?view=sql-server-ver15 that replace is not to be used with regular expression.

string_pattern

Is the substring to be found. string_pattern can be of a character or binary data type. string_pattern cannot be an empty string (''), and must not exceed the maximum number of bytes that fits on a page.

Alternative solutions here: Regex pattern inside SQL Replace function?

A way to use regex within t-sql's replace function?

If you wont or maybe can't use a CLR due to security reasons, there is a simple way using a CTE inside standard t-sql.

Here is a complete example inclusive demo structure. You can run it on a whole table.

CREATE TABLE #dummyData(id int identity(1,1), teststring nvarchar(255))

INSERT INTO #dummyData(teststring)
VALUES(N'<B99_9>TEST</B99_9><LastDay>TEST</LastDay>, <B99_9>TEST</B99_9>, <B99_9></B99_9>')

DECLARE @starttag nvarchar(10) = N'<B99_9>', @endtag nvarchar(10) = N'</B99_9>'

;WITH cte AS(
    SELECT id, STUFF(
                teststring,
                PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag),
                (PATINDEX(N'%[a-z0-9]'+@endtag+N'%',teststring)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag)),
                N''
            ) as teststring, 1 as iteration
    FROM #dummyData
    -- iterate until everything is replaced
    UNION ALL
    SELECT id, STUFF(
                teststring,
                PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag),
                (PATINDEX(N'%[a-z0-9]'+@endtag+N'%',teststring)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring)+LEN(@starttag)),
                N''
            ) as teststring, iteration+1 as iteration
    FROM cte
    WHERE PATINDEX(N'%'+@starttag+N'[a-z0-9]%',teststring) > 0
)
SELECT c.id, c.teststring 
FROM cte as c
-- Join to get only the latest iteration
INNER JOIN (
            SELECT id, MAX(iteration) as maxIteration
            FROM cte 
            GROUP BY id
        ) as onlyMax
    ON c.id = onlyMax.id
    AND c.iteration = onlyMax.maxIteration

-- Cleanup
DROP TABLE #dummyData

If you want to use the result of the CTE in an update. You can just replace the part after the CTE-definition with the following code:

UPDATE dd
SET teststring = c.teststring
FROM #dummyData as dd -- rejoin the base table for later update usage
INNER JOIN cte as c
        ON dd.id = c.id
-- Join to get only the latest iteration
INNER JOIN (
            SELECT id, MAX(iteration) as maxIteration
            FROM cte 
            GROUP BY id
        ) as onlyMax
    ON c.id = onlyMax.id
    AND c.iteration = onlyMax.maxIteration

If you don't want to run it on a complete table set, you can run the following code for a single variable:

DECLARE @string nvarchar(max) = N'<B99_9>TEST</B99_9><LastDay>TEST</LastDay>, <B99_9>TEST</B99_9>, <B99_9></B99_9>'
DECLARE @starttag nvarchar(10) = N'<B99_9>', @endtag nvarchar(10) = N'</B99_9>'

WHILE PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string) > 0 BEGIN
    SELECT @string = STUFF(
                    @string,
                    PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string)+LEN(@starttag),
                    (PATINDEX(N'%[a-z0-9]'+@endtag+N'%',@string)+1)-(PATINDEX(N'%'+@starttag+N'[a-z0-9]%',@string)+LEN(@starttag)),
                    N''
                )
END

SELECT @string

SQL Server 2014 replace with regex

First of all you need this user defined function to search for replacing a pattern with string:

CREATE FUNCTION dbo.PatternReplace
(
   @InputString VARCHAR(4000),
   @Pattern VARCHAR(100),
   @ReplaceText VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN
   DECLARE @Result VARCHAR(4000) SET @Result = ''
   -- First character in a match
   DECLARE @First INT
    -- Next character to start search on
    DECLARE @Next INT SET @Next = 1
    -- Length of the total string -- 8001 if @InputString is NULL
    DECLARE @Len INT SET @Len = COALESCE(LEN(@InputString), 8001)
    -- End of a pattern
    DECLARE @EndPattern INT

     WHILE (@Next <= @Len) 
     BEGIN
     SET @First = PATINDEX('%' + @Pattern + '%', SUBSTRING(@InputString, @Next, @Len))
      IF COALESCE(@First, 0) = 0 --no match - return
       BEGIN
          SET @Result = @Result + 
             CASE --return NULL, just like REPLACE, if inputs are NULL
                WHEN  @InputString IS NULL
                 OR @Pattern IS NULL
                 OR @ReplaceText IS NULL THEN NULL
           ELSE SUBSTRING(@InputString, @Next, @Len)
        END
     BREAK
  END
  ELSE
  BEGIN
     -- Concatenate characters before the match to the result
     SET @Result = @Result + SUBSTRING(@InputString, @Next, @First - 1)
     SET @Next = @Next + @First - 1

     SET @EndPattern = 1
     -- Find start of end pattern range
     WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) = 0
        SET @EndPattern = @EndPattern + 1
     -- Find end of pattern range
     WHILE PATINDEX(@Pattern, SUBSTRING(@InputString, @Next, @EndPattern)) > 0
           AND @Len >= (@Next + @EndPattern - 1)
        SET @EndPattern = @EndPattern + 1

     --Either at the end of the pattern or @Next + @EndPattern = @Len
     SET @Result = @Result + @ReplaceText
     SET @Next = @Next + @EndPattern - 1
  END
      END
      RETURN(@Result)
   END

SQL remove characters that aren't in a regex pattern

This:

PatIndex('%[^a-zA-Z0-9+&@#\/%=~_|$?!-:,.']%', YourValue)

will return the character at which the pattern matches. In this case, I've added ^ to the beginning so that the pattern matches everything not in the character set.

You can then remove the character at that position, and continue, or replace all occurrences of the found character in the entire string.

FYI: to simulate the offset parameter of CharIndex in order to search starting at a certain character position, you can use Substring to get a portion of the string (or even one character) and use PatIndex on that.

Regex Replace function: in cases of no match, $1 returns full line instead of null

You may add an alternative that matches any string,

.*(\d{13}).*|.*

The point is that the first alternative is tried first, and if there are 13 consecutive digits on a line, the alternative will "win" and .* won't trigger. $1 will hold the 13 digits then. See the regex demo.

Alternatively, an optional non-capturing group with the obligatory digit capturing group:

(?:.*(\d{13}))?.*

See the regex demo

Here, (?:.*(\d{13}))? will be executed at least once (as ? is a greedy quantifier matching 1 or 0 times) and will find 13 digits and place them into Group 1 after any 0+ chars other than linebreak chars. The .* at the end of the pattern will match the rest of the line.

Regex Pattern Inside SQL Replace Function