How to Find Unicode/Non-Ascii Characters in an Ntext Field in a SQL Server 2005 Table

Find non-ASCII characters in varchar columns using SQL Server

try something like this:

DECLARE @YourTable table (PK int, col1 varchar(20), col2 varchar(20), col3 varchar(20));
INSERT @YourTable VALUES (1, 'ok','ok','ok');
INSERT @YourTable VALUES (2, 'BA'+char(182)+'D','ok','ok');
INSERT @YourTable VALUES (3, 'ok',char(182)+'BAD','ok');
INSERT @YourTable VALUES (4, 'ok','ok','B'+char(182)+'AD');
INSERT @YourTable VALUES (5, char(182)+'BAD','ok',char(182)+'BAD');
INSERT @YourTable VALUES (6, 'BAD'+char(182),'B'+char(182)+'AD','BAD'+char(182)+char(182)+char(182));

--if you have a Numbers table use that, other wise make one using a CTE
WITH AllNumbers AS
( SELECT 1 AS Number
UNION ALL
SELECT Number+1
FROM AllNumbers
WHERE Number<1000
)
SELECT
pk, 'Col1' BadValueColumn, CONVERT(varchar(20),col1) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
FROM @YourTable y
INNER JOIN AllNumbers n ON n.Number <= LEN(y.col1)
WHERE ASCII(SUBSTRING(y.col1, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col1, n.Number, 1))>127
UNION
SELECT
pk, 'Col2' BadValueColumn, CONVERT(varchar(20),col2) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
FROM @YourTable y
INNER JOIN AllNumbers n ON n.Number <= LEN(y.col2)
WHERE ASCII(SUBSTRING(y.col2, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col2, n.Number, 1))>127
UNION
SELECT
pk, 'Col3' BadValueColumn, CONVERT(varchar(20),col3) AS BadValue --make the XYZ in convert(varchar(XYZ), ...) the largest value of col1, col2, col3
FROM @YourTable y
INNER JOIN AllNumbers n ON n.Number <= LEN(y.col3)
WHERE ASCII(SUBSTRING(y.col3, n.Number, 1))<32 OR ASCII(SUBSTRING(y.col3, n.Number, 1))>127
order by 1
OPTION (MAXRECURSION 1000);

OUTPUT:

pk          BadValueColumn BadValue
----------- -------------- --------------------
2 Col1 BA¶D
3 Col2 ¶BAD
4 Col3 B¶AD
5 Col1 ¶BAD
5 Col3 ¶BAD
6 Col1 BAD¶
6 Col2 B¶AD
6 Col3 BAD¶¶¶

(8 row(s) affected)

Check if record has ascii char

Although it was not an answer but a link to another answer it solved my problem.
Simple e.g. how I wanted use it on Northwind database:

SELECT CompanyName      
FROM Customers
WHERE CompanyName not LIKE '%[' + CHAR(127)+ '-' +CHAR(255)+']%' COLLATE Latin1_General_100_BIN2

Results of this query are all companies that have only ascii characters in their names.

How to find invalid Char in a SQL table

I have found a solution in another users question here

I modified it slightly though. What ends up working for me is this:

ALTER FUNCTION [dbo].[RemoveNonASCII] 
(
-- Parameters
@nstring nvarchar(max)
)
RETURNS varchar(max)
AS
BEGIN
-- Variables
DECLARE @Result varchar(max) = '',@nchar nvarchar(1), @position int

-- T-SQL statements to compute the return value
set @position = 1
while @position <= LEN(@nstring)
BEGIN
set @nchar = SUBSTRING(@nstring, @position, 1)
if UNICODE(@nchar) between 32 and 127
set @Result = @Result + @nchar
set @position = @position + 1
set @Result = REPLACE(@Result,'))','')
set @Result = REPLACE(@Result,'?','')
END

-- Return the result
RETURN @Result

END


Related Topics



Leave a reply



Submit