How to Remove Extended Ascii Characters from a String in T-Sql

How do I remove extended ASCII characters from a string in T-SQL?

OK, give this a try. It seems the same issue they have. Anyway you need to modify it based on your requirements.

CREATE FUNCTION RemoveNonASCII 
(
@nstring nvarchar(255)
)
RETURNS varchar(255)
AS
BEGIN

DECLARE @Result varchar(255)
SET @Result = ''

DECLARE @nchar nvarchar(1)
DECLARE @position int

SET @position = 1
WHILE @position <= LEN(@nstring)
BEGIN
SET @nchar = SUBSTRING(@nstring, @position, 1)
--Unicode & ASCII are the same from 1 to 255.
--Only Unicode goes beyond 255
--0 to 31 are non-printable characters
IF UNICODE(@nchar) between 32 and 255
SET @Result = @Result + @nchar
SET @position = @position + 1
END

RETURN @Result

END
GO

Check it out at SqlServerCentral

T SQL Remove ASCII character from SELECT


DECLARE @str VARCHAR(20) = CONCAT('This is a tab--> ', '<--');
SELECT @str, REPLACE(@str, CHAR(9), '');


SELECT [E_Code]
FROM [Person] P
WHERE P.E_Code NOT IN (
SELECT REPLACE(dsc, CHAR(9), '')
FROM OPENQUERY(svr01, 'select "dsc" from TST.eth')
)

SQL Server - Remove all non-printable ASCII characters

Another Option.

This function will replace control characters and correct any residual repeating spaces. For example Jane Smith{13}was here will not be returned as Jane Smithwas here, but rather Jane Smith was here

CREATE FUNCTION [dbo].[udf-Str-Strip-Control](@S varchar(max))
Returns varchar(max)
Begin
;with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(C) As (Select Top (32) Char(Row_Number() over (Order By (Select NULL))-1) From cte1 a,cte1 b)
Select @S = Replace(@S,C,' ')
From cte2

Return ltrim(rtrim(replace(replace(replace(@S,' ','†‡'),'‡†',''),'†‡',' ')))
End
--Select [dbo].[udf-Str-Strip-Control]('Michael '+char(13)+char(10)+'LastName') --Returns: Michael LastName

Replacing non-ascii characters in MSSQL

You cannot do it any other way than the old-fashioned, "hard" way (in any language, even, not only SQL).

Since in many (spoken/written) languages, accented characters are not the same as non-accented ones, it's actually just a visual similarity, so there is no true correspondance. Some letters can look like just the same with a symbol added, but actually have a completely different "sound" and different rules (example).

Just build a table, or array, or whatever of the correspondances you know, or find, or can think of.


char | replacement
------------------
à | a
è | e
é | e
ñ | n
ç | c
ß | B
...

Then, loop on your data and replacement chars to replace all characters by their correspondance.

How to remove accents and all chars a..z in sql-server?

You can avoid hard-coded REPLACE statements by using a COLLATE clause with an accent-insensitive collation to compare the accented alphabetic characters to non-alphabetic ones:

DECLARE 
@s1 NVARCHAR(200),
@s2 NVARCHAR(200)

SET @s1 = N'aèàç=.32s df'

SET @s2 = N''
SELECT @s2 = @s2 + no_accent
FROM (
SELECT
SUBSTRING(@s1, number, 1) AS accent,
number
FROM master.dbo.spt_values
WHERE TYPE = 'P'
AND number BETWEEN 1 AND LEN(@s1)
) s1
INNER JOIN (
SELECT NCHAR(number) AS no_accent
FROM master.dbo.spt_values
WHERE type = 'P'
AND (number BETWEEN 65 AND 90 OR number BETWEEN 97 AND 122)
) s2
ON s1.accent COLLATE LATIN1_GENERAL_CS_AI = s2.no_accent
ORDER BY number

SELECT @s1
SELECT @s2

/*
aèàç=.32s df
aeacsdf
*/

Replace Unicode characters in T-SQL

The þ character (Extended ASCII { via ISO-8859-1 and ANSI Code Page 1252 } & UNICODE value of 254) is known as "thorn" and in some languages equates directly to th:

  • Technical info on the character here: http://unicode-table.com/en/00FE/

  • Explanation of that character and collations here: http://userguide.icu-project.org/collation/customization. Search the page — typically Control-F — for "Complex Tailoring Examples" and you will see the following:

    The letter 'þ' (THORN) is normally treated by UCA/root collation as a separate letter that has primary-level sorting after 'z'. However, in Swedish and some other Scandinavian languages, 'þ' and 'Þ' should be treated as just a tertiary-level difference from the letters "th" and "TH" respectively.

If you do not want þ to equate to th, then force a Binary collation as follows:

SELECT REPLACE(N'this is the news with a þ' COLLATE Latin1_General_100_BIN2,
N'þ', N'__');

Returns:

this is the news with a __

For more info on working with Collations, Unicode, encodings, etc, please visit: Collations Info



Related Topics



Leave a reply



Submit