How do I remove extended ASCII characters from a string in T-SQL?
OK, give this a try. It seems the same issue they have. Anyway you need to modify it based on your requirements.
CREATE FUNCTION RemoveNonASCII
(
@nstring nvarchar(255)
)
RETURNS varchar(255)
AS
BEGIN
DECLARE @Result varchar(255)
SET @Result = ''
DECLARE @nchar nvarchar(1)
DECLARE @position int
SET @position = 1
WHILE @position <= LEN(@nstring)
BEGIN
SET @nchar = SUBSTRING(@nstring, @position, 1)
--Unicode & ASCII are the same from 1 to 255.
--Only Unicode goes beyond 255
--0 to 31 are non-printable characters
IF UNICODE(@nchar) between 32 and 255
SET @Result = @Result + @nchar
SET @position = @position + 1
END
RETURN @Result
END
GO
Check it out at SqlServerCentral
T SQL Remove ASCII character from SELECT
DECLARE @str VARCHAR(20) = CONCAT('This is a tab--> ', '<--');
SELECT @str, REPLACE(@str, CHAR(9), '');
SELECT [E_Code]
FROM [Person] P
WHERE P.E_Code NOT IN (
SELECT REPLACE(dsc, CHAR(9), '')
FROM OPENQUERY(svr01, 'select "dsc" from TST.eth')
)
SQL Server - Remove all non-printable ASCII characters
Another Option.
This function will replace control characters and correct any residual repeating spaces. For example Jane Smith{13}was here
will not be returned as Jane Smithwas here
, but rather Jane Smith was here
CREATE FUNCTION [dbo].[udf-Str-Strip-Control](@S varchar(max))
Returns varchar(max)
Begin
;with cte1(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cte2(C) As (Select Top (32) Char(Row_Number() over (Order By (Select NULL))-1) From cte1 a,cte1 b)
Select @S = Replace(@S,C,' ')
From cte2
Return ltrim(rtrim(replace(replace(replace(@S,' ','†‡'),'‡†',''),'†‡',' ')))
End
--Select [dbo].[udf-Str-Strip-Control]('Michael '+char(13)+char(10)+'LastName') --Returns: Michael LastName
Replacing non-ascii characters in MSSQL
You cannot do it any other way than the old-fashioned, "hard" way (in any language, even, not only SQL).
Since in many (spoken/written) languages, accented characters are not the same as non-accented ones, it's actually just a visual similarity, so there is no true correspondance. Some letters can look like just the same with a symbol added, but actually have a completely different "sound" and different rules (example).
Just build a table, or array, or whatever of the correspondances you know, or find, or can think of.
char | replacement
------------------
à | a
è | e
é | e
ñ | n
ç | c
ß | B
...
Then, loop on your data and replacement chars to replace all characters by their correspondance.
How to remove accents and all chars a..z in sql-server?
You can avoid hard-coded REPLACE
statements by using a COLLATE
clause with an accent-insensitive collation to compare the accented alphabetic characters to non-alphabetic ones:
DECLARE
@s1 NVARCHAR(200),
@s2 NVARCHAR(200)
SET @s1 = N'aèàç=.32s df'
SET @s2 = N''
SELECT @s2 = @s2 + no_accent
FROM (
SELECT
SUBSTRING(@s1, number, 1) AS accent,
number
FROM master.dbo.spt_values
WHERE TYPE = 'P'
AND number BETWEEN 1 AND LEN(@s1)
) s1
INNER JOIN (
SELECT NCHAR(number) AS no_accent
FROM master.dbo.spt_values
WHERE type = 'P'
AND (number BETWEEN 65 AND 90 OR number BETWEEN 97 AND 122)
) s2
ON s1.accent COLLATE LATIN1_GENERAL_CS_AI = s2.no_accent
ORDER BY number
SELECT @s1
SELECT @s2
/*
aèàç=.32s df
aeacsdf
*/
Replace Unicode characters in T-SQL
The þ character (Extended ASCII { via ISO-8859-1 and ANSI Code Page 1252 } & UNICODE value of 254) is known as "thorn" and in some languages equates directly to th:
Technical info on the character here: http://unicode-table.com/en/00FE/
Explanation of that character and collations here: http://userguide.icu-project.org/collation/customization. Search the page — typically
Control-F
— for "Complex Tailoring Examples" and you will see the following:The letter 'þ' (THORN) is normally treated by UCA/root collation as a separate letter that has primary-level sorting after 'z'. However, in Swedish and some other Scandinavian languages, 'þ' and 'Þ' should be treated as just a tertiary-level difference from the letters "th" and "TH" respectively.
If you do not want þ to equate to th, then force a Binary collation as follows:
SELECT REPLACE(N'this is the news with a þ' COLLATE Latin1_General_100_BIN2,
N'þ', N'__');
Returns:
this is the news with a __
For more info on working with Collations, Unicode, encodings, etc, please visit: Collations Info
Related Topics
Sql: How to Select Only Groups That Do Not Contain a Certain Value
How to Remove Extended Ascii Characters from a String in T-Sql
Is There a Good Reason to Use Upper Case for SQL Keywords
Joining Multiple Tables in SQL
How to Find Consecutive Rows Based on the Value of a Column
SQL Performance: Where VS Where(Row_Number)
Update a Column of a Table with a Column of Another Table in Postgresql
SQL Inner Join More Than Two Tables
Use a Union or a Join - What Is Faster
How to Get the Size of a Varchar[N] Field in One SQL Statement
How to Get Return Value of a Stored Procedure
Aggregate Function in an SQL Update Query
Sorting String Column Containing Numbers in SQL
Hamming Distance on Binary Strings in SQL
Advice Using Pivot Table in Oracle