Check If the String Contains Accented Characters in SQL

Check if the string contains accented characters in SQL?

SQL Fiddle: http://sqlfiddle.com/#!6/9eecb7d/1607

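-- N prefix keeps the literals as Unicode
declare @a nvarchar(32) = N'àéêöhello!'
declare @b nvarchar(32) = N'aeeohello!'

-- Converting to the 1251 code page strips the accents, so the converted
-- value no longer equals the original when accents are present
select case
when (cast(@a as varchar(32)) collate SQL_Latin1_General_Cp1251_CS_AS) = @a
then 0
else 1
end HasSpecialChars

-- No accents here, so the converted value still equals the original
select case
when (cast(@b as varchar(32)) collate SQL_Latin1_General_Cp1251_CS_AS) = @b
then 0
else 1
end HasSpecialChars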

(based on solution here: How can I remove accents on a string?)

How to detect if a string contains special characters?

Assuming SQL Server:

e.g. if you class special characters as anything NOT alphanumeric:

DECLARE @MyString VARCHAR(100)
SET @MyString = 'adgkjb$'

IF (@MyString LIKE '%[^a-zA-Z0-9]%')
PRINT 'Contains "special" characters'
ELSE
PRINT 'Does not contain "special" characters'

Just add any other characters you don't class as special inside the square brackets.
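
As a minimal sketch of that advice, the variant below also allows a space and an underscore. The COLLATE clause is an addition that is not in the original answer: under many non-binary collations a range such as a-z can also match accented letters, so a binary collation (Latin1_General_BIN is assumed here) is used to make the ranges compare by code point.

```sql
DECLARE @MyString NVARCHAR(100) = N'adg kjb_42';

-- Space and underscore are added to the allowed set inside the brackets.
-- Latin1_General_BIN is an assumption: it makes a-z, A-Z and 0-9 plain code-point ranges.
IF (@MyString COLLATE Latin1_General_BIN LIKE N'%[^a-zA-Z0-9 _]%')
    PRINT 'Contains "special" characters';
ELSE
    PRINT 'Does not contain "special" characters';
```

With this sample value the ELSE branch should run, because every character is in the allowed set.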

Find the accent data in table records

In Oracle, you can use the REGEXP_LIKE function along with a list of all the accented characters you're interested in:

with t1(data) as (
select '2ème édition' from dual union all
select 'Natália' from dual union all
select 'sravanth' from dual
)
select * from t1 where regexp_like(data,'[àèìòùÀÈÌÒÙáéíóúýÁÉÍÓÚÝâêîôûÂÊÎÔÛãñõÃÑÕäëïöüÿÄËÏÖÜŸçÇßØøÅåÆæœ]');

DATA
--------------
2ème édition
Natália
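
The query above uses Oracle syntax. A rough SQL Server counterpart, sketched under the assumption that a binary collation (Latin1_General_BIN) is acceptable, puts the same kind of list inside a LIKE bracket expression. The table variable and the shortened character list are illustrative only.

```sql
DECLARE @t1 TABLE (data NVARCHAR(50));
INSERT INTO @t1 (data) VALUES (N'2ème édition'), (N'Natália'), (N'sravanth');

-- The binary collation makes each listed character match only itself,
-- so a plain 'a' or 'e' is not treated as equal to an accented one.
SELECT data
FROM @t1
WHERE data COLLATE Latin1_General_BIN
      LIKE N'%[àèìòùáéíóúýâêîôûãñõäëïöüÿçÀÈÌÒÙÁÉÍÓÚÝÂÊÎÔÛÃÑÕÄËÏÖÜÇ]%';
```

This should return the first two rows and skip 'sravanth'.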

Check if field contains special character in SQL

Had to solve this a little while ago myself:

Use a LIKE pattern with a negated character class (this is LIKE wildcard syntax, not a true regex):

LIKE '%[^a-zA-Z0-9]%'

To search every table, see: How do I find a value anywhere in a SQL Server Database? If that doesn't work, I have a better one somewhere that I use.

Checking if a string is valid with special chars

Escaping is one method.

But if you just want to ignore the readable ASCII characters, then the range could be simplified.

[^ -~] : not between space and ~

-- Sample data
declare @T table (col NVARCHAR(30) collate SQL_Latin1_General_CP850_BIN primary key);
insert into @T (col) values
(N'abc╢123'),
(N'xyz123[}'''),
(N'abc௹123');

-- Query
SELECT col, PATINDEX(N'%[^ -~]%' collate SQL_Latin1_General_CP850_BIN, col) as pos
FROM @T;

Returns:

col        pos
---------  ---
abc╢123    4
abc௹123    4
xyz123[}'  0

But to also locate the caret and some others, it's more complicated, since PATINDEX doesn't have an ESCAPE clause as LIKE does.

-- Sample data
declare @T table (
id int identity(1,1) primary key,
col NVARCHAR(30) collate SQL_Latin1_General_CP850_BIN
);
insert into @T (col) values
(N'xyz[123]}''') -- good
,(N'abc╢123') -- bad
,(N'abc௹123') -- bad
,(N'def#456') -- bad
,(N'def^456') -- bad
;

-- also locate #, ´, ` and ^
-- (^ lies inside the $-_ range, so CHARINDEX looks for it separately)
SELECT col,
CASE
WHEN PATINDEX(N'%[^ !"$-_a-z{-~]%' collate SQL_Latin1_General_CP850_BIN, col) > 0
THEN PATINDEX(N'%[^ !"$-_a-z{-~]%' collate SQL_Latin1_General_CP850_BIN, col)
ELSE CHARINDEX(N'^' collate SQL_Latin1_General_CP850_BIN, col)
END AS pos
FROM @T;

Returns:

col         pos
----------  ---
xyz[123]}'  0
abc╢123     4
abc௹123     4
def#456     4
def^456     4
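
The duplicated PATINDEX call in the CASE expression can be avoided. One possible sketch, using NULLIF and COALESCE, falls back to CHARINDEX only when PATINDEX returns 0. The inline VALUES list is a stand-in for the @T table and is not part of the original answer.

```sql
SELECT v.col,
       COALESCE(
           -- NULLIF turns "no match" (0) into NULL so COALESCE moves on to CHARINDEX
           NULLIF(PATINDEX(N'%[^ !"$-_a-z{-~]%' COLLATE SQL_Latin1_General_CP850_BIN, v.col), 0),
           CHARINDEX(N'^', v.col)
       ) AS pos
FROM (VALUES (N'xyz[123]}'''), (N'abc╢123'), (N'def#456'), (N'def^456')) AS v(col);
```

For these rows it should give the same positions as the CASE version, with 0 meaning nothing was found.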

SQL search on fields containing diacritics


select * from books where title COLLATE Latin1_General_CI_AI like '%casa%'

Of course, you should choose a collation that matches yours; just change AS to AI at the end of its name to make it accent-insensitive.

Quick example with German umlauts

DECLARE @foo TABLE (bar varchar(100) /*implied COLLATE Latin1_General_CI_AS*/)

INSERT @foo VALUES ('xxx fish yyy')
INSERT @foo VALUES ('xxx bar yyy' )
INSERT @foo VALUES ('xxx bär yyy')

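-- accent-insensitive: matches both 'bar' and 'bär'
select * from @foo where bar COLLATE Latin1_General_CI_AI like '%bar%'
-- accent-sensitive default: matches only 'bar'
select * from @foo where bar like '%bar%'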
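
To see the effect of the collation suffix in isolation, a small sketch compares the same two literals under an accent-insensitive and an accent-sensitive collation. The column aliases are illustrative.

```sql
SELECT
    -- AI: the umlaut is ignored, so the strings compare as equal
    CASE WHEN N'bär' = N'bar' COLLATE Latin1_General_CI_AI THEN 1 ELSE 0 END AS equal_under_ai,
    -- AS: the umlaut matters, so the strings differ
    CASE WHEN N'bär' = N'bar' COLLATE Latin1_General_CI_AS THEN 1 ELSE 0 END AS equal_under_as;
```

The first column should be 1 and the second 0.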

How to search in SQL Server for text that has special characters?

Assuming that by "special" characters you mean anything outside the set of printable ASCII and certain common whitespace characters, you can try the following:

DECLARE @SpecialPattern VARCHAR(100) =
'%[^'
+ CHAR(9) + CHAR(10) + CHAR(13) -- tab, LF, CR
+ CHAR(32) + '-' + CHAR(126) -- Range from space to last printable ASCII
+ ']%'

SELECT
RESUME_TEXT,
cast(left(cast(resume_text as varchar(max)),20) as varbinary(max)) -- Borrowed from userMT's comment
FROM RESUME
WHERE RESUME_TEXT LIKE @SpecialPattern COLLATE Latin1_General_Bin -- Use exact compare

You may get some false hits against some perfectly valid extended characters such as accented vowels, curly quotes, or em and en dashes that may exist in the text.
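
One way to cut down those false hits, sketched here under the assumption that Latin-1 characters (code points 160 to 255) should count as valid, is to add that range to the allowed set. The pattern is built as NVARCHAR so the code points are unambiguous. The sample strings are illustrative, and curly quotes or dashes would still be flagged unless they are listed too.

```sql
DECLARE @SpecialPattern NVARCHAR(100) =
    N'%[^'
    + NCHAR(9) + NCHAR(10) + NCHAR(13)   -- tab, LF, CR
    + NCHAR(32) + N'-' + NCHAR(126)      -- space to last printable ASCII
    + NCHAR(160) + N'-' + NCHAR(255)     -- assumed extra range: Latin-1 symbols and accented letters
    + N']%';

SELECT v.txt,
       CASE WHEN v.txt COLLATE Latin1_General_BIN LIKE @SpecialPattern
            THEN 1 ELSE 0 END AS has_special
FROM (VALUES (N'résumé'), (N'plain text'), (N'bad' + NCHAR(9608) + N'char')) AS v(txt);
```

Only the third row, which contains a block-drawing character, should be flagged.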

My first thought is that the weird characters might be a UTF-8 BOM (hex EF, BB, BF), but the display didn't seem to match how I would expect SQL Server to render them. The inverse dot isn't present at all in the default Windows code page (1252).

We need at least some hex data (at least the first few bytes) to help further. Often, common binary file types have a recognizable signature in the first 3-5 bytes.


