Handling Unicode Characters That Aren't Displayed Correctly in SQL Query

NVARCHAR literals are written with an N prefix, as N'<value>', so it would be:

DECLARE @objname nvarchar(255)
set @objname=N'漢字'
select @objname

Run the code above. The output is 漢字, exactly as it was set.

Storing special unicode characters in SQL Server and retrieving in .NET

You need to use the N prefix to indicate that a Unicode value is going in:

create table TestUni
(
UniCol nvarchar(20)
)

insert into TestUni values ('')
insert into TestUni values ('電')
insert into TestUni values (N'')
insert into TestUni values (N'電')

select len(UniCol), UniCol from TestUni
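
The difference between the inserts with and without the N prefix can be sketched outside SQL Server. A literal without N is interpreted in the database's default code page first, and characters that the code page lacks are typically replaced with '?'. The following Python sketch imitates that round trip; the code page (Windows-1252) and the helper names are assumptions made for illustration, not something the original answer states.

```python
# Sketch: what happens to a string literal written without the N prefix.
# Assumption: the database's default code page is Windows-1252.

def as_varchar_literal(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page, like a non-N literal."""
    # Characters missing from the code page are replaced with '?'.
    return text.encode(code_page, errors="replace").decode(code_page)


def as_nvarchar_literal(text: str) -> str:
    """Round-trip text through UTF-16, like an N'...' literal."""
    return text.encode("utf-16-le").decode("utf-16-le")


print(as_varchar_literal("電"))   # ?
print(as_nvarchar_literal("電"))  # 電
```

With a different default collation the first result may differ, because the target code page decides which characters survive.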

SQL Server not handling non-ASCII characters

You've made a tiny mistake: you've missed the N prefix that marks your string as Unicode (national language character set):

DECLARE @X nvarchar(max) = N'Activbasē ਪਾਕਿਸਤਾਨ ਪੰਜਾਬ ਦਾ ਦਾਰੁਲ'
PRINT @X
SELECT @X AS X

Output:

Activbasē ਪਾਕਿਸਤਾਨ ਪੰਜਾਬ ਦਾ ਦਾਰੁਲ

Cannot store particular Unicode code points / characters in NVARCHAR fields

I can't reproduce any data loss or encoding issue. I can reproduce a square that changes when copied. It's probably caused by the font used to display results in the SSMS grid or the Visual Studio debugger windows.

SQL Server and Windows have used UTF-16 for some time now, not UCS-2. Few fonts support the full UTF-16 range, though.

When I tried this in SSMS:

create table #tc(name nvarchar(20));
insert into #tc values (N'');

select name,len(name),DATALENGTH(name) from #tc;

I saw a square, 2, and 4 in the grid. This means the character was stored properly and took 4 bytes. When I tried to copy those results to SO, though, I saw:

name    (No column name)    (No column name)
2 4

When I used Results to Text, I got the actual character:

name                             
-------------------- ----------- -----------
2 4

The correct character is there, but the SSMS grid's font can't display it.
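
The 2 and 4 reported above are what one would expect for a character outside the Basic Multilingual Plane: UTF-16 stores it as a surrogate pair, that is, two 16-bit code units. A minimal Python sketch of that arithmetic follows. The character U+20000 is only a stand-in (the character from the original test is not known), and the helper name is hypothetical.

```python
# Sketch: why a single character can report LEN = 2 and DATALENGTH = 4.
# Assumption: the character lies outside the Basic Multilingual Plane;
# U+20000 is a stand-in, not the character from the original test.

def utf16_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-16 code units, bytes) for text."""
    encoded = text.encode("utf-16-le")
    return len(encoded) // 2, len(encoded)


ch = "\U00020000"
units, size = utf16_sizes(ch)
print(len(ch), units, size)  # 1 2 4
```

As far as I know, LEN counts UTF-16 code units unless the collation is supplementary-character aware (an _SC collation), which would match the 2 seen in the grid.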

Update

As Dan Guzman noted, the font can be changed from Tools --> Options --> Environment --> Fonts and Colors --> Show settings for: --> Grid Results. The default font is Microsoft Sans Serif, a small font (855KB) used as the default font on Windows. It contains "only" 3000 glyphs. Chinese characters aren't included, which is why squares are displayed.

Chinese computers use SimSun as the default, though, whose file is 17.1MB. They wouldn't have any problem displaying Chinese characters.

Special characters displaying incorrectly after BULK INSERT

You need to run BULK INSERT with CODEPAGE = 'ACP', which converts string data from Windows code page 1252 to the SQL Server code page.

BULK INSERT dbo.temp FROM 'C:\Temp\file.csv' 
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', CODEPAGE = 'ACP');

If you are bringing in UTF-8 data on a new enough version of SQL Server:

[...] , CODEPAGE = '65001');

You may also need to specify DATAFILETYPE with one of the values 'char', 'native', 'widechar', or 'widenative'.
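
Why the code page has to match the file can be sketched in a few lines of Python: the same bytes produce different text depending on the encoding used to read them. The sample value "café" and the variable names are assumptions made for illustration only.

```python
# Sketch: the same file bytes read under two different code pages.
# Assumption: "café" stands in for a value from the CSV file.

utf8_bytes = "café".encode("utf-8")      # how a UTF-8 file stores the value
cp1252_bytes = "café".encode("cp1252")   # how a Windows-1252 file stores it

# UTF-8 bytes read as Windows-1252: no error, but the text is garbled.
print(utf8_bytes.decode("cp1252"))   # cafÃ©

# Read with the matching code page, both files give the original text.
print(utf8_bytes.decode("utf-8"))    # café
print(cp1252_bytes.decode("cp1252")) # café

# Windows-1252 bytes read as UTF-8 are not even valid UTF-8.
try:
    cp1252_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)  # False
```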

SQL Server default character encoding

If you need to know the default collation for a newly created database, use:

SELECT SERVERPROPERTY('Collation')

This is the server collation for the SQL Server instance that you are running.

MySQL table: Unicode characters aren't correctly displayed in phpMyAdmin

Pick either cp1251 or utf8 (or utf8mb4), but don't mix them.

АБВ, encoded as utf8, is hex D090 D091 D092. Note: 2 bytes per Cyrillic character.

If you treat hex D0 90 D0 91 D0 92 as cp1251, you get РђР‘Р’. Note that is 6 bytes for 6 characters. Note the repetition of Р (capital ER); this comes from D0.
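
The byte-level effect described here can be reproduced with a short Python sketch: encode the text as UTF-8, then misread the bytes as cp1251. The variable names are illustrative only.

```python
# Sketch: UTF-8 bytes misread as cp1251 (mojibake), and the reverse repair.

text = "АБВ"
raw = text.encode("utf-8")
print(raw.hex(" ").upper())      # D0 90 D0 91 D0 92

# Each byte becomes one cp1251 character, so 6 bytes turn into 6 characters.
garbled = raw.decode("cp1251")
print(garbled)                   # РђР‘Р’
print(len(raw), len(garbled))    # 6 6

# As long as the garbled text was not altered, the mistake is reversible.
repaired = garbled.encode("cp1251").decode("utf-8")
print(repaired)                  # АБВ
```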

For what causes it, see "Mojibake" in Trouble with UTF-8 characters; what I see is not what I stored - but replace "latin1" with "cp1251" as needed.

(After further info...)

Use this once, right after connecting:

mysqli_query($con, "SET NAMES utf8");

(Note to other readers: If you need Chinese or Emoji, everything should be utf8mb4, not utf8.)
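
The reason behind that note is that MySQL's utf8 (an alias for utf8mb3) stores at most 3 bytes per character, while emoji and the rarer Chinese characters need 4 bytes in UTF-8. A small Python sketch of the check follows; the helper name and the sample characters are assumptions made for illustration.

```python
# Sketch: which characters fit in MySQL's 3-byte utf8 (utf8mb3).
# Assumption: fits_utf8mb3 is a hypothetical helper, not a MySQL API.

def fits_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


for sample in ["я", "漢", "\U00020000", "😀"]:
    size = len(sample.encode("utf-8"))
    print(f"U+{ord(sample):04X}", size, fits_utf8mb3(sample))
# U+044F 2 True
# U+6F22 3 True
# U+20000 4 False
# U+1F600 4 False
```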


