Difference Between N'String' VS U'String' Literals in Oracle

In this answer I will try to provide information from official sources.

(1) The N'' text Literal

N'' is used to convert a string to the NCHAR or NVARCHAR2 datatype.

According to the Oracle documentation (Oracle - Literals):

The syntax of text literals is as follows:

[Image: syntax diagram for text literals]

where N or n specifies the literal using the national character set (NCHAR or NVARCHAR2 data).

A second article, Oracle - Datatypes, also states that N'String' is used to convert a string to the NCHAR datatype.

From the article listed above:

The following example compares the translated_description column of the pm.product_descriptions table with a national character set string:

SELECT translated_description FROM product_descriptions
WHERE translated_name = N'LCD Monitor 11/PM';


(2) The U'' Literal

U'' is used to handle SQL NCHAR string literals in the Oracle Call Interface (OCI).

Based on the Oracle documentation (Programming with Unicode):

The Oracle Call Interface (OCI) is the lowest level API that the rest of the client-side database access products use. It provides a flexible way for C/C++ programs to access Unicode data stored in SQL CHAR and NCHAR datatypes. Using OCI, you can programmatically specify the character set (UTF-8, UTF-16, and others) for the data to be inserted or retrieved. It accesses the database through Oracle Net.

OCI is the lowest-level API for accessing a database, so it offers the best possible performance.

Handling SQL NCHAR String Literals in OCI

You can switch it [NCHAR literal replacement] on by setting the environment variable ORA_NCHAR_LITERAL_REPLACE to TRUE. You can also achieve this behavior programmatically by using the OCI_NCHAR_LITERAL_REPLACE_ON and OCI_NCHAR_LITERAL_REPLACE_OFF modes in OCIEnvCreate() and OCIEnvNlsCreate(). So, for example, OCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_ON) turns on NCHAR literal replacement, while OCIEnvCreate(OCI_NCHAR_LITERAL_REPLACE_OFF) turns it off.

[...] Note that, when the NCHAR literal replacement is turned on, OCIStmtPrepare and OCIStmtPrepare2 will transform N' literals with U' literals in the SQL text and store the resulting SQL text in the statement handle. Thus, if the application uses OCI_ATTR_STATEMENT to retrieve the SQL text from the OCI statement handle, the SQL text will return U' instead of N' as specified in the original text.
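To make the N' to U' rewrite concrete, here is a rough sketch in Java of what such a transformation could look like. It is not Oracle's implementation: the class and method names are hypothetical, and it assumes that a U'' literal writes each non-ASCII UTF-16 code unit as a backslash followed by four hex digits. Backslashes already present in the text are deliberately not handled. The point is only that the rewritten statement is pure ASCII, so it can pass through a database character set that cannot represent the original characters.

```java
class NcharLiteralSketch {
    // Hypothetical helper: takes the body of an N'...' literal (the text
    // between the quotes) and returns a U'...' literal in which every
    // non-ASCII UTF-16 code unit is written as \XXXX.
    // Assumption: this escape form is what a U'' literal expects.
    static String toULiteral(String body) {
        StringBuilder out = new StringBuilder("U'");
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c < 0x80) {
                out.append(c);                                 // ASCII passes through unchanged
            } else {
                out.append(String.format("\\%04X", (int) c));  // non-ASCII becomes \XXXX
            }
        }
        return out.append('\'').toString();
    }

    public static void main(String[] args) {
        // "caf\u00E9" is the word "cafe" with an acute accent on the final letter.
        System.out.println(toULiteral("caf\u00E9"));
    }
}
```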



(3) Answer for your question

From a datatype perspective, there is no difference between the two queries provided.

Unicode - VARCHAR and NVARCHAR

The column type nvarchar allows you to store Unicode characters, which basically means almost any character from almost any language (including modern languages and some obsolete languages), and a good number of symbols too.

How to convert a string with Unicode encoding to a string of letters

Technically doing:

String myString = "\u0048\u0065\u006C\u006C\u006F World";

automatically converts it to "Hello World", so I assume you are reading the string in from some file. To convert it to "Hello" you'll have to parse the text into the separate Unicode code units (take each \uXXXX and keep just XXXX), then call Integer.parseInt(XXXX, 16) to get the numeric value, and then cast that to char to get the actual character.

Edit: Some code to accomplish this:

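// myString holds the raw text as read from the file, i.e. each character is
// still spelled out as a backslash, the letter u and four hex digits.
String str = myString.split(" ")[0];   // keep only the first space-delimited token
str = str.replace("\\", "");           // drop the backslashes
String[] arr = str.split("u");         // arr[0] is empty; the rest are hex digits
StringBuilder text = new StringBuilder();
for (int i = 1; i < arr.length; i++) {
    int hexVal = Integer.parseInt(arr[i], 16); // parse the four hex digits
    text.append((char) hexVal);                // cast the code unit to a char
}
// text.toString() is now "Hello"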
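The snippet above only decodes the first space-delimited token and would also break on ordinary text that contains the letter u. As a slightly more general sketch (the class and method names here are made up for illustration), a regular expression can replace every escape in place and leave the rest of the text untouched:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapeDecoder {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces each escape with the UTF-16 code unit it names.
    static String decode(String raw) {
        Matcher m = ESCAPE.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(raw, last, m.start());                     // copy text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16));  // decode the four hex digits
            last = m.end();
        }
        out.append(raw, last, raw.length());                      // copy the remaining text
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the compiler from decoding the escapes,
        // which mimics raw text read from a file.
        String raw = "\\u0048\\u0065\\u006C\\u006C\\u006F World";
        System.out.println(decode(raw)); // prints "Hello World"
    }
}
```

Because each escape is appended as a single char, characters outside the Basic Multilingual Plane come out correctly as long as the input spells them as two consecutive surrogate escapes.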

String literals and escape characters in postgresql

Partially. The text is inserted, but the warning is still generated.

I found a discussion that indicated the text needed to be preceded with 'E', as such:

insert into EscapeTest (text) values (E'This is the first part \n And this is the second');

This suppressed the warning, but the text was still not being returned correctly. When I added the additional slash as Michael suggested, it worked.

As such:

insert into EscapeTest (text) values (E'This is the first part \\n And this is the second');

How to convert string to unicode using PostgreSQL?

PostgreSQL databases have a native character type, the "server encoding". It is usually utf-8.

All text is in this encoding. Mixed encoding text is not supported, except if stored as bytea (i.e. as opaque byte sequences).

You can't store "unicode" or "non-unicode" strings, and PostgreSQL has no concept of "varchar" vs "nvarchar". With utf-8, characters that fall in the 7-bit ASCII range (and some others) are stored as a single byte, and wider chars require more storage, so it's just automatic. utf-8 requires more storage than ucs-2 or utf-16 for text that is all "wide" characters, but less for text that's a mixture.
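The storage trade-off described above can be checked outside the database. The following is a small Java sketch (the class and method names are illustrative only) that counts the encoded bytes for ASCII, all-"wide" and mixed text; it measures the encodings themselves, not PostgreSQL's on-disk format:

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 (little-endian, no byte order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";               // 7-bit ASCII only
        String wide  = "\u65E5\u672C\u8A9E";  // three CJK characters
        String mixed = "hello \u65E5\u672C";  // ASCII plus two CJK characters

        System.out.println(utf8Bytes(ascii) + " vs " + utf16Bytes(ascii)); // 5 vs 10
        System.out.println(utf8Bytes(wide)  + " vs " + utf16Bytes(wide));  // 9 vs 6
        System.out.println(utf8Bytes(mixed) + " vs " + utf16Bytes(mixed)); // 12 vs 16
    }
}
```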

PostgreSQL automatically converts to/from the client's text encoding, using the client_encoding setting. There is no need to convert explicitly.

If your client is "Unicode" (which Microsoft products tend to say when they mean UCS-2 or UTF-16), then most client drivers take care of any utf-8 <--> utf-16 conversion for you.

So you should not need to care, so long as your client does I/O with correct charset options and sets a client_encoding that matches the data it actually sends on the wire. (This is automatic with most client drivers, such as PgJDBC, Npgsql, or the Unicode psqlODBC driver.)

See:

  • character set support

