What Does the 'L' in Front of a String Mean in C++

What does the 'L' in front of a string mean in C++?

It denotes a wchar_t (wide) literal, used for extended character sets. Wikipedia has a short discussion of this topic, with C++ examples.

What exactly is the L prefix in C++?

The literal prefixes are a part of the core language, much like the suffixes:

'a' // type: char
L'a' // type: wchar_t

"a" // type: const char[2]
L"a" // type: const wchar_t[2]
U"a" // type: const char32_t[2]

1 // type: int
1U // type: unsigned int

0.5 // type: double
0.5f // type: float
0.5L // type: long double

Note that wchar_t has nothing to do with Unicode. Here is an extended rant of mine on the topic.

What does the prefix L... stand for in GCC C without #including wchar?

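L'ÿ' is of type wchar_t, an integer type, so it can be implicitly converted to an unsigned short. L"ÿ" is of type wchar_t[2], which cannot be implicitly converted to unsigned short[2]; nor can the pointer it decays to, wchar_t *, be implicitly converted to unsigned short *.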

What does an L-string in front of square brackets mean?

String literals in C++ are actually arrays. In this case L" ABCDEFG=#" is a const wchar_t[11] (ten characters plus the terminating null). When you do

L" ABCDEFG=#"[pField[someOtherMathIndex]]

you are indexing that wchar_t array at position pField[someOtherMathIndex], which yields a single wchar_t, exactly as if the literal had first been stored in a named array.

Is a wide character string literal starting with L, like L"Hello World", guaranteed to be encoded in Unicode?

The L prefix in front of a string literal simply means that each character in the string will be stored as a wchar_t. But this doesn't necessarily imply Unicode. For example, you could use a wide character string to encode GB 18030, a character set used in China which is similar to Unicode. The C++03 standard doesn't say anything about Unicode (C++11, however, defines Unicode character types and string literals), so it's up to you to represent Unicode strings properly in C++03.

Regarding string literals, Chapter 2 (Lexical Conventions) of the C++ standard defines a "basic source character set", which is essentially a subset of the ASCII characters. Each of these characters fits in a single char, so this essentially guarantees that "abc" will be represented as a 3-byte string (not counting the null), and L"abc" as a string of 3 * sizeof(wchar_t) bytes of wide characters.

The standard also defines "universal-character-names", which let you refer to non-ASCII characters using the \uXXXX (or \UXXXXXXXX) hexadecimal notation. A universal-character-name identifies a character by its ISO/IEC 10646 (Unicode) code point, but the standard doesn't guarantee how that character is encoded in a narrow or wide literal. However, by using universal-character-names you can at least guarantee that the compiler sees exactly the character you intended, regardless of how the source file is encoded. This will give you Unicode output provided the runtime environment supports Unicode, has the appropriate fonts installed, etc.

As for string literals in C++03 source files, again there is no guarantee. If you have a Unicode string literal in your code which contains characters outside of the ASCII range, it is up to your compiler to decide how to interpret these characters. If you want to explicitly guarantee that the compiler will "do the right thing", you'd need to use \uXXXX notation in your string literals.

What special things does the L do for C strings, other than marking them as wide strings?

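Character constants and string literals prefixed with L use wchar_t. This type is most likely 4 bytes on your system (it is on typical Linux targets; on Windows it is 2 bytes). So when you use an unsigned short * to index such a string, each element you read is only 2 bytes, i.e. half of the bytes of a given character.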

Change the pointer type to unsigned int or uint32_t, or better yet just use wchar_t.


