Why use MultiByteToWideChar and WideCharToMultiByte at the same time?
This code fragment first converts the string from a multibyte representation in the system default code page to Unicode (UTF-16), then converts that to the UTF-8 multibyte representation. The net effect is to convert text from the default code page to UTF-8.
The code is fragile, in that it assumes the UTF-8 version will at most double in size. This probably works most of the time, but in the worst case a single byte in the default code page maps to 3 bytes in UTF-8 (for example, 0x80 in Windows-1252 is the euro sign U+20AC, which is E2 82 AC in UTF-8).
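To make the expansion concrete, here is a minimal portable sketch of UTF-8 encoding for a single code point. The helper name encode_utf8 is hypothetical and not part of the code under discussion; it only shows how many bytes a given code point needs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumes cp is a valid scalar value (not checked beyond the range assert).
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {                       // 1 byte: ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: rest of the BMP
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: supplementary planes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

The euro sign is a single byte (0x80) in Windows-1252, yet encode_utf8(0x20AC) returns three bytes, so a buffer sized at twice the input is already too small for it. Asking the conversion function for the required size first avoids guessing.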
Multiplatform way to convert between std::string and std::wstring
Using <cstdlib>, you can get away with an implementation similar to what you already have, as mentioned by Joachim Pileborg, as long as you have set the locale to whatever you want it to be (for example: setlocale(LC_ALL, "en_US.utf8");). The calls map as follows:
MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0)
=> mbstowcs(nullptr, data(str), size(str))
MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed)
=> mbstowcs(data(wstrTo), data(str), size(wstrTo))
WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL)
=> wcstombs(nullptr, data(wstr), size(wstr))
WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strTo[0], size_needed, NULL, NULL)
=> wcstombs(data(strTo), data(wstr), size(strTo))
EDIT:
C++11 requires strings to be allocated contiguously, which may be important if you are compiling cross-platform, as previous standards did not require this. Before C++11, calling &str[0], &strTo[0], &wstr[0], or &wstrTo[0] could have caused problems.
Since C++17 is now the accepted standard, I've improved my suggested substitutions to use data rather than taking the address of the first element of each string.
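Putting the substitutions together, a minimal sketch of the two-pass pattern (query the size, then convert) might look like this. The function names to_wide and to_narrow are hypothetical. The sketch assumes C++17, a locale already set by the caller, input without embedded null characters, and the common POSIX/MSVC behaviour where a null destination makes the function return the required length.

```cpp
#include <cassert>
#include <clocale>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Narrow string in the current locale's multibyte encoding -> wide string.
std::wstring to_wide(const std::string& str) {
    // Pass 1: with a null destination the count argument is ignored and the
    // call returns the number of wide characters needed (POSIX extension).
    std::size_t needed = std::mbstowcs(nullptr, str.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for this locale");
    std::wstring out(needed, L'\0');
    // Pass 2: the count is the capacity of the destination, in wide chars.
    std::size_t written = std::mbstowcs(out.data(), str.c_str(), out.size());
    assert(written == needed);
    (void)written;
    return out;
}

// Wide string -> narrow string in the current locale's multibyte encoding.
std::string to_narrow(const std::wstring& wstr) {
    std::size_t needed = std::wcstombs(nullptr, wstr.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in this locale");
    std::string out(needed, '\0');
    // The count is the capacity of the destination in bytes, not size(wstr).
    std::size_t written = std::wcstombs(out.data(), wstr.c_str(), out.size());
    assert(written == needed);
    (void)written;
    return out;
}
```

Call setlocale once at startup (for example with "en_US.utf8", if that locale is installed) before using either function. In the default "C" locale only ASCII is guaranteed to convert.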
How do I convert a char* to a char* that is UTF-8 encoded?
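Look into iconv(3); that's the API you want. Depending on the platform, you may need to link with -liconv (with glibc, iconv is part of the C library and no extra flag is needed).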
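As a rough sketch of what that looks like in practice, the function below wraps iconv_open, iconv, and iconv_close to convert a byte string between two named encodings. The name convert_encoding is hypothetical. The sketch assumes the POSIX signature with char** input (some older platforms use const char**), and it skips the final flush call that stateful encodings need.

```cpp
#include <cassert>
#include <cerrno>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `input` from encoding `from` to encoding `to`.
std::string convert_encoding(const std::string& input,
                             const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);   // note the order: target first
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open: unsupported conversion");

    std::string out;
    char buf[256];
    char* in_ptr = const_cast<char*>(input.data());
    std::size_t in_left = input.size();

    while (in_left > 0) {
        char* out_ptr = buf;
        std::size_t out_left = sizeof buf;
        std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        int err = errno;                 // save before anything else runs
        out.append(buf, sizeof buf - out_left);
        // E2BIG only means the chunk buffer filled up; loop and continue.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv: invalid or incomplete input");
        }
    }
    iconv_close(cd);
    return out;
}
```

A call such as convert_encoding(text, "ISO-8859-1", "UTF-8") would cover the question's case, assuming the source encoding is known and the installed iconv supports both names. iconv cannot detect the source encoding for you.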
WideCharToMultiByte() vs. wcstombs()
In a nutshell: the WideCharToMultiByte function exposes the encodings/code pages used for the conversion in the parameter list, while wcstombs does not. This is a major PITA, as the standard does not define what encoding is to be used to produce the wchar_t, while you as a developer certainly need to know what encoding you are converting to/from.
Apart from that, WideCharToMultiByte is of course a Windows API function and is not available on any other platform.
Therefore I would suggest using WideCharToMultiByte without a moment's thought if your application is not specifically written to be portable to non-Windows OSes. Otherwise, you might want to wrestle with wcstombs or (preferably IMHO) look into using a full-featured portable Unicode library such as ICU.
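The difference shows up clearly when both calls sit side by side. The wrapper below is a sketch only, and its name wide_to_multibyte is hypothetical. On Windows it names the target encoding explicitly (CP_UTF8); elsewhere it falls back to wcstombs, whose output encoding is whatever the current LC_CTYPE locale happens to be.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper contrasting the two APIs.
std::string wide_to_multibyte(const std::wstring& w) {
    if (w.empty()) return {};
#ifdef _WIN32
    // The encoding is an explicit argument: the result is always UTF-8.
    int needed = WideCharToMultiByte(CP_UTF8, 0, w.data(),
                                     static_cast<int>(w.size()),
                                     nullptr, 0, nullptr, nullptr);
    if (needed <= 0) throw std::runtime_error("WideCharToMultiByte failed");
    std::string out(static_cast<std::size_t>(needed), '\0');
    WideCharToMultiByte(CP_UTF8, 0, w.data(), static_cast<int>(w.size()),
                        &out[0], needed, nullptr, nullptr);
    return out;
#else
    // No encoding argument: the result depends on hidden global state
    // (the locale set via setlocale), and the input must be null-terminated.
    std::size_t needed = std::wcstombs(nullptr, w.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("not representable in the current locale");
    std::string out(needed, '\0');
    std::wcstombs(&out[0], w.c_str(), out.size());
    return out;
#endif
}
```

With the same input, the non-Windows branch can produce different bytes, or fail outright, depending on which locale the process has selected. That is the portability cost described above.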
WideCharToMultiByte in QB64
Some more args need to be passed with the BYVAL keyword:
FUNCTION MultiByteToWideChar& (BYVAL codePage~&, BYVAL dwFlags~&, lpszMbstring$, BYVAL byteCount&, lpwszWcstring$, BYVAL wideCount&)
FUNCTION WideCharToMultiByte& (BYVAL codePage~&, BYVAL dwFlags~&, lpWideString$, BYVAL ccWideChar%, lpMultiByte$, BYVAL multibyte%, BYVAL defaultchar&, BYVAL usedchar&)
Aside from that, the length of STRING * 260 is always 260, regardless of any value stored. This means Filename = Filename + CHR$(0) won't work as intended, not that either MultiByteToWideChar or WideCharToMultiByte requires null-terminated input (that's why the byteCount and ccWideChar params exist; sometimes you only want to operate on a part of a string).
Worse, even if you use _MEMFILL to set all bytes of Filename to 0 to allow you to deal with things using ASCIIZ strings, INPUT and LINE INPUT will fill any remaining bytes not explicitly entered into Filename with CHR$(32) (i.e. a blank space, as if you pressed the spacebar). For example, if you enter "Hello", there would be 5 bytes for the string entered and 255 bytes of character code 32 (or &H20 if you prefer hexadecimal).
To save yourself this terrible headache ("hello world.bas" is a valid filename!), you'll want to use STRING, not STRING * 260. If the length is greater than 260, you should probably print an error message. Whether you allow a user to enter a new filename or not after that is up to you.
You'll also want to use the return value of MultiByteToWideChar, since it is the number of wide characters written to NewFilename:
DIM Filename AS STRING
DIM NewFilename AS STRING * 260
DIM MultiByte AS STRING * 260
...
' Note: LEN(NewFilename) = 260 bytes (**always**), which is room
' for only 130 two-byte wide chars, so the buffer size is passed
' in wide chars. The number of wide chars written is saved.
NewFilenameLen = MultiByteToWideChar(0, 0, Filename, LEN(Filename), NewFilename, LEN(NewFilename) \ 2)
...
' Note: LEN(MultiByte) = 260 (**always**)
x = WideCharToMultiByte(65001, 0, NewFilename, NewFilenameLen, MultiByte, LEN(MultiByte), 0, 0)
...
ICU C++ Converting Encodings
You can use ICU, but you may find iconv() sufficient, which is a lot simpler to set up and operate (and it's part of POSIX, and easily available for Windows).
With either library, you have to convert your Unicode string to a wide string. In iconv() that target is called WCHAR_T. Once you have a wide string, you can use it directly in Windows.
In Linux, you can either proceed to use wcstombs() to transform the wide string into the system's (and locale's) narrow multibyte encoding (don't forget setlocale(LC_CTYPE, "");), or, alternatively, if you are sure that you want UTF-8 rather than the system's encoding, you can transform from your original string to UTF-8 directly (also with either library).
Maybe you'll find this post of mine to provide some background.
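For the iconv() route, a minimal sketch of the UTF-8 to wide-string step could look like the following. The function name utf8_to_wide is hypothetical. The sketch assumes the installed iconv accepts the "WCHAR_T" encoding name (glibc and GNU libiconv do) and that the source really is UTF-8.

```cpp
#include <cassert>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into the platform's wchar_t encoding.
std::wstring utf8_to_wide(const std::string& utf8) {
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");   // target first, then source
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open: WCHAR_T target not supported");

    // One UTF-8 byte never produces more than one wchar_t, so the input
    // length in bytes is a safe upper bound on the output length.
    std::wstring out(utf8.size(), L'\0');
    char* in_ptr = const_cast<char*>(utf8.data());
    std::size_t in_left = utf8.size();
    char* out_ptr = reinterpret_cast<char*>(&out[0]);
    std::size_t out_left = out.size() * sizeof(wchar_t);

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("iconv: invalid or incomplete UTF-8 input");

    // Trim the unused tail of the buffer.
    out.resize(out.size() - out_left / sizeof(wchar_t));
    return out;
}
```

From here the wide string can be handed to wcstombs() after setlocale(LC_CTYPE, ""), as described above, or the original bytes can be converted straight to UTF-8 by changing the target name passed to iconv_open.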