C++ Convert string (or char*) to wstring (or wchar_t*)

Convert const char* to wstring

I recommend using std::string instead of C-style strings (char*) wherever possible. You can create a std::string object from a const char* by simply passing it to the constructor.

Once you have a std::string, you can write a simple function that converts a std::string containing multi-byte UTF-8 characters to a std::wstring containing UTF-16 code units (on Windows, wchar_t is a 16-bit type).

There are several ways to do that. Here is one that uses the Windows API function MultiByteToWideChar:

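#include <windows.h>
#include <string>

std::wstring s2ws(const std::string& str)
{
    if (str.empty())
        return std::wstring(); // a zero-length input makes MultiByteToWideChar fail
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, str.data(), (int)str.size(), NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_UTF8, 0, str.data(), (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}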
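MultiByteToWideChar is Windows-only. The following portable sketch shows what a CP_UTF8 conversion computes: it decodes each UTF-8 sequence into a code point and then emits either one UTF-16 code unit or a surrogate pair. It is not Windows' implementation. The name utf8_to_utf16 is hypothetical, and the sketch throws on malformed input. MultiByteToWideChar without MB_ERR_INVALID_CHARS typically substitutes replacement characters instead.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes and re-encode them as UTF-16.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes that follow
        if (lead < 0x80) {
            cp = lead;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            cp = lead & 0x1F; extra = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            cp = lead & 0x0F; extra = 2;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            cp = lead & 0x07; extra = 3;
        } else {
            throw std::runtime_error("invalid UTF-8 lead byte");
        }
        if (extra >= in.size() - i)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F); // append 6 payload bits
        }
        // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::runtime_error("invalid code point");
        i += extra + 1;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp)); // fits in one code unit
        } else {
            // Above the BMP: split into a high and a low surrogate.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, the three bytes of "Maß" ("Ma\xC3\x9F") become three UTF-16 code units. The four bytes of U+1F600 become the two code units 0xD83D 0xDE00.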

Check these questions too:

Mapping multibyte characters to their unicode point representation

Why use MultiByteToWideCharArray to convert std::string to std::wstring?

How to convert string to wchar_t when string is stored in a variable? [duplicate]

Here are two different ways, depending on how simple you want it to be. Since this is the Visual C++ group, both use Microsoft-specific APIs.

The first thing I would try is CStringW. Depending on whether _UNICODE is defined, a plain CString is either a CStringA or a CStringW, but you can always name CStringW explicitly.

CStringW sWide = "abcdef"; // uses current thread code page
const wchar_t* pWide = sWide.GetString(); // pointer only valid for scope of sWide

Or you can use MultiByteToWideChar() API.

wchar_t wszBuf[512];
// Passing -1 as the length converts up to and including the terminating null,
// so wszBuf ends up null-terminated. Substitute your own string for "abcdef".
MultiByteToWideChar(CP_ACP, 0, "abcdef", -1, wszBuf, _countof(wszBuf));

c++ can't convert string to wstring

The solution from "hahakubile" worked for me:

#include <clocale>
#include <cstdlib>
#include <cwchar>
#include <string>

std::wstring s2ws(const std::string& s) {
    // setlocale(LC_ALL, NULL) queries the current locale without changing it.
    std::string curLocale = setlocale(LC_ALL, NULL);
    setlocale(LC_ALL, "");
    const char* source = s.c_str();
    size_t dsize = mbstowcs(NULL, source, 0) + 1;
    wchar_t* dest = new wchar_t[dsize];
    wmemset(dest, 0, dsize);
    mbstowcs(dest, source, dsize);
    std::wstring result = dest;
    delete[] dest;
    setlocale(LC_ALL, curLocale.c_str());
    return result;
}

But the return value is not 100% correct:

string s = "101446012MaßnStörfall   PAt  #Maßnahme Störfall                      00810000100121000102000020100000000000000";
wstring ws2 = s2ws(s);
cout << ws2.size() << endl; // returns 110 which is correct
wcout << ws2.substr(29,40) << endl; // returns #Ma�nahme St�rfall with symbols

I am wondering why it replaced the German characters with symbols.

How to convert std::string to wchar_t*

First off, you don't need the const_cast, as URLDownloadToFileW() takes a const wchar_t* as input, so passing it wide_string.c_str() will work as-is:

URLDownloadToFileW(..., wide_string.c_str(), ...);

That being said, you are constructing a std::wstring with the individual char values of a std::string as-is. That works without data loss only for ASCII characters (values 0 to 127), which have the same numeric values in both ASCII and Unicode. For non-ASCII characters, you need to actually convert the char data to Unicode, for example with MultiByteToWideChar() (or an equivalent):

#include <windows.h>
#include <string>

std::wstring to_wstring(const std::string &s)
{
    std::wstring wide_string;

    // NOTE: be sure to specify the correct codepage that the
    // std::string data is actually encoded in...
    int len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), (int)s.size(), NULL, 0);
    if (len > 0) {
        wide_string.resize(len);
        MultiByteToWideChar(CP_ACP, 0, s.c_str(), (int)s.size(), &wide_string[0], len);
    }

    return wide_string;
}

URLDownloadToFileW(..., to_wstring(y).c_str(), ...);

That said, there is a simpler solution. If the std::string is encoded in the default ANSI code page (CP_ACP), you can simply call URLDownloadToFileA() instead, pass it the original std::string as-is, and let the OS handle the conversion for you:

URLDownloadToFileA(..., y.c_str(), ...);
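The point above is that copying char values one-to-one is lossless only for ASCII. A small guard can make that explicit. This is a hypothetical helper (widen_ascii is not part of any library). It widens byte by byte and refuses any byte above 127 instead of silently producing wrong characters:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: a char-by-char widening that is only correct for
// 7-bit ASCII input. Non-ASCII bytes need a real code-page conversion
// (e.g. MultiByteToWideChar), so they are rejected here.
std::wstring widen_ascii(const std::string& s)
{
    std::wstring out;
    out.reserve(s.size());
    for (char ch : s) {
        const unsigned char byte = static_cast<unsigned char>(ch);
        if (byte > 127)
            throw std::invalid_argument("non-ASCII byte: needs a real conversion");
        out.push_back(static_cast<wchar_t>(byte)); // same value in ASCII and Unicode
    }
    return out;
}
```

Casting through unsigned char matters: on platforms where char is signed, a byte such as 0xC3 would otherwise be a negative value.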

Cannot convert character array to wstring with utf-8 characters

I think you should change the code to the following:

#include <windows.h>
#include <string>

std::wstring s2ws(const char* utf8Bytes)
{
    const std::string str(utf8Bytes); // a copy, not a reference to a temporary
    int size_needed = MultiByteToWideChar(CP_ACP, 0, str.c_str(), (int)str.size(), NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_ACP, 0, str.c_str(), (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}

The difference between the two flags (CP_ACP and CP_UTF8) is listed here.
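The following portable sketch shows why the choice of code page matters. It assumes the ANSI code page behaves like ISO-8859-1 (Latin-1). On many Western Windows systems CP_ACP is Windows-1252, which matches Latin-1 except for bytes 0x80 to 0x9F. The function name latin1_to_utf16 is hypothetical. A single-byte code page maps every byte to exactly one character, so the two UTF-8 bytes of 'ß' (0xC3 0x9F) come out as two characters:

```cpp
#include <string>

// Hypothetical helper: decode bytes assuming ISO-8859-1, where each byte
// value is also its Unicode code point (U+0000..U+00FF).
std::u16string latin1_to_utf16(const std::string& in)
{
    std::u16string out;
    out.reserve(in.size());
    for (char ch : in)
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(ch)));
    return out;
}
```

Under Windows-1252 the same two bytes would display as "ÃŸ". That is the typical mojibake you get when UTF-8 data is decoded with CP_ACP, or when single-byte data is decoded as UTF-8.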

UTF8 char array to std::wstring

Current solution

You can use std::wstring_convert to convert a string to or from wstring, using a codecvt to specify the conversion to be performed.

Example of use:

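#include <codecvt>
#include <locale>
#include <string>
using namespace std;
string so = u8"Jérôme Ângle"; // C++20: u8"..." is char8_t[], so this line no longer compiles
wstring st;
wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
st = converter.from_bytes(so);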

If you have a c-string (array of char), the overloads of from_bytes() will do exactly what you want:

char p[]=u8"Jérôme Ângle";
wstring ws = converter.from_bytes(p);


Is it sustainable?

As pointed out in the comments, C++17 has deprecated the <codecvt> facets (such as codecvt_utf8) and the wstring_convert utility:

These features are hard to use correctly, and there are doubts whether they are even specified correctly. Users should use dedicated text-processing libraries instead.

In addition, a wstring is based on wchar_t, which has a very different encoding on Linux and on Windows: it is typically a 32-bit type holding UTF-32 on Linux, and a 16-bit type holding UTF-16 on Windows.
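The platform difference shows up as soon as a character lies outside the Basic Multilingual Plane. As a minimal sketch, assume wchar_t is either 16-bit UTF-16 or 32-bit UTF-32. A hypothetical helper that appends one code point to a std::wstring then has to branch on sizeof(wchar_t):

```cpp
#include <string>

// Hypothetical helper: append one Unicode code point to a std::wstring,
// assuming wchar_t is UTF-16 (16-bit, Windows) or UTF-32 (32-bit, Linux).
void append_code_point(std::wstring& out, char32_t cp)
{
    if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
        // One wchar_t per code point: 32-bit wchar_t, or a BMP character.
        out.push_back(static_cast<wchar_t>(cp));
    } else {
        // 16-bit wchar_t: characters above U+FFFF need a surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
    }
}
```

For example, appending U+1F600 yields a std::wstring of size 2 on Windows but size 1 on Linux. Code that indexes or measures a wstring therefore behaves differently across the two systems.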

So the first question to ask is why a wstring is needed at all, and why not just keep UTF-8 everywhere.

Depending on the reasons, you may consider using:
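Staying in UTF-8 does not rule out simple Unicode-aware operations. As a minimal sketch, assuming the input is valid UTF-8, you can count code points directly on the std::string by skipping continuation bytes. The helper name utf8_length is hypothetical:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a valid UTF-8 string.
// Continuation bytes have the bit pattern 10xxxxxx; every other byte
// starts a new code point.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (char ch : s)
        if ((static_cast<unsigned char>(ch) & 0xC0) != 0x80)
            ++count;
    return count;
}
```

For "Jérôme", encoded as the bytes "J\xC3\xA9r\xC3\xB4me", size() reports 8 bytes while utf8_length() reports 6 code points. Note that code points are still not the same as user-perceived characters, which is where libraries like the ones below come in.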

  • ICU and its UnicodeString, for full, in-depth Unicode support
  • Boost.Locale and its to_utf or utf_to_utf, for common Unicode-related tasks
  • utf8-cpp, for working with UTF-8 strings the Unicode way (attention: it seems not to be maintained)

How do I convert a char string to a wchar_t string?

Does this little function help?

#include <cstdlib>

std::size_t mbstowcs(wchar_t* dst, const char* src, std::size_t len);

See also the C++ reference for std::mbstowcs.
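A common way to use std::mbstowcs is in two passes. The first call passes a null destination to get the required length, and the second fills the buffer. The sketch below (to_wide is a hypothetical name) assumes the input is encoded in the current C locale, which you select with std::setlocale. In the default "C" locale, only plain ASCII input is safe to rely on.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a multibyte string in the current C locale
// to std::wstring using std::mbstowcs.
std::wstring to_wide(const std::string& in)
{
    // First pass: a null destination returns the number of wide characters
    // needed (excluding the terminator), or (size_t)-1 on an invalid sequence.
    const std::size_t len = std::mbstowcs(nullptr, in.c_str(), 0);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for current locale");
    std::wstring out(len, L'\0');
    // Second pass: write exactly len wide characters. With the limit reached,
    // mbstowcs writes no terminator, so the string's own null is untouched.
    if (len > 0)
        std::mbstowcs(&out[0], in.c_str(), len);
    return out;
}
```

Because std::mbstowcs depends on the global C locale, calling std::setlocale while other threads convert strings can change their results.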