How to Convert Wstring into String

How to convert wstring into string?

Here is a worked-out solution based on the other suggestions:

#include <string>
#include <iostream>
#include <clocale>
#include <locale>
#include <vector>

int main() {
    // Use the environment's default locale for the conversion.
    std::setlocale(LC_ALL, "");
    const std::wstring ws = L"ħëłlö";
    const std::locale locale("");
    typedef std::codecvt<wchar_t, char, std::mbstate_t> converter_type;
    const converter_type& converter = std::use_facet<converter_type>(locale);
    // Worst case: every wide character expands to max_length() bytes.
    std::vector<char> to(ws.length() * converter.max_length());
    std::mbstate_t state = std::mbstate_t(); // must be zero-initialized
    const wchar_t* from_next;
    char* to_next;
    const converter_type::result result =
        converter.out(state, ws.data(), ws.data() + ws.length(), from_next,
                      &to[0], &to[0] + to.size(), to_next);
    if (result == converter_type::ok || result == converter_type::noconv) {
        const std::string s(&to[0], to_next);
        std::cout << "std::string = " << s << std::endl;
    }
}

This will usually work on Linux, but it can cause problems on Windows, where wchar_t is 16 bits wide and the default locale's narrow encoding is typically an ANSI code page rather than UTF-8.
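On Windows, a more reliable route is the Win32 WideCharToMultiByte function, which converts the UTF-16 wchar_t string straight to UTF-8. A minimal sketch (the helper name narrow_utf8 is mine, and error handling is pared down to the essentials):

#include <string>
#include <windows.h>

// Convert a UTF-16 std::wstring to a UTF-8 std::string via Win32.
std::string narrow_utf8(const std::wstring& ws) {
    if (ws.empty()) return std::string();
    // First call: query the required buffer size in bytes.
    const int size = WideCharToMultiByte(CP_UTF8, 0, ws.data(),
                                         static_cast<int>(ws.size()),
                                         nullptr, 0, nullptr, nullptr);
    std::string s(size, '\0');
    // Second call: perform the actual conversion into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, ws.data(), static_cast<int>(ws.size()),
                        &s[0], size, nullptr, nullptr);
    return s;
}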

C++ - std::wstring to std::string - quick and dirty conversion for use as key in std::map

If you are not interested in the semantics of the content, but only in having content that is comparable, you can coerce the inner wchar_t[] into a char[] of sizeof(wchar_t) times the length and use it to initialize the string (by specifying address and size in the constructor):

std::wstring ws(L"ABCD€FG");
// Reinterpret the wide string's storage as ws.size() * sizeof(wchar_t) raw bytes.
std::string s((const char*)&ws[0], sizeof(wchar_t) / sizeof(char) * ws.size());

Now s is not printable (it may contain embedded null characters), but it is still assignable and comparable.

You can convert back as follows:

// Divide the byte count back down to a wchar_t count.
std::wstring nws((const wchar_t*)&s[0], s.size() * sizeof(char) / sizeof(wchar_t));

Now comparing

std::cout << (nws == ws);

should print 1.

However, note that with this approach the ordering in the map (the result of operator<) is fuzzy because of the embedded zero bytes, and it does not reflect any textual semantics. Searching still works, though, since, however fuzzy, it is still an "order".
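To make the std::map use case concrete, here is a minimal sketch of the trick; the helper name wstring_as_bytes is my own, not part of the original answer:

#include <map>
#include <string>

// Reinterpret the wide string's storage as a byte string, for use as a key.
std::string wstring_as_bytes(const std::wstring& ws) {
    return std::string(reinterpret_cast<const char*>(ws.data()),
                       ws.size() * sizeof(wchar_t));
}

int main() {
    std::map<std::string, int> m;
    m[wstring_as_bytes(L"ABCD€FG")] = 1;
    m[wstring_as_bytes(L"other")] = 2;
    // Lookup succeeds because byte-wise equality matches wide-string equality.
    return m.count(wstring_as_bytes(L"ABCD€FG")); // 1
}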

Convert from std::wstring to std::string

std::string simply holds an array of bytes. It does not hold information about the encoding in which these bytes are supposed to be interpreted, nor do the standard library functions or std::string member functions generally assume anything about the encoding. They handle the contents as just an array of bytes.

Therefore when the contents of a std::string need to be presented, the presenter needs to make some guess about the intended encoding of the string, if that information is not provided in some other way.
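A tiny illustration of the "just bytes" point, assuming the snippet is compiled as C++11 through C++17 (in C++20 the type of u8 literals changes to char8_t):

#include <iostream>
#include <string>

int main() {
    // u8 literals are UTF-8 encoded regardless of the source file's encoding.
    std::string s = u8"日本";
    // size() counts bytes, not characters: each of the two code points
    // occupies three bytes in UTF-8.
    std::cout << s.size() << std::endl; // prints 6
}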

I am assuming that the encoding you intend to convert to is UTF-8, given that you are using std::codecvt_utf8.
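For reference, a minimal sketch of that codecvt_utf8 route (note that std::wstring_convert and std::codecvt_utf8 are deprecated since C++17 but still widely available; on Windows, where wchar_t holds UTF-16, std::codecvt_utf8_utf16 is the more accurate facet for characters outside the BMP):

#include <codecvt>
#include <locale>
#include <string>

std::string to_utf8(const std::wstring& ws) {
    // Converts the wide characters to their UTF-8 byte sequence.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(ws);
}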

But if you are using Visual Studio, the debugger simply assumes one specific encoding, at least by default. That encoding is not UTF-8, but most likely code page 1252 (Windows-1252).

As verification, python gives the following:

>>> '日本'.encode('utf8').decode('cp1252')
'æ—¥æœ¬'

Your string does indeed seem to be the UTF-8 encoding of 日本 interpreted as if it were cp1252-encoded.

Therefore the conversion seems to have worked as intended.


As mentioned by @MarkTolonen in the comments, the encoding to assume for a string variable can be set to UTF-8 in the Visual Studio debugger with the s8 format specifier, as explained in the documentation.
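If you would rather verify the result without depending on the debugger's display assumptions, dumping the raw bytes works on any platform; a minimal sketch:

#include <cstdio>
#include <string>

// Print each byte of the converted string in hexadecimal.
void dump_bytes(const std::string& s) {
    for (unsigned char c : s)
        std::printf("%02X ", c);
    std::printf("\n");
}

// For the UTF-8 encoding of 日本 this prints: E6 97 A5 E6 9C AC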


