Convert Const Char* to Wstring

Convert const char* to wstring

I recommend using std::string instead of C-style strings (char*) wherever possible. You can create a std::string object from a const char* by simply passing it to the constructor.
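As a minimal sketch of that first step: the helper name below is hypothetical, and the null check is an added assumption, since passing a null pointer to the std::string constructor is undefined behavior.

```cpp
#include <string>

// Hypothetical helper (not part of any library): builds a std::string
// from a C string and treats a null pointer as an empty string.
std::string from_c_string(const char* s)
{
    return s ? std::string(s) : std::string();
}
```

Calling from_c_string("hello") yields a std::string holding the same five characters, which can then be handed to whichever conversion function you choose.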

Once you have a std::string, you can write a simple function that converts a std::string containing multi-byte UTF-8 characters to a std::wstring containing UTF-16 code units (on Windows, wchar_t is 16 bits wide, so characters outside the Basic Multilingual Plane take two of them).

There are several ways to do that; here is one using the MultiByteToWideChar function:

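#include <windows.h>
#include <string>

std::wstring s2ws(const std::string& str)
{
    if (str.empty()) return std::wstring();  // MultiByteToWideChar fails on a zero-length input
    // First call: ask how many wide characters the result needs.
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, str.data(), (int)str.size(), NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    // Second call: convert into the preallocated buffer.
    MultiByteToWideChar(CP_UTF8, 0, str.data(), (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}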

Check these questions too:

Mapping multibyte characters to their unicode point representation

Why use MultiByteToWideCharArray to convert std::string to std::wstring?

C++: convert char * to wstring

That's a very good question! :-)

  1. As Maxim wrote: mbstowcs()

  2. wsprintf() with "%S" (capital "S"). In the Unicode build of wsprintf(), "%S" means a multi-byte string (in sprintf(), "%S" means a wide-character string). This is Microsoft-specific behavior.

  3. You can use std::wstring_convert and choose the UTF-8 encoding. I think the facet is "codecvt_utf8_utf16".
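The mbstowcs() option from the list above could look roughly like this. The function name is hypothetical, and the sketch assumes the input is encoded in the current C locale's multibyte encoding (plain ASCII works in the default "C" locale; for UTF-8 you would need to select a UTF-8 locale with setlocale() first).

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: converts a locale-encoded C string with std::mbstowcs.
// Returns an empty string for a null pointer or an invalid byte sequence.
std::wstring widen_with_mbstowcs(const char* s)
{
    if (!s) return std::wstring();
    // Every wide character consumes at least one byte, so strlen(s) is an upper bound.
    std::wstring out(std::strlen(s), L'\0');
    std::size_t written = std::mbstowcs(&out[0], s, out.size());
    if (written == static_cast<std::size_t>(-1)) return std::wstring();
    out.resize(written);  // trim to the number of wide characters actually produced
    return out;
}
```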

For windows:

  1. MultiByteToWideChar() in WINAPI

  2. If you put ASCII text on the clipboard with SetClipboardData() using CF_TEXT, Windows lets you call GetClipboardData() with CF_UNICODETEXT and does the conversion for you!

You can also do it manually, the hardcore way, by inserting a zero byte after each ASCII character. This works only in some cases: the text must be pure ASCII and the target must be little-endian UTF-16.
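To make the manual trick concrete, here is a small sketch that produces the raw UTF-16LE bytes. The function name is hypothetical, and rejecting non-ASCII input by returning an empty vector is an assumption of this example, not something the trick itself prescribes.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: produces little-endian UTF-16 bytes from pure ASCII text
// by appending a zero byte after each character. Returns an empty vector if any
// byte is outside the ASCII range, because the trick is not valid for such input.
std::vector<unsigned char> ascii_to_utf16le(const std::string& ascii)
{
    std::vector<unsigned char> out;
    out.reserve(ascii.size() * 2);
    for (unsigned char c : ascii) {
        if (c > 0x7F) return std::vector<unsigned char>();
        out.push_back(c);  // low byte: the ASCII code itself
        out.push_back(0);  // high byte: always zero for ASCII
    }
    return out;
}
```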

That's all that comes to mind right now :-)

Best way to fill wstring with const char*

Since you don't need to do any character conversion, you can initialize both strings from a vector of characters. Consider this example:

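#include <string>
#include <vector>

int main()
{
    char data[32] = {};  // zero-initialized placeholder for the real data
    std::vector<char> v(data, data + 32);
    std::string str(v.begin(), v.end());
    std::wstring wstr(v.begin(), v.end());  // widens each char individually, no decoding
}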
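The same iterator-range constructor also works directly on a const char*, without the intermediate vector. The sketch below uses a hypothetical function name and assumes the input is pure ASCII: each char is converted to wchar_t on its own, so multi-byte sequences are not decoded, and on platforms where char is signed, bytes above 0x7F would come out as sign-extended values.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: copies an ASCII C string into a std::wstring
// one character at a time, with no encoding conversion.
std::wstring widen_ascii(const char* s)
{
    if (!s) return std::wstring();
    return std::wstring(s, s + std::strlen(s));
}
```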

UTF8 char array to std::wstring

Current solution

You can use std::wstring_convert to convert a string to or from wstring, using a codecvt to specify the conversion to be performed.

Example of use:

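#include <codecvt>
#include <locale>
#include <string>

std::string so = u8"Jérôme Ângle";  // before C++20; since C++20 a u8 literal has type char8_t
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
std::wstring st = converter.from_bytes(so);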

If you have a C string (an array of char), the overloads of from_bytes() will do exactly what you want:

char p[] = u8"Jérôme Ângle";
std::wstring ws = converter.from_bytes(p);
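One detail worth knowing is what happens on malformed input. As a sketch (the wrapper name is hypothetical; wstring_convert, codecvt_utf8 and from_bytes are the standard facilities used above), a converter constructed without an error string throws std::range_error when the bytes are not valid UTF-8:

```cpp
#include <codecvt>
#include <cstring>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around the standard (deprecated since C++17) converter.
// Throws std::range_error if the input is not valid UTF-8.
std::wstring utf8_to_wide(const char* bytes)
{
    if (!bytes) return std::wstring();
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
    // The (first, last) overload avoids building an intermediate std::string.
    return converter.from_bytes(bytes, bytes + std::strlen(bytes));
}
```

Callers that cannot tolerate exceptions can instead pass a fallback wide string to the wstring_convert constructor, which is then returned on failure.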


Is it sustainable?

As pointed out in the comments, C++17 has deprecated the <codecvt> header and the wstring_convert utility:

These features are hard to use correctly, and there
are doubts whether they are even specified correctly. Users should use
dedicated text-processing libraries instead.

In addition, a wstring is based on wchar_t, which has a very different encoding on Linux and on Windows: typically 32 bits holding UTF-32 on Linux, and 16 bits holding UTF-16 on Windows.

So the first question to ask is why a wstring is needed at all, and why not just keep UTF-8 everywhere.

Depending on the reasons, you may consider using:

  • ICU and its UnicodeString for a full, in-depth, unicode support
  • boost.locale and its to_utf or utf_to_utf, for common Unicode-related tasks.
  • utf8-cpp for working with UTF-8 strings the Unicode way (caution: it seems to be unmaintained).
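If none of these libraries is an option and the task really is just "UTF-8 bytes in, wstring out", the decoding can be written by hand. The following is only a sketch under stated assumptions: the function name is hypothetical, malformed input is reported by throwing std::invalid_argument (a choice made for this example), and the output is UTF-32 when wchar_t is at least 32 bits wide and UTF-16 with surrogate pairs otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 into a std::wstring without <codecvt>.
std::wstring utf8_to_wstring(const std::string& in)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const std::uint32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        // Reject overlong encodings, surrogates and values beyond U+10FFFF.
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

This covers decoding only; normalization, collation and other Unicode-aware operations are where the libraries listed above become worthwhile.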

Why I can't construct a wstring from char*

You could use wchar_t directly and call the corresponding wide-character API to retrieve the data as wchar_t, then construct the wstring from it. The _dupenv_s function has a wide counterpart, _wdupenv_s.

Your code then would look like this:

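wchar_t* pathAppData = nullptr;
size_t sz = 0;
std::wstring wPathAppData;
if (_wdupenv_s(&pathAppData, &sz, L"APPDATA") == 0 && pathAppData != nullptr)
{
    wPathAppData = pathAppData;
    free(pathAppData);  // _wdupenv_s allocates the buffer with malloc
}
wPathAppData.append(L"\\MyApplication");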

Also this could be an interesting read: std::wstring VS std::string

Cannot convert character array to wstring with utf-8 characters

I think you should change the code as follows:

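std::wstring s2ws(const char* utf8Bytes)
{
    const std::string str(utf8Bytes ? utf8Bytes : "");
    if (str.empty()) return std::wstring();  // MultiByteToWideChar fails on a zero-length input
    // CP_ACP interprets the bytes in the system's ANSI code page rather than as UTF-8.
    int size_needed = MultiByteToWideChar(CP_ACP, 0, str.data(), (int)str.size(), NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_ACP, 0, str.data(), (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}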

The difference between the two flags, CP_ACP (the system's current ANSI code page) and CP_UTF8, is listed here.

Convert const char* to const wchar_t*

There are multiple questions on SO that address the problem on Windows. Sample posts:

  1. char* to const wchar_t * conversion
  2. conversion from unsigned char* to const wchar_t*

There is a platform agnostic method posted at http://ubuntuforums.org/showthread.php?t=1579640. The source from this site is (I hope I am not violating any copyright):

#include <locale>
#include <iostream>
#include <string>
#include <sstream>
using namespace std ;

wstring widen( const string& str )
{
    wostringstream wstm ;
    const ctype<wchar_t>& ctfacet = use_facet<ctype<wchar_t>>(wstm.getloc()) ;
    for( size_t i=0 ; i<str.size() ; ++i )
        wstm << ctfacet.widen( str[i] ) ;
    return wstm.str() ;
}

string narrow( const wstring& str )
{
    ostringstream stm ;

    // Incorrect code from the link
    // const ctype<char>& ctfacet = use_facet<ctype<char>>(stm.getloc());

    // Correct code.
    const ctype<wchar_t>& ctfacet = use_facet<ctype<wchar_t>>(stm.getloc());

    for( size_t i=0 ; i<str.size() ; ++i )
        stm << ctfacet.narrow( str[i], 0 ) ;
    return stm.str() ;
}

int main()
{
    {
        const char* cstr = "abcdefghijkl" ;
        // Keep the wstring alive: c_str() on a temporary would leave a dangling pointer.
        const wstring wstr = widen(cstr) ;
        const wchar_t* wcstr = wstr.c_str() ;
        wcout << wcstr << L'\n' ;
    }
    {
        const wchar_t* wcstr = L"mnopqrstuvwx" ;
        // Same here: hold the string object for as long as the pointer is used.
        const string str = narrow(wcstr) ;
        const char* cstr = str.c_str() ;
        cout << cstr << '\n' ;
    }
}

