How to Properly Use WideCharToMultiByte

Correct way to use WideCharToMultiByte when using Unicode

Most likely your problem is that you're compiling with UNICODE defined. In that case, PROCESSENTRY32 will actually be PROCESSENTRY32W.

But you're calling the ANSI version, Process32First, instead of the Unicode version, Process32FirstW.

Most of the WinAPI functions that accept either ANSI or Unicode strings come in two separate versions, plus a macro:

  • the ANSI one, which usually ends with A (or with nothing)
  • the Unicode one, which usually ends with W
  • a macro that switches between the ANSI and Unicode versions depending on whether UNICODE is defined.

In your case that would be:

#ifdef UNICODE
#define Process32First Process32FirstW
#define Process32Next Process32NextW
#define PROCESSENTRY32 PROCESSENTRY32W
#define PPROCESSENTRY32 PPROCESSENTRY32W
#define LPPROCESSENTRY32 LPPROCESSENTRY32W
#endif // !UNICODE

Also keep in mind that Process32First already fills in your PROCESSENTRY32 with the first entry it finds, so with your current implementation you'll always skip the first process.
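The difference is easy to see in a portable sketch. FakeSnapshot, visit_skipping_first and visit_all are made-up names for illustration and do not call the Windows API: first() fills the entry with the first item, just as Process32First does, and next() fills it with the following one, like Process32Next. A plain while loop around next() overwrites the entry from first() before anyone looks at it, whereas do/while handles it before advancing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Toolhelp snapshot, NOT a Windows API.
// first() behaves like Process32First: it fills `out` with the first entry.
// next() behaves like Process32Next. Both return false when nothing is left.
class FakeSnapshot {
public:
    explicit FakeSnapshot(std::vector<std::string> names) : names_(std::move(names)) {}
    bool first(std::string& out) { pos_ = 0; return fetch(out); }
    bool next(std::string& out)  { ++pos_; return fetch(out); }
private:
    bool fetch(std::string& out) {
        if (pos_ >= names_.size()) return false;
        out = names_[pos_];
        return true;
    }
    std::vector<std::string> names_;
    std::size_t pos_ = 0;
};

// Buggy: the entry filled in by first() is overwritten by next() unseen.
std::vector<std::string> visit_skipping_first(FakeSnapshot& snap)
{
    std::vector<std::string> seen;
    std::string entry;
    if (!snap.first(entry)) return seen;
    while (snap.next(entry)) seen.push_back(entry);
    return seen;
}

// Correct: handle the entry from first(), then advance with next().
std::vector<std::string> visit_all(FakeSnapshot& snap)
{
    std::vector<std::string> seen;
    std::string entry;
    if (!snap.first(entry)) return seen;
    do {
        seen.push_back(entry);
    } while (snap.next(entry));
    return seen;
}
```

With three entries, the while version visits only two of them; the do/while version visits all three, starting with the one first() returned. The find_pid functions in this answer use the do/while form.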


If you're building a Windows app, it's best to decide from the start whether you want to use ANSI or Unicode.

(There's also the option of making the code compile for both with TCHAR and friends.)

Mixing them within a single app will lead to a lot of conversion problems, since not every Unicode character can be represented in your ANSI code page.

It'll also make your life a lot easier if you just rely on the linker to import the functions instead of using GetProcAddress(). GetProcAddress() looks a function up by its exported name, so the UNICODE macros can't redirect a lookup of "Process32First" to Process32FirstW.

If you want to stick with Unicode (the default for new projects), you could write your function like this:

#include <windows.h>
#include <tlhelp32.h>

// Uses the explicit W functions, so it compiles whether or not UNICODE is defined
DWORD find_pid(const wchar_t* procname) {
    // Take a snapshot of all the running processes
    HANDLE hProcSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (hProcSnap == INVALID_HANDLE_VALUE)
        return 0;

    PROCESSENTRY32W pe32;
    pe32.dwSize = sizeof(PROCESSENTRY32W); // must be set before the first call

    // Process32FirstW already fills pe32 with the first process
    if (!Process32FirstW(hProcSnap, &pe32)) {
        CloseHandle(hProcSnap);
        return 0;
    }

    do {
        // szExeFile holds only the file name (e.g. L"hello.exe"), not the full path;
        // Windows file names are case-insensitive, hence lstrcmpiW
        if (lstrcmpiW(procname, pe32.szExeFile) == 0) {
            CloseHandle(hProcSnap);
            return pe32.th32ProcessID;
        }
    } while (Process32NextW(hProcSnap, &pe32));

    // not found
    CloseHandle(hProcSnap);
    return 0;
}

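and call it like this (szExeFile holds only the executable's file name, not its full path, so pass just the name):

std::wstring parentProcess = L"hello.exe"; // std::wstring needs <string>
DWORD pid = find_pid(parentProcess.c_str());

// or just:
DWORD pid2 = find_pid(L"hello.exe");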

If you want your application to compile for both Unicode and ANSI, you'll have to use TCHAR. Note that the Windows headers check UNICODE while tchar.h checks _UNICODE, so define both or neither (Visual Studio's Unicode character set setting does this for you):

#include <windows.h>
#include <tlhelp32.h>
#include <tchar.h>

DWORD find_pid(const TCHAR* procname) {
    // Take a snapshot of all the running processes
    HANDLE hProcSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (hProcSnap == INVALID_HANDLE_VALUE)
        return 0;

    // PROCESSENTRY32, Process32First/Next and lstrcmpi all resolve to their
    // ANSI or Unicode versions depending on whether UNICODE is defined
    PROCESSENTRY32 pe32;
    pe32.dwSize = sizeof(PROCESSENTRY32);

    if (!Process32First(hProcSnap, &pe32)) {
        CloseHandle(hProcSnap);
        return 0;
    }

    do {
        if (lstrcmpi(procname, pe32.szExeFile) == 0) {
            CloseHandle(hProcSnap);
            return pe32.th32ProcessID;
        }
    } while (Process32Next(hProcSnap, &pe32));

    // not found
    CloseHandle(hProcSnap);
    return 0;
}

and call it like this:

DWORD pid = find_pid(_T("hello.exe")); // file name only, not a full path

How do I use MultiByteToWideChar?

You must call MultiByteToWideChar twice:

  1. The first call to MultiByteToWideChar is used to find the buffer size you need for the wide string. Look at Microsoft's documentation; it states:

    If the function succeeds and cchWideChar is 0, the return value is the required size, in characters, for the buffer indicated by lpWideCharStr.

    Thus, to make MultiByteToWideChar give you the required size, pass 0 as the value of the last parameter, cchWideChar. You should also pass NULL as the one before it, lpWideCharStr.

  2. Obtain a non-const buffer large enough to accommodate the wide string, using the buffer size from the previous step. Pass this buffer to another call to MultiByteToWideChar. And this time, the last argument should be the actual size of the buffer, not 0.

A sketchy example:

// x is a std::string holding UTF-8 text
int wchars_num = MultiByteToWideChar( CP_UTF8 , 0 , x.c_str() , -1, NULL , 0 );
wchar_t* wstr = new wchar_t[wchars_num];
MultiByteToWideChar( CP_UTF8 , 0 , x.c_str() , -1, wstr , wchars_num );
// do whatever with wstr
delete[] wstr;

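Also, note the use of -1 as the cbMultiByte argument. It makes the function process the terminating null as well, so the resulting string is null-terminated and the returned size already includes the terminator; you don't have to deal with it yourself.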
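To make the two-call pattern concrete without a Windows toolchain, here is a portable C++ sketch. utf8_to_utf16 and to_utf16 are hypothetical names; utf8_to_utf16 only imitates the convention of MultiByteToWideChar(CP_UTF8, 0, src, -1, dst, dst_len) and assumes valid UTF-8 input. Like the real call with -1, it counts the terminator, returns just the required size when dst_len is 0, and returns 0 when the buffer is too small.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for MultiByteToWideChar(CP_UTF8, 0, src, -1, dst, dst_len),
// NOT the Windows API. Assumes valid UTF-8 and treats src as null-terminated,
// so the terminator is converted and counted too, as with -1 in the real call.
// dst_len == 0: only return the required size in UTF-16 code units.
// Otherwise: fill dst, or return 0 if dst_len is too small.
int utf8_to_utf16(const char* src, char16_t* dst, int dst_len)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(src);
    int needed = 0;
    for (;;) {
        // Decode one code point from its lead byte and continuation bytes.
        std::uint32_t cp;
        int extra;
        if (*p < 0x80)      { cp = *p;        extra = 0; }
        else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }
        else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }
        else                { cp = *p & 0x07; extra = 3; }
        ++p;
        for (int i = 0; i < extra; ++i)
            cp = (cp << 6) | (*p++ & 0x3F);

        // Code points above U+FFFF need a surrogate pair.
        int units = cp > 0xFFFF ? 2 : 1;
        if (dst_len != 0) {
            if (needed + units > dst_len) return 0; // buffer too small
            if (units == 2) {
                std::uint32_t v = cp - 0x10000;
                dst[needed]     = static_cast<char16_t>(0xD800 + (v >> 10));
                dst[needed + 1] = static_cast<char16_t>(0xDC00 + (v & 0x3FF));
            } else {
                dst[needed] = static_cast<char16_t>(cp);
            }
        }
        needed += units;
        if (cp == 0) break; // terminator counted (and written, if dst was given)
    }
    return needed;
}

// The two-call pattern: ask for the size, allocate, then convert for real.
std::u16string to_utf16(const char* utf8)
{
    int n = utf8_to_utf16(utf8, nullptr, 0);   // 1st call: size only
    if (n == 0) return std::u16string();
    std::vector<char16_t> buf(n);              // non-const buffer of that size
    utf8_to_utf16(utf8, buf.data(), n);        // 2nd call: real buffer, real size
    return std::u16string(buf.data());         // stops at the terminator
}
```

For "h\xC3\xA9" (hé) the first call reports 3: two characters plus the terminator. A character outside the BMP, such as U+1F600, takes two UTF-16 units, which is why you must size the buffer from the first call instead of from the byte count.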

Converting UTF-16 to UTF-8 using WideCharToMultiByte in C on Windows

Update 3: The hex output suggests that the source file was misinterpreted somewhere during compilation. Instead of UTF-8, Windows code page 1252 was used to decode it, which means the string has the wrong encoding in the compiled program. The byte sequence stored in the output file is therefore
C3 90 C2 BF C3 91 E2 82 AC C3 90 C2 B8 C3 90 C2 B2 C3 90 C2 B5 C3 91 E2 80 9A instead of the correct D0 BF D1 80 D0 B8 D0 B2 D0 B5 D1 82.
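To see where those bytes come from, here is a portable C++ sketch that reproduces the misinterpretation: each UTF-8 byte of привет is taken as one Windows-1252 character, and the resulting wide string is encoded back to UTF-8. The function names are made up for illustration and nothing here calls the Windows API; passing the five bytes that cp1252 leaves undefined through unchanged is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append one code point as UTF-8. Covers U+0000..U+FFFF, which is all that
// Windows-1252 can produce.
void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Windows-1252 byte -> Unicode code point. Only 0x80..0x9F differ from Latin-1.
// The five bytes cp1252 leaves undefined are passed through unchanged
// (an assumption of this sketch).
std::uint32_t cp1252_to_unicode(unsigned char b)
{
    static const std::uint16_t high[32] = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};
    return (b >= 0x80 && b <= 0x9F) ? high[b - 0x80] : b;
}

// Reproduce the bug: every byte of the UTF-8 source becomes one cp1252
// character in the wide literal, and that wide string is then encoded to
// UTF-8, as WideCharToMultiByte(CP_UTF8, ...) does.
std::string misread_as_cp1252(const std::string& utf8_source_bytes)
{
    std::string out;
    for (unsigned char b : utf8_source_bytes)
        append_utf8(out, cp1252_to_unicode(b));
    return out;
}
```

Each of the 12 source bytes turns into 2 output bytes, except 0x80 (€) and 0x82 (‚), which land outside Latin-1 and take 3, giving the 26 garbled bytes above.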

How to solve this problem depends on the toolchain. MSVC has the /utf-8 flag, which sets both the source and the execution character set to UTF-8. You might think this is redundant, since you've already saved your source file as UTF-8. It turns out WordPad isn't the only software that requires a BOM to detect UTF-8. The following excerpt from the MSVC documentation explains the reason for the whole encoding problem:

By default, Visual Studio detects a byte-order mark to determine if
the source file is in an encoded Unicode format, for example, UTF-16
or UTF-8. If no byte-order mark is found, it assumes the source file
is encoded using the current user code page, unless you have specified
a code page by using /utf-8 or the /source-charset option.
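A cheap way to catch this at build time is a static_assert on the length of a wide literal. This is an illustration, not something the MSVC documentation prescribes: L"привет" is 6 characters plus the terminator when the file is decoded as UTF-8, but if the compiler decodes it as code page 1252, each of the 12 UTF-8 bytes becomes its own wide character and the assertion fails.

```cpp
// Build-time guard (illustrative): 6 characters + terminator when this file is
// decoded as UTF-8. Decoded as code page 1252, the 12 UTF-8 bytes of the
// literal would become 12 separate wide characters instead.
static_assert(sizeof(L"привет") / sizeof(wchar_t) == 7,
              "source file was not decoded as UTF-8 (with MSVC, try /utf-8)");
```

The check only works if the file itself is saved as UTF-8, which is exactly the situation described above.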

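In Visual Studio you can pass the flag via Configuration Properties > C/C++ > Command Line > Additional Options. The Character Set project setting does not help here: it only controls whether UNICODE/_UNICODE or _MBCS get defined and doesn't affect how source files are decoded. As far as I know, CMake doesn't add /utf-8 for MSVC by default either, so add it yourself, for example with add_compile_options(/utf-8) inside an if(MSVC) block.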

Update 2:
Some editors may not be able to deduce that the content is UTF-8 from a short byte sequence like this, which results in the garbled output you've seen. You could add the UTF-8 byte order mark (BOM) at the beginning of the file to help these editors, although it's not considered best practice: it conflates metadata and content, breaks ASCII backward compatibility, and UTF-8 can be detected reliably without it. It's mostly legacy software like Microsoft's WordPad that needs the BOM to interpret the file as UTF-8.

DWORD bom_written; // WriteFile requires this pointer when lpOverlapped is NULL
if (WriteFile(file, "\xEF\xBB\xBF", 3, &bom_written, NULL) == 0) { goto error; }

Update: Code with a bit of basic error handling:

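#include <windows.h>
#include <fileapi.h>
#include <stringapiset.h>
#include <stdlib.h>

int main(void) {
    int ret_val = -1;
    int required_size = 0;
    char *buffer = NULL;
    DWORD written = 0;

    // needs a UTF-8 source file that the compiler also reads as UTF-8 (see Update 3)
    const wchar_t source[] = L"привет";

    // CREATE_ALWAYS truncates an existing test.txt; write access is all we need
    HANDLE file = CreateFileW(L"test.txt", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

    if (file == INVALID_HANDLE_VALUE) { goto error_0; }

    // first call: required size in bytes, including the terminating null (because of -1)
    required_size = WideCharToMultiByte(CP_UTF8, 0, source, -1, NULL, 0, NULL, NULL);

    if (required_size == 0) { goto error_1; }

    buffer = calloc(required_size, sizeof(char));

    if (buffer == NULL) { goto error_1; }

    // second call: the actual conversion
    if (WideCharToMultiByte(CP_UTF8, 0, source, -1, buffer, required_size, NULL, NULL) == 0) { goto error_2; }

    // write everything except the terminating null; with lpOverlapped == NULL,
    // lpNumberOfBytesWritten must not be NULL
    if (WriteFile(file, buffer, required_size - 1, &written, NULL) == 0) { goto error_2; }

    ret_val = 0;

error_2:
    free(buffer);

error_1:
    // close the handle on every path that opened it
    if (CloseHandle(file) == 0) { ret_val = -1; }

error_0:
    return ret_val;
}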

Old:
You can do the following, which will create the file just fine. The first call to WideCharToMultiByte determines the number of bytes required to store the UTF-8 string. Make sure to save the source file as UTF-8; otherwise the string literal will not be encoded properly.

The following code is just a quick and dirty example and lacks rigorous error handling.

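#include <windows.h>
#include <fileapi.h>
#include <stringapiset.h>
#include <stdlib.h>

int main() {
    HANDLE file = CreateFileW(L"test.txt", GENERIC_ALL, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    const wchar_t source[] = L"привет";

    // first call: number of bytes needed, including the terminating null
    int required_size = WideCharToMultiByte(CP_UTF8, 0, source, -1, NULL, 0, NULL, NULL);

    char *buffer = (char *) calloc(required_size, sizeof(char));

    WideCharToMultiByte(CP_UTF8, 0, source, -1, buffer, required_size, NULL, NULL);

    // skip the terminating null; the bytes-written pointer is required without OVERLAPPED
    DWORD written;
    WriteFile(file, buffer, required_size - 1, &written, NULL);
    free(buffer);

    // CloseHandle returns nonzero on success, but main should return 0 on success
    return CloseHandle(file) ? 0 : 1;
}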

How to convert between wide character and multibyte character strings in Windows?

Your conversion functions are buggy.

The return value of MultiByteToWideChar() is a number of wide characters, not a number of bytes as you are currently treating it. You need to multiply the value by sizeof(WCHAR) when calling malloc().

You are also not taking into account that the return value DOES NOT include space for a null terminator, because you are not passing -1 in the cbMultiByte parameter. Read the MultiByteToWideChar() documentation:

cbMultiByte [in]

Size, in bytes, of the string indicated by the lpMultiByteStr parameter. Alternatively, this parameter can be set to -1 if the string is null-terminated. Note that, if cbMultiByte is 0, the function fails.

If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting Unicode string has a terminating null character, and the length returned by the function includes this character.

If this parameter is set to a positive integer, the function processes exactly the specified number of bytes. If the provided size does not include a terminating null character, the resulting Unicode string is not null-terminated, and the returned length does not include this character.

...

Return value

Returns the number of characters written to the buffer indicated by lpWideCharStr if successful. If the function succeeds and cchWideChar is 0, the return value is the required size, in characters, for the buffer indicated by lpWideCharStr.

You are not null-terminating your output string.

The same goes for your convert_from_wstring() function. Read the WideCharToMultiByte() documentation:

cchWideChar [in]

Size, in characters, of the string indicated by lpWideCharStr. Alternatively, this parameter can be set to -1 if the string is null-terminated. If cchWideChar is set to 0, the function fails.

If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting character string has a terminating null character, and the length returned by the function includes this character.

If this parameter is set to a positive integer, the function processes exactly the specified number of characters. If the provided size does not include a terminating null character, the resulting character string is not null-terminated, and the returned length does not include this character.

...

Return value

Returns the number of bytes written to the buffer pointed to by lpMultiByteStr if successful. If the function succeeds and cbMultiByte is 0, the return value is the required size, in bytes, for the buffer indicated by lpMultiByteStr.
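The -1 versus positive-length rule is what decides whether the returned count includes a terminator. The following portable C++ sketch mimics it so you can try it anywhere: utf16_to_utf8 is a hypothetical stand-in for WideCharToMultiByte(CP_UTF8, 0, src, src_len, dst, dst_len, NULL, NULL), not the Windows API, and it assumes well-formed UTF-16 input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for
//   WideCharToMultiByte(CP_UTF8, 0, src, src_len, dst, dst_len, NULL, NULL)
// (NOT the Windows API; assumes well-formed UTF-16, surrogates in valid pairs).
// src_len == -1: src is null-terminated and the terminator is converted and counted.
// src_len > 0:   exactly src_len code units are converted; no terminator is added.
// dst_len == 0:  only the required size in bytes is returned.
int utf16_to_utf8(const char16_t* src, int src_len, char* dst, int dst_len)
{
    if (src_len == -1) {
        src_len = 0;
        while (src[src_len] != 0) ++src_len;
        ++src_len; // include the terminator
    }
    int needed = 0;
    for (int i = 0; i < src_len; ++i) {
        std::uint32_t cp = src[i];
        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src_len) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        }
        // Encode the code point as 1 to 4 UTF-8 bytes.
        unsigned char bytes[4];
        int n;
        if (cp < 0x80)         { bytes[0] = static_cast<unsigned char>(cp);                n = 1; }
        else if (cp < 0x800)   { bytes[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));  n = 2; }
        else if (cp < 0x10000) { bytes[0] = static_cast<unsigned char>(0xE0 | (cp >> 12)); n = 3; }
        else                   { bytes[0] = static_cast<unsigned char>(0xF0 | (cp >> 18)); n = 4; }
        for (int k = 1; k < n; ++k)
            bytes[k] = static_cast<unsigned char>(0x80 | ((cp >> (6 * (n - 1 - k))) & 0x3F));

        if (dst_len != 0) {
            if (needed + n > dst_len) return 0; // buffer too small
            for (int k = 0; k < n; ++k) dst[needed + k] = static_cast<char>(bytes[k]);
        }
        needed += n;
    }
    return needed;
}
```

With src_len = -1 the size for u"abc" is 4 because the terminator is counted; with src_len = 3 it is 3. That is why code passing an explicit length has to allocate one extra element and write the terminator itself.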

Also, your main() code is leaking the allocated strings. Since they are allocated with malloc(), you need to deallocate them with free() when you are done using them.

Also, you cannot usefully pass a WCHAR* string to std::cout. Before C++20 it compiles, but std::cout has no operator<< for wide strings, so overload resolution picks the one for const void*, and you get the memory address the WCHAR* points at instead of the actual characters. Since C++20 that overload is deleted, so the code doesn't compile at all. If you want to output wide strings, use std::wcout instead.

Try something more like this:

#include <windows.h>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <iostream>

WCHAR* convert_to_wstring(const char* str)
{
    int str_len = (int) strlen(str);
    // number of WCHARs needed, excluding the null terminator (no -1 passed)
    int num_chars = MultiByteToWideChar(CP_UTF8, 0, str, str_len, NULL, 0);
    WCHAR* wstrTo = (WCHAR*) malloc((num_chars + 1) * sizeof(WCHAR));
    if (wstrTo)
    {
        MultiByteToWideChar(CP_UTF8, 0, str, str_len, wstrTo, num_chars);
        wstrTo[num_chars] = L'\0';
    }
    return wstrTo;
}

CHAR* convert_from_wstring(const WCHAR* wstr)
{
    int wstr_len = (int) wcslen(wstr);
    // number of bytes needed, excluding the null terminator (no -1 passed)
    int num_chars = WideCharToMultiByte(CP_UTF8, 0, wstr, wstr_len, NULL, 0, NULL, NULL);
    CHAR* strTo = (CHAR*) malloc((num_chars + 1) * sizeof(CHAR));
    if (strTo)
    {
        WideCharToMultiByte(CP_UTF8, 0, wstr, wstr_len, strTo, num_chars, NULL, NULL);
        strTo[num_chars] = '\0';
    }
    return strTo;
}

int main()
{
    const WCHAR* wText = L"Wide string";
    char* text = convert_from_wstring(wText); // not const: it is passed to free()
    std::cout << text << "\n";
    free(text);

    WCHAR* wtext = convert_to_wstring("Multibyte string");
    std::wcout << wtext << "\n";
    free(wtext);

    return 0;
}

That being said, you really should be using std::string and std::wstring instead of char* and wchar_t* for better memory management:

#include <windows.h>
#include <iostream>
#include <string>

std::wstring convert_to_wstring(const std::string &str)
{
    // explicit length (no -1), so the count excludes the terminator;
    // std::wstring manages its own terminator
    int num_chars = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), (int) str.length(), NULL, 0);
    std::wstring wstrTo;
    if (num_chars > 0)
    {
        wstrTo.resize(num_chars);
        MultiByteToWideChar(CP_UTF8, 0, str.c_str(), (int) str.length(), &wstrTo[0], num_chars);
    }
    return wstrTo;
}

std::string convert_from_wstring(const std::wstring &wstr)
{
    int num_chars = WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int) wstr.length(), NULL, 0, NULL, NULL);
    std::string strTo;
    if (num_chars > 0)
    {
        strTo.resize(num_chars);
        WideCharToMultiByte(CP_UTF8, 0, wstr.c_str(), (int) wstr.length(), &strTo[0], num_chars, NULL, NULL);
    }
    return strTo;
}

int main()
{
    const WCHAR* wText = L"Wide string";
    const std::string text = convert_from_wstring(wText);
    std::cout << text << "\n";

    const std::wstring wtext = convert_to_wstring("Multibyte string");
    std::wcout << wtext << "\n";

    return 0;
}

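If you are using C++11 or later, have a look at the std::wstring_convert class for converting between UTF encodings (be aware that std::wstring_convert and std::codecvt_utf8_utf16 have been deprecated since C++17), e.g.:

#include <codecvt>
#include <locale>
#include <string>

// codecvt_utf8_utf16<wchar_t> produces UTF-16 code units, which matches
// Windows' 16-bit wchar_t; both functions throw std::range_error on bad input
std::wstring convert_to_wstring(const std::string &str)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> conv;
    return conv.from_bytes(str);
}

std::string convert_from_wstring(const std::wstring &wstr)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> conv;
    return conv.to_bytes(wstr);
}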

If you need to interact with other code that is based on char*/wchar_t*, std::string has a constructor that accepts char* input and a c_str() method that gives you a const char* for output, and the same goes for std::wstring and wchar_t*.

WideCharToMultiByte when is lpUsedDefaultChar true?

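Anything that is not present in the target code page is replaced with the code page's default character (usually ?, unless you supply your own through lpDefaultChar), and the flag that lpUsedDefaultChar points to is set to TRUE. With CP_UTF8, lpUsedDefaultChar must be NULL; otherwise the function fails with ERROR_INVALID_PARAMETER.

Windows-1252 is probably the most common ANSI code page, and most of its characters map to the same values in Unicode (the exception is the range 0x80 to 0x9F).

Take Ω (the ohm sign, U+2126) for example: it is probably not present in your current code page and therefore will not map to a valid narrow character: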

BOOL fUsedDefaultChar=FALSE;
DWORD dwSize;
char myOutStr[MAX_PATH];
WCHAR lpszW[10]=L"Hello";
*lpszW=0x2126; //ohm sign, you could also use the \u2126 syntax if your compiler supports it.
dwSize = WideCharToMultiByte(CP_ACP, 0, lpszW, -1, myOutStr ,MAX_PATH, NULL, &fUsedDefaultChar);
printf("%d %s\n",fUsedDefaultChar,myOutStr); //This prints "1 ?ello" on my system
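The same substitution rule can be sketched portably. to_latin1 is a hypothetical helper, not a Windows API; it assumes ISO-8859-1 as the target code page because that code page maps U+0000 to U+00FF one to one, so any UTF-16 code unit above 0xFF has no narrow equivalent. Such a unit is replaced by the default character and the flag is set, just like lpUsedDefaultChar.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, NOT a Windows API. Assumes ISO-8859-1 (Latin-1) as the
// target code page, which maps U+0000..U+00FF one to one. Every UTF-16 code unit
// above 0xFF has no narrow equivalent, so it becomes default_char and
// *used_default is set, mirroring lpUsedDefaultChar. (Each half of a surrogate
// pair is replaced separately here; the real API may behave differently.)
std::string to_latin1(const std::u16string& src, bool* used_default, char default_char = '?')
{
    std::string out;
    *used_default = false;
    for (char16_t c : src) {
        if (c <= 0xFF) {
            out += static_cast<char>(c);
        } else {
            out += default_char;
            *used_default = true;
        }
    }
    return out;
}
```

Converting u"\u2126ello" gives "?ello" with the flag set, while plain u"Hello" converts unchanged and leaves it clear.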

How to properly use MultiByteToWideChar

From the MSDN documentation:

For UTF-8 or code page 54936 (GB18030, starting with Windows Vista), dwFlags must be set to either 0 or MB_ERR_INVALID_CHARS. Otherwise, the function fails with ERROR_INVALID_FLAGS.

You're using CP_UTF8 but also passing the MB_COMPOSITE flag; that's why it fails with ERROR_INVALID_FLAGS. Pass 0 (or MB_ERR_INVALID_CHARS) instead.

Using WideCharToMultiByte on Windows Mobile

Windows Mobile is based on Windows CE, and according to the Windows CE documentation, WideCharToMultiByte does not support the WC_NO_BEST_FIT_CHARS flag there.

According to that page, the supported flags are:

  • WC_COMPOSITECHECK: Convert composite characters to precomposed characters.
  • WC_DISCARDNS: Discard nonspacing characters during conversion.
  • WC_SEPCHARS: Generate separate characters during conversion (this is the default conversion behavior).
  • WC_DEFAULTCHAR: Replace exceptions with the default character during conversion.

