Output Unicode to Console Using C++, in Windows

How do I print Unicode to the output console in C with Visual Studio?

This code works for me (VS2017, project with Unicode enabled):

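#include <stdio.h>
#include <wchar.h>
#include <io.h>
#include <fcntl.h>

int main()
{
    /* Switch stdout to UTF-16 text mode; use only wide-character output afterwards. */
    _setmode(_fileno(stdout), _O_U16TEXT);
    const wchar_t *test = L"the 来. Testing unicode -- English -- Ελληνικά -- Español.";

    /* %ls always denotes a wide string, regardless of CRT format-specifier mode. */
    wprintf(L"%ls\n", test);
    return 0;
}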

This is the console output:

[screenshot: output]

After copying it into Notepad++, I see the proper string:

the 来. Testing unicode -- English -- Ελληνικά -- Español.

OS - Windows 7 English, Console font - Lucida Console

Edits based on comments

I tried to fix the above code to work with VS2019 on Windows 10, and the best I could come up with is this:

#include <stdio.h>
#include <wchar.h>

int main()
{
    const auto* test = L"the 来. Testing unicode -- English -- Ελληνικά -- Español.";

    // %ls always denotes a wide string.
    wprintf(L"%ls\n", test);
}

When I run it as is, I see:
[screenshot: default console settings]

When I run it with the console set to the Lucida Console font and UTF-8 encoding, I see:
[screenshot: console switched to UTF-8]

As for the 来 character being shown as an empty rectangle, I suppose this is a limitation of the font, which does not contain all the Unicode glyphs.

When the text is copied from the last console into Notepad++, all characters are shown correctly.

What is the best way to output Unicode to console?

The clue is in the error message. "...cannot be represented in the current code page (1252)". So the code page needs to be changed. The code page identifier for UTF-8 is 65001. To change the code page, use SetConsoleOutputCP.
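A minimal sketch of that call, assuming a Windows build where <windows.h> is available. The names Utf8ConsoleScope and write_utf8 are hypothetical helpers, not part of any API, and the non-Windows branch simply assumes the terminal already expects UTF-8.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical RAII helper: switches the console output code page to
// UTF-8 (65001) and restores the previous code page on destruction.
class Utf8ConsoleScope {
public:
    Utf8ConsoleScope() {
#ifdef _WIN32
        previous_ = GetConsoleOutputCP();
        active_ = SetConsoleOutputCP(CP_UTF8) != 0;  // CP_UTF8 == 65001
#else
        active_ = true;  // assumption: other terminals already expect UTF-8
#endif
    }
    ~Utf8ConsoleScope() {
#ifdef _WIN32
        // Skip restoring if no previous code page was reported.
        if (active_ && previous_ != 0) SetConsoleOutputCP(previous_);
#endif
    }
    Utf8ConsoleScope(const Utf8ConsoleScope&) = delete;
    Utf8ConsoleScope& operator=(const Utf8ConsoleScope&) = delete;

    bool active() const { return active_; }

private:
#ifdef _WIN32
    UINT previous_ = 0;
#endif
    bool active_ = false;
};

// Writes the bytes of a UTF-8 string to stdout unchanged and returns the
// number of bytes written.
std::size_t write_utf8(const std::string& text) {
    const std::size_t written = std::fwrite(text.data(), 1, text.size(), stdout);
    std::fflush(stdout);
    return written;
}
```

With a Utf8ConsoleScope object alive in main, a call such as write_utf8("\xE6\x9D\xA5\n") sends the three UTF-8 bytes of 来 plus a newline to the console. Whether the glyph actually renders still depends on the console font.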

How to output unicode characters in C/C++

Most of those characters take more than a byte to encode, but std::cout's currently imbued locale will only output ASCII characters. For that reason you're probably seeing a lot of weird symbols or question marks in the output stream. You should imbue std::wcout with a locale that uses UTF-8 since these characters are not supported by ASCII:

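#include <iostream>
#include <locale>
#include <string>

int main()
{
    // Throws std::runtime_error if the "en_US.utf8" locale is not installed.
    std::locale::global(std::locale("en_US.utf8"));
    std::wcout.imbue(std::locale());

    std::wstring s = L"šđč枊ĐČĆŽ";
    std::wcout << s << L'\n';
}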

For Windows systems you will need the following code:

#include <iostream>
#include <string>
#include <fcntl.h>
#include <io.h>

int main()
{
    _setmode(_fileno(stdout), _O_WTEXT);

    std::wstring s = L"šđč枊ĐČĆŽ";
    std::wcout << s;

    return 0;
}

C++ unicode characters in console using printf?

To use printf, and assuming you are using US-localized Windows with a console code page of 437 (run chcp to check), then the following corrected code will work if you save the source file in code page 437. One way to do this is to use Notepad++ and set Encoding->Character sets->Western European->OEM-US on the menu. The downside to this is your source code won't display nicely in most editors, unless they specifically support cp437, and even Notepad++ won't display it correctly on re-opening the file without setting the encoding again.

#include <stdio.h>
#include <stdlib.h>
#include <io.h>
#include <fcntl.h>

int main()
{
    char pos[9] = {'X','O','X','O','X','O','X','O','X'};
    printf(" %c ║ %c ║ %c \n", pos[0], pos[1], pos[2]);
    printf("═══╬═══╬═══\n");
    printf(" %c ║ %c ║ %c \n", pos[3], pos[4], pos[5]);
    printf("═══╬═══╬═══\n");
    printf(" %c ║ %c ║ %c \n", pos[6], pos[7], pos[8]);
    system("pause");
}
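One way to sidestep the source-encoding problem, sketched here under the same assumption of a code page 437 console, is to spell the box-drawing characters as explicit byte escapes so the source file stays plain ASCII. In code page 437, 0xBA is ║, 0xCD is ═ and 0xCE is ╬. The function names render_board_cp437 and print_board_cp437 are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>
#include <string>

// Builds the tic-tac-toe board as a string of code page 437 bytes.
std::string render_board_cp437(const std::array<char, 9>& pos) {
    const std::string bar = " \xBA ";  // vertical double line with padding
    const std::string rule =
        "\xCD\xCD\xCD\xCE\xCD\xCD\xCD\xCE\xCD\xCD\xCD\n";  // horizontal rule
    std::string out;
    for (std::size_t row = 0; row < 3; ++row) {
        if (row > 0) out += rule;
        out += ' ';
        out += pos[row * 3];
        out += bar;
        out += pos[row * 3 + 1];
        out += bar;
        out += pos[row * 3 + 2];
        out += " \n";
    }
    return out;
}

// Writes the raw bytes to stdout; they only look right on a cp437 console.
void print_board_cp437(const std::array<char, 9>& pos) {
    const std::string board = render_board_cp437(pos);
    std::fwrite(board.data(), 1, board.size(), stdout);
}
```

The trade-off is readability: the board layout is no longer visible in the source, but the file can be saved in any ASCII-compatible encoding.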

On Windows, since the API is natively UTF-16, a better way is to use the following code and save the file in UTF-8 w/ BOM:

#include <stdio.h>
#include <stdlib.h>
#include <io.h>
#include <fcntl.h>

int main()
{
    char pos[9] = {'X','O','X','O','X','O','X','O','X'};
    /* After this call, use only wide-character output functions on stdout. */
    _setmode(_fileno(stdout), _O_U16TEXT);
    /* With wprintf in the Microsoft CRT, %C formats a narrow char argument. */
    wprintf(L" %C ║ %C ║ %C \n", pos[0], pos[1], pos[2]);
    wprintf(L"═══╬═══╬═══\n");
    wprintf(L" %C ║ %C ║ %C \n", pos[3], pos[4], pos[5]);
    wprintf(L"═══╬═══╬═══\n");
    wprintf(L" %C ║ %C ║ %C \n", pos[6], pos[7], pos[8]);
    system("pause");
}

Output (both cases):

 X ║ O ║ X
═══╬═══╬═══
 O ║ X ║ O
═══╬═══╬═══
 X ║ O ║ X
Press any key to continue . . .

How to Output Unicode Strings on the Windows Console

The general strategy we use in most cross-platform applications and projects is this: we use UTF-8 (the real standard) everywhere. We use std::string as the container and interpret everything as UTF-8. We handle all file I/O the same way, i.e. we expect UTF-8 and save UTF-8. When we get a string from somewhere and know that it is not UTF-8, we convert it to UTF-8.

The most common case where we stumble upon Windows UTF-16 is filenames. So whenever we handle a filename, we convert the UTF-8 string to Windows UTF-16, and we convert in the other direction when we search a directory for files.
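As a rough illustration of that conversion step, here is a portable sketch that decodes UTF-8 and re-encodes it as UTF-16. The function name utf8_to_utf16 is hypothetical and this is not the author's code; on Windows the same job is usually done with MultiByteToWideChar and CP_UTF8, with the result stored in a std::wstring for the wide-character file APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 to UTF-16.
// Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const std::uint32_t kMin[4] = {0x0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        if (cp < kMin[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}
```

Throwing on malformed input is a design choice made for this sketch; a real filename layer might prefer to substitute U+FFFD instead.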

The console isn't really used in our Windows build (there, all console output is redirected into a file). Since we use UTF-8 everywhere, our console output is also UTF-8, which is fine for most modern systems. The Windows console log file is also UTF-8, and most text editors on Windows can read it without problems.

If we used the Windows console more and cared a lot that all special characters are displayed correctly, we might write an automatic pipe handler, installed between fileno=0 and the real stdout, that uses WriteConsoleW as you have suggested (if there is really no easier way).

If you wonder how to implement such an automatic pipe handler: we have already implemented one for all POSIX-like systems. The code probably doesn't work on Windows as is, but I think it should be possible to port it. Our current pipe handler is similar to what tee does, i.e. if you do cout << "Hello" << endl, it is printed both on stdout and in a log file. Look at the code if you are interested in how this is done.

How to change console program for unicode support in windows?

After doing some research, it turns out the default console font does not support Chinese glyphs. One can change the console font with the SetCurrentConsoleFontEx function.

Demo Code:

#ifdef _MSC_VER
#define _CRT_SECURE_NO_WARNINGS
#endif

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <io.h>
#include <fcntl.h>
#include <windows.h>

#define FF_SIMHEI 54

int main(int argc, char const *argv[])
{
    CONSOLE_FONT_INFOEX cfi = {0};

    cfi.cbSize = sizeof(CONSOLE_FONT_INFOEX);
    cfi.nFont = 0;
    cfi.dwFontSize.X = 8;
    cfi.dwFontSize.Y = 16;
    cfi.FontFamily = FF_SIMHEI;
    cfi.FontWeight = FW_NORMAL;
    wcscpy(cfi.FaceName, L"SimHei");

    SetCurrentConsoleFontEx(GetStdHandle(STD_OUTPUT_HANDLE), FALSE, &cfi);

    /* UTF-8 String */
    SetConsoleOutputCP(CP_UTF8); /* Thanks for Eryk Sun's notice: Remove this line if you are using windows 7 or 8 */
    puts(u8"UTF-8你好");

    /* UTF-16 String */
    _setmode(_fileno(stdout), _O_U16TEXT);
    _putws(L"UTF-16你好");

    system("pause");

    return 0;
}

