How to Print Utf-8 from C++ Console Application on Windows

How to print UTF-8 strings to std::cout on Windows?

The problem is not std::cout but the windows console. Using C-stdio you will get the ü with fputs( "\xc3\xbc", stdout ); after setting the UTF-8 codepage (either using SetConsoleOutputCP or chcp) and setting a Unicode supporting font in cmd's settings (Consolas should support over 2000 characters and there are registry hacks to add more capable fonts to cmd).

If you output one byte after the other with putc('\xc3'); putc('\xbc'); you will get the double tofu as the console gets them interpreted separately as illegal characters. This is probably what the C++ streams do.

See UTF-8 output on Windows console for a lenghty discussion.

For my own project, I finally implemented a std::stringbuf doing the conversion to Windows-1252. I you really need full Unicode output, this will not really help you, however.

An alternative approach would be overwriting cout's streambuf, using fputs for the actual output:

#include <iostream>
#include <sstream>

#include <Windows.h>

class MBuf: public std::stringbuf {
public:
int sync() {
fputs( str().c_str(), stdout );
str( "" );
return 0;
}
};

int main() {
SetConsoleOutputCP( CP_UTF8 );
setvbuf( stdout, nullptr, _IONBF, 0 );
MBuf buf;
std::cout.rdbuf( &buf );
std::cout << u8"Greek: αβγδ\n" << std::flush;
}

I turned off output buffering here to prevent it to interfere with unfinished UTF-8 byte sequences.

How to print UTF-8 characters on console using C

The best answer given above was by Joachim Isaksson. Thank you, this ideed seems to be the problem. I solved it in Eclipse by setting the "Encoding" settings for the run configuration to UTF-8.

Sample Image
Sample Image

Show UTF-8 characters in console

There are some hacks you can find that demonstrate how to write multibyte character sets to the Console, but they are unreliable. They require your console font to be one that supports it, and in general, are something I would avoid. (All of these techniques break if your user doesn't do extra work on their part... so they are not reliable.)

If you need to write Unicode output, I highly recommend making a GUI application to handle this, instead of using the Console. It's fairly easy to make a simple GUI to just write your output to a control which supports Unicode.

UTF-8 output on Windows console

It's time to close this now. Stephan T. Lavavej says the behaviour is "by design", although I cannot follow this explanation.

My current knowledge is: Windows XP console in UTF-8 codepage does not work with C++ iostreams.

Windows XP is getting out of fashion now and so does VS 2008. I'd be interested to hear if the problem still exists on newer Windows systems.

On Windows 7 the effect is probably due to the way the C++ streams output characters. As seen in an answer to Properly print utf8 characters in windows console, UTF-8 output fails with C stdio when printing one byte after after another like putc('\xc3'); putc('\xbc'); as well. Perhaps this is what C++ streams do here.

How do I print Unicode to the output console in C with Visual Studio?

This is code that works for me (VS2017) - project with Unicode enabled

#include <stdio.h>
#include <io.h>
#include <fcntl.h>

int main()
{
_setmode(_fileno(stdout), _O_U16TEXT);
wchar_t * test = L"the 来. Testing unicode -- English -- Ελληνικά -- Español." ;

wprintf(L"%s\n", test);
}

This is console

output

After copying it to the Notepad++ I see the proper string

the 来. Testing unicode -- English -- Ελληνικά -- Español.

OS - Windows 7 English, Console font - Lucida Console

Edits based on comments

I tried to fix the above code to work with VS2019 on Windows 10 and best I could come up with is this

#include <stdio.h>
int main()
{
const auto* test = L"the 来. Testing unicode -- English -- Ελληνικά -- Español.";

wprintf(L"%s\n", test);
}

When run it "as is" I see
Default console settings

When it is run with console set to Lucida Console fond and UTF-8 encoding I see
Console switched to UTF-8

As the answer to 来 character shown as empty rectangle - I suppose is the limitation of the font which does not contain all the Unicode gliphs

When text is copied from the last console to Notepad++ all characters are shown correctly



Related Topics



Leave a reply



Submit