Why are certain Unicode characters causing std::wcout to fail in a console app?
wcout, or to be precise, a wfilebuf instance it uses internally, converts wide characters to narrow characters, then writes those to the file (in your case, to stdout). The conversion is performed by the codecvt facet in the stream's locale; by default, that just does wctomb_s, converting to the system default ANSI codepage, aka CP_ACP.
Apparently, the character '\xf021' is not representable in the default codepage configured on your system. So the conversion fails, and failbit is set in the stream. Once failbit is set, all subsequent calls fail immediately.
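The sticky failbit behaviour can be seen in isolation with a small, portable sketch. It does not reproduce the codepage conversion failure itself: as an assumption made so that it runs anywhere, it sets failbit by hand on a std::wostringstream and then shows that writes are dropped until clear() is called. The function name write_after_failure is hypothetical.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical demo: returns what ends up in the stream after a simulated failure.
// If `recover` is true, the error state is cleared before writing again.
std::wstring write_after_failure(bool recover) {
    std::wostringstream out;
    out << L"before ";

    // Assumption: stand in for the failed wide-to-narrow conversion
    // by setting failbit manually.
    out.setstate(std::ios_base::failbit);
    assert(out.fail());

    out << L"lost ";   // ignored: the stream is in a failed state

    if (recover) {
        out.clear();   // reset the error flags so output works again
    }
    out << L"after";   // written only if the state was cleared
    return out.str();
}
```

The same pattern should apply to wcout: after a failed write, calling std::wcout.clear() resets the error state, although the character that failed to convert is still not printed.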
I do not know of any way to get wcout to successfully print arbitrary Unicode characters to the console. wprintf works though, with a little tweak:
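#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <tchar.h>
#include <string>
const std::wstring test = L"hello\xf021test!";
int _tmain(int argc, _TCHAR* argv[])
{
    // Switch stdout to UTF-16 mode so wide characters are not converted to the ANSI codepage.
    _setmode(_fileno(stdout), _O_U16TEXT);
    // Pass the text as an argument, never as the format string itself.
    wprintf(L"%s", test.c_str());
    return 0;
}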
How to Output Unicode Strings on the Windows Console
The general strategy we use in most cross-platform projects is to use UTF-8 (I mean the real standard) everywhere. We use std::string as the container and interpret everything as UTF-8. We handle all file I/O the same way, i.e. we expect UTF-8 and save UTF-8. When we get a string from somewhere and know that it is not UTF-8, we convert it to UTF-8.
The most common place where we run into Windows UTF-16 is filenames. So whenever we handle a filename, we convert the UTF-8 string to Windows UTF-16, and we convert the other way when we search a directory for files.
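To make the filename conversion concrete, here is a minimal, portable sketch of a UTF-8 to UTF-16 conversion. It is not the code from the project described above; on Windows one would more typically call MultiByteToWideChar with CP_UTF8. The helper name utf8_to_utf16 and the choice to throw on malformed input are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes and re-encode them as UTF-16 code units.
// Throws std::invalid_argument on malformed input (an illustrative policy choice).
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const std::uint32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;   // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i < extra + 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        // Reject overlong forms, surrogate code points and out-of-range values.
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid Unicode code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            assert(cp <= 0xFFFFF);
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits wide, the resulting code units could be copied into a std::wstring before being passed to a wide-character file API.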
The console isn't really used in our Windows build (there, all console output is redirected into a file). Since we use UTF-8 everywhere, our console output is UTF-8 too, which is fine for most modern systems. The Windows console log file is also UTF-8, and most text editors on Windows can read that without problems.
If we used the Windows console more, and if we cared a lot that all special characters are displayed correctly, we would probably write an automatic pipe handler that we install between fileno=1 and the real stdout, and which uses WriteConsoleW as you have suggested (if there is really no easier way).
If you wonder how to implement such an automatic pipe handler: we have already implemented one for all POSIX-like systems. The code probably doesn't work on Windows as it is, but I think it should be possible to port it. Our current pipe handler is similar to what tee does, i.e. if you do a cout << "Hello" << endl, it will be printed both on stdout and in a log file. Look at the code if you are interested in how this is done.
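The tee-like behaviour can be illustrated with a short sketch. Note that this is a different and simpler technique than the descriptor-level pipe handler described above: it duplicates output at the C++ stream-buffer level, so it only sees what goes through the wrapped std::ostream. The names TeeBuf and tee_demo are hypothetical, and two in-memory streams stand in for stdout and the log file.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stream buffer that forwards every character to two targets.
class TeeBuf : public std::streambuf {
public:
    TeeBuf(std::streambuf* first, std::streambuf* second)
        : first_(first), second_(second) {
        assert(first_ != nullptr && second_ != nullptr);
    }

protected:
    // Called for every character because this buffer has no put area.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const char c = traits_type::to_char_type(ch);
        const bool ok1 = !traits_type::eq_int_type(first_->sputc(c), traits_type::eof());
        const bool ok2 = !traits_type::eq_int_type(second_->sputc(c), traits_type::eof());
        return (ok1 && ok2) ? ch : traits_type::eof();
    }

    // Flush both targets, e.g. when std::endl or std::flush is used.
    int sync() override {
        const int r1 = first_->pubsync();
        const int r2 = second_->pubsync();
        return (r1 == 0 && r2 == 0) ? 0 : -1;
    }

private:
    std::streambuf* first_;
    std::streambuf* second_;
};

// Usage sketch: the two string streams stand in for stdout and a log file.
std::pair<std::string, std::string> tee_demo() {
    std::ostringstream console, log_file;
    TeeBuf tee(console.rdbuf(), log_file.rdbuf());
    std::ostream out(&tee);
    out << "Hello" << std::endl;
    return {console.str(), log_file.str()};
}
```

In a real program the two targets could be std::cout.rdbuf() and the buffer of a std::ofstream; output written with printf or directly to the file descriptor would bypass this wrapper, which is why a pipe-based handler is the more complete approach.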
How do I print Unicode to the output console in C with Visual Studio?
This is code that works for me (VS2017), in a project with Unicode enabled:
#include <stdio.h>
#include <io.h>
#include <fcntl.h>
int main()
{
    /* Put stdout into UTF-16 mode before any wide output. */
    _setmode(_fileno(stdout), _O_U16TEXT);
    const wchar_t *test = L"the 来. Testing unicode -- English -- Ελληνικά -- Español.";
    wprintf(L"%s\n", test);
    return 0;
}
This is the console: [screenshot]
After copying it to Notepad++ I see the proper string:
the 来. Testing unicode -- English -- Ελληνικά -- Español.
OS - Windows 7 English, Console font - Lucida Console
Edits based on comments
I tried to fix the above code to work with VS2019 on Windows 10, and the best I could come up with is this:
#include <stdio.h>
int main()
{
    const auto* test = L"the 来. Testing unicode -- English -- Ελληνικά -- Español.";
    wprintf(L"%s\n", test);
}
When I run it "as is" I see: [screenshot]
When it is run with the console set to the Lucida Console font and UTF-8 encoding I see: [screenshot]
As for the 来 character being shown as an empty rectangle, I suppose this is a limitation of the font, which does not contain all the Unicode glyphs.
When the text is copied from the last console to Notepad++, all characters are shown correctly.
Encoding of console output stream in windows
You cannot set the encoding to UTF-16 because both cout and wcout write to the same byte-oriented stream (STD_OUTPUT_HANDLE), and only byte-oriented encodings are supported. UTF-16 is word-oriented. This implies that the only Unicode encoding that can be written to the standard output is UTF-8.
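As a rough sketch of what writing UTF-8 to the byte-oriented standard output can look like, the code below encodes code points as UTF-8 bytes and sends them through a narrow stream. The helper names append_utf8, to_utf8 and print_utf8 are hypothetical. On Windows the sketch also calls SetConsoleOutputCP(CP_UTF8); whether the characters then display correctly is an assumption that depends on the console font and the Windows version.

```cpp
#include <cassert>
#include <ostream>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: append one Unicode code point to `out` as UTF-8 bytes.
void append_utf8(std::string& out, char32_t cp) {
    // Surrogates and values above U+10FFFF are not valid code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: encode a whole UTF-32 string as UTF-8.
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) append_utf8(out, cp);
    return out;
}

// Write the text as UTF-8 bytes to a narrow, byte-oriented stream.
void print_utf8(std::ostream& os, const std::u32string& text) {
#ifdef _WIN32
    // Ask the console to interpret the output bytes as UTF-8.
    SetConsoleOutputCP(CP_UTF8);
#endif
    os << to_utf8(text);
}
```

A call such as print_utf8(std::cout, U"Ελληνικά\n") would then emit plain UTF-8 bytes; SetConsoleOutputCP changes the code page of the console the process is attached to, so a real program may want to restore the previous value on exit.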