Is codecvt not a std header?
The reason why GCC rejects this code is simple: libstdc++ doesn't support <codecvt> yet. The C++11 support status page confirms this:

    22.5    Standard code conversion facets    N
Codecvt doesn't work in gcc
Only file streams are required to use std::codecvt<...>, and there is no requirement that any of the standard stream objects be implemented in terms of file streams. There are reasons for implementers to make either choice: Dinkumware's implementation uses <stdio.h> for most of its operations, so it makes sense to use the same implementation under the hood in this case, while libstdc++ avoids some overhead by directly accessing a buffer shared between the standard C and C++ streams and thus uses a different stream implementation.
When using file streams, use of the std::codecvt<...> facets should be consistent.
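To make that concrete, here is a minimal sketch of attaching a conversion facet to a file stream by imbuing a locale that carries it. The helper name `write_utf8` is my own, and `std::codecvt_utf8` is deprecated since C++17 but still shipped by the major implementations:

```cpp
#include <codecvt>   // deprecated since C++17, but still available
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write a wide string to `path` as UTF-8 by
// imbuing the stream with a locale carrying a codecvt_utf8 facet.
void write_utf8(const char* path, const std::wstring& text)
{
    std::wofstream file(path);
    // The locale object takes ownership of the facet pointer; all
    // subsequent wide-character output goes through codecvt_utf8.
    file.imbue(std::locale(file.getloc(), new std::codecvt_utf8<wchar_t>));
    file << text;
}
```

For example, `write_utf8("out.txt", L"h\u00e9llo")` should leave the UTF-8 bytes `68 C3 A9 6C 6C 6F` in the file.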
Deprecated header codecvt replacement
The std::codecvt template from <locale> itself isn't deprecated. For UTF-8 to UTF-16 conversion, the std::codecvt<char16_t, char, std::mbstate_t> specialization still exists.
However, since std::wstring_convert and std::wbuffer_convert are deprecated along with the standard conversion facets, there isn't any easy way to convert strings using the facets.
So, as Bolas already answered: implement it yourself (or use a third-party library, as always) or keep using the deprecated API.
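For the "implement it yourself" route, a hand-rolled UTF-16 to UTF-8 converter is small enough to sketch. Everything below (the function name, the choice to signal invalid input with std::range_error) is my own convention, not a standard API:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert UTF-16 code units to a UTF-8 byte string.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (i + 1 >= in.size())
                throw std::range_error("lone high surrogate");
            char32_t low = in[++i];
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::range_error("invalid low surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::range_error("lone low surrogate");
        }
        // Encode the code point as 1-4 UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The branches mirror the four UTF-8 byte lengths; the surrogate handling is the only UTF-16-specific part.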
Visual Studio C++ 2015 std::codecvt with char16_t or char32_t
Old question, but for future reference: this is a known bug in Visual Studio 2015, as explained in the latest post (January 7th 2016) in this thread of MSDN Social.
The workaround for your example looks like this (I implemented your method as a free function for simplicity):
#include <codecvt>
#include <cstdint>
#include <locale>
#include <string>
#include <iostream>

#if _MSC_VER >= 1900
std::string utf16_to_utf8(std::u16string utf16_string)
{
    std::wstring_convert<std::codecvt_utf8_utf16<int16_t>, int16_t> convert;
    auto p = reinterpret_cast<const int16_t *>(utf16_string.data());
    return convert.to_bytes(p, p + utf16_string.size());
}
#else
std::string utf16_to_utf8(std::u16string utf16_string)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    return convert.to_bytes(utf16_string);
}
#endif

int main()
{
    std::cout << utf16_to_utf8(u"Élémentaire, mon cher Watson!") << std::endl;
    return 0;
}
Hopefully, the problem will be fixed in future releases; otherwise, the #if condition will need refining.
UPDATE: nope, not fixed in VS 2017. Therefore, I've updated the preprocessor conditional to >= 1900 (initially it was == 1900).
Why is std::codecvt only used by file I/O streams?
The std::codecvt facet was originally intended to handle I/O conversions between the on-disk and in-memory character representations. Quoting §39.4.6 of Bjarne Stroustrup's The C++ Programming Language, fourth edition:
Sometimes, the representation of characters stored in a file differs from the desired representation of those same characters in main memory. ... the codecvt facet provides a mechanism for converting characters from one representation to another as they are read or written.
The intended purpose was thus to use std::codecvt only for adapting characters between file (disk) and memory, which partly answers your question: Why is std::codecvt only used by file I/O streams?
From the docs we see that:

All file I/O operations performed through std::basic_fstream<CharT> use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.
Which then answers why std::ofstream (which uses a file-based stream buffer) and std::cout (linked to the standard output FILE stream) invoke std::codecvt.
Now, to use the high-level std::ostream interface you need to provide an underlying streambuf. std::ofstream provides a filebuf and std::ostringstream provides a stringbuf (which is not linked to the use of std::codecvt). See this post on the streams, which also highlights the following:

...in the case of ofstream, there are also a few extra functions which forward to additional functions in the filebuf interface
But, to invoke the character conversion functionality of a std::codecvt when you have a std::ostringstream (which is a std::ostream with an underlying std::basic_streambuf), you can use, as indicated in your post, std::wbuffer_convert. Note that in your second update you have only used std::wstring_convert, not std::wbuffer_convert.
When using std::wbuffer_convert, you can wrap the original std::ostringstream with a std::ostream as follows:
// Create a std::ostringstream
auto osstream = std::ostringstream{};

// Create the wrapper for the ostringstream
std::wbuffer_convert<custom_facet, char> wrapper(osstream.rdbuf());

// Now create a std::ostream which uses the wrapper to send data to
// the original std::ostringstream
std::ostream normal_ostream(&wrapper);
normal_ostream << "test\n";

// Flush the stream to invoke the conversion
normal_ostream << std::flush;

// Check the invocation_counter
std::cout << "invocation_counter after wrapping std::ostringstream with "
             "std::wbuffer_convert = "
          << invocation_counter << "\n";
Together with the complete example here, the output would be:
invocation_counter start of test1 = 0
invocation_counter after std::ofstream = 1
> test printed to std::cout
invocation_counter after std::cout = 2
invocation_counter after std::ostringstream (should not have changed)= 2
ic after test1 = 2
invocation_counter after std::ostringstream with std::wstring_convert = 3
ic after test2 = 3
invocation_counter after wrapping std::ostringstream with std::wbuffer_convert = 4
ic after test3 = 4
Conclusion
std::codecvt was intended for converting between disk and memory representations. That is why the std::codecvt implementation is only called for streams using an underlying filebuf, such as std::ofstream and std::cout.

However, a stream using an underlying stringbuf can be wrapped via std::wbuffer_convert into a std::ostream instance, which will then invoke the underlying std::codecvt.
Where to put std::wstring_convert<std::codecvt_utf8<wchar_t>>?
I wouldn't store the std::wstring_convert in a global variable because that's not thread-safe and doesn't buy you much. There might be a performance hit from instantiating std::wstring_convert every time you need it, but that should not be your primary concern at the beginning (premature optimization).
So I'd just wrap that thing into functions:
std::wstring utf8_to_wstr( const std::string& utf8 ) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> wcu8;
    return wcu8.from_bytes( utf8 );
}

std::string wstr_to_utf8( const std::wstring& utf16 ) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> wcu8;
    return wcu8.to_bytes( utf16 );
}
You have to catch the std::range_error exception somewhere. It can be thrown by std::wstring_convert if the conversion fails for some reason (invalid code points, etc.).
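One place to put that handler is a small wrapper around the conversion; the function name and the fall-back-to-a-default behavior below are my own choices, just a sketch:

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: return a caller-supplied fallback instead of
// propagating std::range_error when the conversion fails.
std::string wstr_to_utf8_or(const std::wstring& utf16, const std::string& fallback)
{
    try {
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> wcu8;
        return wcu8.to_bytes(utf16);
    } catch (const std::range_error&) {
        // Thrown on invalid input, e.g. a lone surrogate code unit.
        return fallback;
    }
}
```

Whether to swallow the error like this or let it propagate depends on whether a corrupt string is recoverable in your application.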
If you hit performance bottlenecks around string conversions later, you can still instantiate std::wstring_convert directly at critical points in your code, e.g. outside of a long-running loop that converts many strings.
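A sketch of that hoisting (convert_all is a hypothetical name): the converter is constructed once and reused for every element instead of being re-created on each iteration:

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <vector>

// Convert a batch of wide strings to UTF-8 with a single converter
// instance hoisted out of the loop.
std::vector<std::string> convert_all(const std::vector<std::wstring>& in)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> wcu8;
    std::vector<std::string> out;
    out.reserve(in.size());
    for (const auto& w : in)
        out.push_back(wcu8.to_bytes(w));  // reuses the one converter
    return out;
}
```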