Why Does 'std::basic_ifstream<char16_t>' Not Work in C++11

Why does `std::basic_ifstream<char16_t>` not work in C++11?

The various stream classes need a set of definitions to be operational. The standard library requires the relevant definitions and objects only for char and wchar_t but not for char16_t or char32_t. Off the top of my head the following is needed to use std::basic_ifstream<cT> or std::basic_ofstream<cT>:

  1. std::char_traits<cT> to specify how the character type behaves. C++11 does provide specializations of this template for char16_t and char32_t.
  2. The used std::locale needs to contain an instance of the std::num_put<cT> facet to format numeric types. This facet can just be instantiated and a new std::locale containing it can be created but the standard doesn't mandate that it is present in a std::locale object.
  3. The used std::locale needs to contain an instance of the facet std::num_get<cT> to read numeric types. Again, this facet can be instantiated but isn't required to be present by default.
  4. The facet std::numpunct<cT> needs to be specialized and put into the used std::locale to deal with decimal points, thousand separators, and textual boolean values. Even if it isn't really used it will be referenced from the numeric formatting and parsing functions. There is no ready specialization for char16_t or char32_t.
  5. The facet std::ctype<cT> needs to be specialized and put into the used std::locale to support widening, narrowing, and classification of the character type. There is no ready specialization for char16_t or char32_t.
  6. The facet std::codecvt<cT, char, std::mbstate_t> needs to be present in the used std::locale to convert between external byte sequences and internal "character" sequences. C++11 does specify specializations for char16_t and char32_t, but they only convert to and from UTF-8, so any other external encoding needs a facet of its own.

Most of the facets are reasonably easy to do: they just need to forward a simple conversion or do table look-ups. However, the std::codecvt facet tends to be rather tricky, especially because std::mbstate_t is an opaque type from the point of view of the standard C++ library.

All of that can be done. It has been a while since I last did a proof of concept implementation for a character type. It took me about a day's worth of work. Of course, I knew what I needed to do when I embarked on the work, having implemented the locales and IOStreams library before. To add a reasonable amount of tests rather than merely having a simple demo would probably take me a week or so (assuming I can actually concentrate on this work).

char16_t printing

Give this a try:

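#include <codecvt>   // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <iostream>
#include <locale>    // std::wstring_convert (deprecated since C++17)
#include <string>

int main()
{
    // Converts between UTF-16 stored in wchar_t and UTF-8 bytes.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t> > myconv;
    std::wstring ws(L"Your UTF-16 text");
    std::string bs = myconv.to_bytes(ws);  // UTF-16 -> UTF-8
    std::cout << bs << '\n';
}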

error: no matching function for call to ‘std::__cxx11::basic_string<char>::basic_string(int&)’

This statement

a = new T(MAX);

tries to create an object of the type std::string from the integer value MAX. However, the class std::string has no such constructor.

It seems you mean

a = new T[MAX];

that is you want to create an array of objects of the type std::string.

This function

T Stack<T>::peek() {
    if (top < 0) {
        cout << "Stack is Empty" << endl;
        return NULL;
    } else {
        return a[top];
    }
}

is also wrong because creating an object of the type std::string from a null pointer results in undefined behavior. Throw an exception instead, for example std::out_of_range.

Note also that the class has no destructor, so the dynamically allocated array is never freed.

Instead of the dynamically allocated array you could use the class std::vector<std::string>.

Cannot read char8_t from basic_stringstream<char8_t>

This is actually an old issue not specific to support for char8_t. The same issue occurs with char16_t or char32_t in C++11 and newer. The following gcc bug report has a similar test case.

  • https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88508

The issue is also discussed at the following:

  • GCC 4.8 and char16_t streams - bug?
  • Why does `std::basic_ifstream<char16_t>` not work in c++11?
  • http://gcc.1065356.n8.nabble.com/UTF-16-streams-td1117792.html

The issue is that gcc does not implicitly imbue the global locale with facets for ctype<char8_t>, ctype<char16_t>, or ctype<char32_t>. When an operation requires one of these facets, std::__check_facet throws a std::bad_cast exception. The IOS sentry object created for the character extraction operator silently swallows that exception and sets badbit and failbit.

The C++ standard only requires that ctype<char> and ctype<wchar_t> be provided. See [locale.category]p2.

Using char16_t and char32_t in I/O

In the proposal Minimal Unicode support for the standard library (revision 2), it is indicated that the Library Working Group only supported adding the new character types to strings and codecvt facets. Apparently the majority was opposed to supporting iostream, fstream, facets other than codecvt, and regex.

According to minutes from the Portland meeting in 2006, "the LWG is committed to full support of Unicode, but does not intend to duplicate the library with Unicode character variants of existing library facilities." I haven't found any details; however, I would guess that the committee feels that the current library interface is inappropriate for Unicode. One possible complaint could be that it was designed with fixed-size characters in mind. Unicode obsoletes that assumption: while Unicode data can use fixed-size code points, it does not limit characters to single code points.

Personally I think there's no reason not to standardize the minimal support that's already provided on various platforms (Windows uses UTF-16 for wchar_t, most Unix platforms use UTF-32). More advanced Unicode support will require new library facilities, but supporting char16_t and char32_t in iostreams and facets wouldn't get in the way and would enable basic Unicode I/O.

Compile error for (char based) STL (stream) containers in Visual Studio

Start notes:

  • I am using VStudio Community 2015 (v14.0.25431.01 Update 3). Version is important here, since standard header files might change across versions (and line numbers might differ)
  • Created [MSDN]: Compile error for STL (stream) containers in Visual Studio

Approaches:

  1. Quick (shallow) investigation

    In the VStudio IDE, double-click the 2nd note in the Output window (after attempting to compile the file). From there, repeatedly right-click the relevant macros and choose Go To Definition (F12) from the context menu:

    • xlocnum (#120): (comment is part of the original file/line)

      __PURE_APPDOMAIN_GLOBAL _CRTIMP2_PURE static locale::id id; // unique facet id
    • yvals.h: (#494):

      #define _CRTIMP2_PURE _CRTIMP2
    • crtdefs.h (#29+):

      #ifndef _CRTIMP2
      #if defined CRTDLL2 && defined _CRTBLD
      #define _CRTIMP2 __declspec(dllexport)
      #else
      #if defined _DLL && !defined _STATIC_CPPLIB
      #define _CRTIMP2 __declspec(dllimport) // @TODO - cfati: line #34: Here is the definition
      #else
      #define _CRTIMP2
      #endif
      #endif
      #endif

    As seen, __declspec(dllimport) is defined on line #34. Repeating the process on the _DLL macro, yielded no result. Found on [MSDN]: Predefined Macros:

    _DLL Defined as 1 when the /MD or /MDd (Multithreaded DLL) compiler option is set. Otherwise, undefined.

    I thought of 2 possible ways to go on (both resulting in a successful build):

    • Use static version of CRT Runtime ([MSDN]: /MD, /MT, /LD (Use Run-Time Library)). I don't consider it a viable option, especially when the project consists of .dlls (and it does): bad things can happen (e.g. [SO]: Errors when linking to protobuf 3 on MSVC 2013, or even nastier ones can occur at runtime)
    • Manually #undef _DLL (in main.cpp, before any #include). This is a lame workaround (a cheap hack). It builds fine, but tampering with these things could (and most likely will) trigger Undefined Behavior at runtime

    Neither of these 2 options was fully satisfactory, so:

  2. Going a (little) bit deeper

    Tried to simplify things even more (main.cpp):

    #include <sstream>

    //typedef unsigned short CharType; // wchar_t unsigned short
    #define CharType unsigned short

    int main() {
        std::basic_stringstream<CharType> stream;
        CharType c = 0x41;
        stream << c;
        return 0;
    }

    Notes:

    • Replaced typedef by #define (to strip out new type definition complexity)
    • Switched to unsigned short which is wchar_t's definition (/Zc:wchar_t-) to avoid any possible type size / alignment differences


    "Compiled" the above code with [MSDN]: /E (Preprocess to stdout) and [MSDN]: /EP (Preprocess to stdout Without #line Directives) (so that the warnings/errors only reference line numbers from current file):

    • Generated preprocessed files (using each flag from above): ~1MB+ (~56.5k lines)
    • The only difference in the files was the #define (wchar_t vs. unsigned short) somewhere at the very end
    • Compiling the files (shockingly :)) yielded the same result: the wchar_t one compiled while the unsigned short one failed with the same error
    • Added some #pragma message statements (yes, they are handled by the preprocessor, but still) in the file that fails (before each warning/note), noticed some difference between the 2 #defines, but so far unable to figure out why [1]
    • While browsing the generated file(s), noticed a template<> struct char_traits<char32_t> definition, so I gave it a try, and it worked (at least the current program compiled) [1] (and, as expected, sizeof(char32_t) is 4). Then, found [MSDN]: char, wchar_t, char16_t, char32_t


    Notes:

    • Although this fixed my current problem (still don't know why), will have to give it a shot on the end goal
    • [1] Although I looked over the file, I didn't see any template definitions targeting only the "privileged" types (e.g. I didn't see anything that would differentiate wchar_t, signed char or char32_t from unsigned short for example), so I don't know (yet) why it works for some types but not for others. This is an open topic; whenever I get new updates, I will share them

Bottom line:

As empirically discovered, the following types are allowed, when working with char based STL containers:

  • char
  • unsigned char
  • signed char
  • wchar_t
  • char16_t
  • char32_t
  • unsigned short (/Zc:wchar_t- only)

Final note(s):

  • I will incorporate anything useful (e.g. comments) in the answer

@EDIT0:

  • Based on @IgorTandetnik's answer on [MSDN]: Compile error for STL (stream) containers in Visual Studio, although there is still a little bit of fog left on:

    • unsigned char and signed char
    • Difference between static and dynamic C++ RTLib


    I'm going to accept this as an answer.

Implicit instantiation of undefined template 'std::basic_string<char, std::char_traits<char>, std::allocator<char> >'

You need to include this header:

#include <string>

toupper with a char16_t array

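The most direct way to write it is shown below. Note that it relies on a std::ctype<char16_t> facet being present in the locale. The standard only requires ctype<char> and ctype<wchar_t>, so depending on the implementation this std::use_facet call may throw std::bad_cast or fail to build: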

char16_t upper = std::use_facet<std::ctype<char16_t>>(std::locale()).toupper(ch);

