How to open an std::fstream (ofstream or ifstream) with a unicode filename?
The C++ standard library is not Unicode-aware. char and wchar_t are not required to be Unicode encodings.
On Windows, wchar_t is UTF-16, but there's no direct support for UTF-8 filenames in the standard library (the char datatype is not Unicode on Windows).
With MSVC (and thus the Microsoft STL), a constructor for filestreams is provided which takes a const wchar_t* filename, allowing you to create the stream as:
wchar_t const name[] = L"filename.txt";
std::fstream file(name);
However, this overload is not specified by the C++11 standard (it only guarantees the presence of the char-based version). It is also not present in alternative standard library implementations such as GCC's libstdc++ for MinGW(-w64), as of g++ 4.8.x.
Note that just as char on Windows is not UTF-8, on other OSes wchar_t may not be UTF-16. So overall, this isn't likely to be portable. Opening a stream given a wchar_t filename isn't defined by the standard, and specifying the filename in chars may be difficult because the encoding used by char varies between OSes.
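For readers on a newer toolchain: C++17 added stream constructors that take a std::filesystem::path, which postdates the C++11 situation described above. The following is only a sketch under that assumption; path_from_utf8, first_line and demo_utf8_filename are hypothetical names used for illustration, and std::filesystem::u8path is deprecated (though still available) from C++20 on.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: builds a native path from UTF-8 encoded bytes.
// Assumes a C++17 standard library.
inline std::filesystem::path path_from_utf8(const std::string& utf8_name) {
    return std::filesystem::u8path(utf8_name);
}

// Reads the first line of the file at p via the C++17 path overload of ifstream.
inline std::string first_line(const std::filesystem::path& p) {
    std::ifstream in(p);
    std::string line;
    std::getline(in, line);
    return line;
}

// Usage: create a file whose name contains a non-ASCII character, read it back.
inline void demo_utf8_filename() {
    // "Wüste_demo.txt" spelled as UTF-8 bytes, independent of the source encoding.
    const std::filesystem::path p = path_from_utf8("W\xC3\xBC" "ste_demo.txt");
    {
        std::ofstream out(p);
        out << "This is a test.\n";
    }
    assert(first_line(p) == "This is a test.");
    std::filesystem::remove(p);
}
```

The point of going through path is that the library, not the caller, converts the name to whatever the OS expects (wchar_t on Windows, char elsewhere).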
How to open unicode file with ifstream using mingw under Windows?
Most likely (it's unclear whether the presented code is the real code) the reason that you see garbage is that std::cout in Windows defaults to presenting its result in a non-UTF-8 console window.
To properly check whether you're reading the UTF-8 file correctly, simply collect all the input in a string, convert it from UTF-8 to a UTF-16 wstring, and display that using MessageBoxW (or wide direct console output).
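A console-independent way to do a first sanity check is to look at the raw bytes that were read, since a hex dump is plain ASCII and displays the same in any code page. This is a minimal sketch; hex_dump and demo_hex_dump are hypothetical names, not part of the code discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: renders each byte as two lowercase hex digits,
// separated by single spaces.
inline std::string hex_dump(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}

// Usage: 'a' is 0x61, and "ñ" encoded as UTF-8 is the two bytes C3 B1.
inline void demo_hex_dump() {
    assert(hex_dump("a\xC3\xB1") == "61 c3 b1");
}
```

If the dump of the file contents shows the expected UTF-8 byte sequences, the reading is fine and the garbage comes from the display side.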
The following UTF-8 → UTF-16 conversion function works nicely with Visual C++ 12.0:
#include <codecvt>      // std::codecvt_utf8_utf16
#include <locale>       // std::wstring_convert
#include <string>       // std::wstring
auto wstring_from_utf8( char const* const utf8_string )
    -> std::wstring
{
    std::wstring_convert< std::codecvt_utf8_utf16< wchar_t > > converter;
    return converter.from_bytes( utf8_string );
}
Unfortunately, even though it only uses standard C++11 functionality, it fails to compile with MinGW g++ 4.8.2, but hopefully you have Visual C++ (after all it's free).
As an alternative you can code up a conversion function using the Windows API function MultiByteToWideChar.
For example, the following code works nicely with g++ 4.8.2 with -D USE_WINAPI:
#undef UNICODE
#define UNICODE
#include <windows.h>
#include <shellapi.h> // ShellAbout
#ifndef USE_WINAPI
# include <codecvt> // std::codecvt_utf8_utf16
# include <locale> // std::wstring_convert
#endif
#include <fstream> // std::ifstream
#include <iostream> // std::cerr, std::endl
#include <stdexcept> // std::runtime_error, std::exception
#include <stdlib.h> // EXIT_FAILURE, EXIT_SUCCESS
#include <string.h> // strlen
#include <string> // std::string, std::wstring
namespace my {
    using std::ifstream;
    using std::ios;
    using std::runtime_error;
    using std::string;
    using std::wstring;
#ifndef USE_WINAPI
    using std::codecvt_utf8_utf16;
    using std::wstring_convert;
#endif
    auto hopefully( bool const c ) -> bool { return c; }
    auto fail( string const& s ) -> bool { throw runtime_error( s ); }
#ifdef USE_WINAPI
    auto wstring_from_utf8( char const* const utf8_string )
        -> wstring
    {
        if( *utf8_string == '\0' )
        {
            return L"";
        }
        wstring result( strlen( utf8_string ), L'#' );  // One wchar_t per byte is more than enough.
        int const n_chars = MultiByteToWideChar(
            CP_UTF8,
            0,              // Flags, only alternative is MB_ERR_INVALID_CHARS
            utf8_string,
            static_cast<int>( result.size() ),  // Byte count without the terminating null, so no null is written.
            &result[0],
            static_cast<int>( result.size() )   // Passing -1 above would overflow this buffer for pure ASCII input.
            );
        hopefully( n_chars > 0 )
            || fail( "MultiByteToWideChar" );
        result.resize( n_chars );
        return result;
    }
#else
    auto wstring_from_utf8( char const* const utf8_string )
        -> wstring
    {
        wstring_convert< codecvt_utf8_utf16< wchar_t > > converter;
        return converter.from_bytes( utf8_string );
    }
#endif
    auto text_of_file( string const& filename )
        -> string
    {
        ifstream f( filename, ios::in | ios::binary );
        hopefully( !f.fail() )
            || fail( "file open" );
        string result;
        string s;
        while( getline( f, s ) )
        {
            result += s + '\n';
        }
        return result;
    }
    void cpp_main()
    {
        string const utf8_text = text_of_file( "spanish.txt" );
        wstring const wide_text = wstring_from_utf8( utf8_text.c_str() );
        //ShellAbout( 0, L"Spanish text", wide_text.c_str(), LoadIcon( 0, IDI_INFORMATION ) );
        MessageBox(
            0,
            wide_text.c_str(),
            L"Spanish text",
            MB_ICONINFORMATION | MB_SETFOREGROUND
            );
    }
}  // namespace my
auto main()
    -> int
{
    using namespace std;
    try
    {
        my::cpp_main();
        return EXIT_SUCCESS;
    }
    catch( exception const& x )
    {
        cerr << "!" << x.what() << endl;
    }
    return EXIT_FAILURE;
}
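If neither <codecvt> nor the Windows API is an option, the UTF-8 → UTF-16 step can also be written by hand. The following is a sketch only, not the code from the answer above: utf16_from_utf8 and demo_utf16_from_utf8 are hypothetical names, the result is a std::u16string rather than a wstring so that it means the same on every platform, and the decoder rejects malformed lead and continuation bytes but, as a simplification, not overlong forms or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical portable decoder: UTF-8 bytes in, UTF-16 code units out.
inline std::u16string utf16_from_utf8(const std::string& s) {
    std::u16string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (s.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");
        if (cp >= 0x10000) {
            // Outside the BMP: split into a high and a low surrogate.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}

// Usage: "ñ" (C3 B1) is one code unit, U+1F600 (F0 9F 98 80) is a surrogate pair.
inline void demo_utf16_from_utf8() {
    assert(utf16_from_utf8("\xC3\xB1") == u"\u00F1");
    const std::u16string pair = utf16_from_utf8("\xF0\x9F\x98\x80");
    assert(pair.size() == 2 && pair[0] == 0xD83D && pair[1] == 0xDE00);
}
```

On Windows, where wchar_t is a 16-bit UTF-16 code unit, the result can be copied element by element into a wstring before handing it to MessageBoxW.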
Opening a text file with fstream but filename characters are not in ASCII
As @jalf notes in the question "How to open an std::fstream with a unicode filename", the C++ standard library is not Unicode-aware, but there is a Windows extension that accepts wchar_t arrays.
You can open such a file on Windows by constructing an fstream object, or calling open on one, with a wchar_t array as the argument.
fstream fileHandle(L"δ»Wüste.txt");
fileHandle.open(L"δ»Wüste.txt");
Both of the above will call the wchar_t* version of the appropriate function, because the L prefix makes the literal a wide-character (wchar_t) string, which on Windows is UTF-16.
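To make the difference between the two literal kinds concrete, here is a small sketch (demo_literal_widths is a hypothetical name). It spells the non-ASCII character with escapes so that the result does not depend on the source file encoding, and it assumes the common implementations where wchar_t is either 2 or 4 bytes wide.

```cpp
#include <cassert>
#include <cstring>
#include <string>

inline void demo_literal_widths() {
    // "Wüste" as a narrow literal holding UTF-8 bytes: ü takes two chars.
    const char narrow[] = "W\xC3\xBC" "ste";
    // The same text as a wide literal: ü (U+00FC) is a single wchar_t.
    const wchar_t wide[] = L"W\u00FCste";

    assert(std::strlen(narrow) == 6);
    assert(std::wstring(wide).size() == 5);

    // Assumption: 2 bytes with MSVC and MinGW, typically 4 bytes on Linux and macOS.
    assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4);
}
```

Only the wide form selects the wchar_t* overload; a narrow string passed to fstream on Windows is interpreted in the current ANSI code page, not as UTF-8.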
Edit: Here is a complete example that should compile and run. I created a file on my computer called δ»Wüste.txt with the contents This is a test. I then compiled and ran the following code in the same directory.
#include <cstdlib>   // std::system
#include <fstream>
#include <iostream>
#include <string>
int main(int, char**)
{
    // The wchar_t* constructor is a Microsoft extension, so this needs MSVC.
    std::fstream fileHandle(L"δ»Wüste.txt", std::ios::in|std::ios::out);
    std::string text;
    std::getline(fileHandle, text);
    std::cout << text << std::endl;
    std::system("pause");
    return 0;
}
The output is:
This is a test.
Press any key to continue...
Opening fstream with file with Unicode file name under Windows using non-MSVC compiler
Currently there is no easy solution.
You need to create your own stream buffer that uses _wfopen under the hood. For example, you can use Boost.Iostreams for this.
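Without Boost, the idea can be sketched as a small read-only stream buffer over a C FILE*. This is an illustration under stated assumptions, not a complete implementation: FileBuf, open_wide_for_read and demo_file_buf are hypothetical names, there is no seeking or writing, and the _wfopen call is only compiled on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical read-only stream buffer that pulls its data from a FILE*.
// It takes ownership of the FILE* and closes it on destruction.
class FileBuf : public std::streambuf {
public:
    explicit FileBuf(std::FILE* f) : file_(f) {}
    ~FileBuf() override { if (file_) std::fclose(file_); }
    FileBuf(const FileBuf&) = delete;
    FileBuf& operator=(const FileBuf&) = delete;

protected:
    // Called by the stream when the get area is exhausted: refill it.
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        if (!file_) return traits_type::eof();
        const std::size_t n = std::fread(buf_, 1, sizeof buf_, file_);
        if (n == 0) return traits_type::eof();
        setg(buf_, buf_, buf_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::FILE* file_;
    char buf_[4096];
};

#ifdef _WIN32
// On Windows the FILE* would come from _wfopen, which takes a UTF-16 name.
inline std::FILE* open_wide_for_read(const wchar_t* name) {
    return _wfopen(name, L"rb");
}
#endif

// Usage: the buffer works with any FILE*, here one from plain fopen.
inline void demo_file_buf() {
    const char* name = "filebuf_demo.txt";
    std::FILE* w = std::fopen(name, "wb");
    assert(w != nullptr);
    std::fputs("This is a test.\n", w);
    std::fclose(w);
    {
        FileBuf buf(std::fopen(name, "rb"));
        std::istream in(&buf);
        std::string line;
        std::getline(in, line);
        assert(line == "This is a test.");
    }
    std::remove(name);
}
```

With MinGW one would construct the buffer as FileBuf buf(open_wide_for_read(L"...")) and then read through a std::istream exactly as in the demo.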
c++ UTF-16 ofstream file creation Windows
Thank you all, but it seems that C++ streams are helpless in this case (at least that is my impression).
So I used the Windows API:
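#ifndef _WIN32 // for Linux (_WIN32 is predefined by Windows compilers)
ofstream out(output);
out.close();
#else // for Windows
LPWSTR lp=(LPWSTR )output; // output must point to a null-terminated UTF-16 name
HANDLE h = CreateFileW(lp,GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ |
    FILE_SHARE_WRITE, NULL,CREATE_ALWAYS,FILE_ATTRIBUTE_NORMAL,NULL );
if (h != INVALID_HANDLE_VALUE) CloseHandle(h); // release the handle, like out.close() above
#endif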
And I got an output file with the correct name.
Thanks again!