How to open unicode file with ifstream using mingw under Windows?
Please note this is not the same question as How to open an std::fstream (ofstream or ifstream) with a unicode filename?. That question was about a Unicode filename; this one is about Unicode file contents.
I need to open a UTF-8 unicode file (containing Spanish characters) with an ifstream. Under Linux this is no problem, but under Windows it is.
bool OpenSpanishFile(string filename)
{
ifstream spanishFile;
#ifdef WINDOWS
spanishFile.open(filename.c_str(),ios::binary);
#endif
if (!spanishFile.is_open()) return false;
spanishFile.clear();
spanishFile.seekg(ios::beg);
while (spanishFile.tellg()!=-1)
{
string line="";
getline(spanishFile,line);
//do stuff
cout << line << endl;
}
return true;
}
I compile it under Linux with:
i586-mingw32msvc-g++ -s -fno-rtti test.cpp -o test.exe
And then run it with wineconsole test.exe.
The output contains all kinds of weird characters, so it seems the Unicode file is being interpreted as some other encoding.
I have searched the internet a lot about how to open a unicode file this way, but I couldn't get it to work.
Does anyone know a solution that does work with mingw? Thank you so much in advance.
Most likely (it's unclear whether the presented code is the real code) the reason that you see garbage is that std::cout in Windows defaults to presenting its result in a non-UTF-8 console window.
To properly check whether you're reading the UTF-8 file correctly, simply collect all the input in a string, convert it from UTF-8 to a UTF-16 wstring, and display that using MessageBoxW (or wide direct console output).
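A quick way to check the read step without depending on any console encoding is to dump the raw bytes as hex. The following is a minimal sketch, not taken from the code in this thread: `read_all_bytes` and `hex_of` are hypothetical helper names, and it assumes the file is UTF-8. In UTF-8 the letter ñ is the byte pair C3 B1, so finding that pair in the dump suggests the file was read intact and the garbage comes from the display step.

```cpp
#include <cstddef>   // std::size_t
#include <fstream>   // std::ifstream
#include <iterator>  // std::istreambuf_iterator
#include <string>    // std::string

// Hypothetical helper: reads the whole file as raw bytes, with no translation.
// Returns an empty string if the file cannot be opened.
std::string read_all_bytes( std::string const& filename )
{
    std::ifstream f( filename.c_str(), std::ios::in | std::ios::binary );
    std::istreambuf_iterator<char> first( f );
    std::istreambuf_iterator<char> last;
    return std::string( first, last );
}

// Hypothetical helper: formats each byte as two uppercase hex digits,
// separated by single spaces.
std::string hex_of( std::string const& bytes )
{
    static char const digits[] = "0123456789ABCDEF";
    std::string result;
    for( std::size_t i = 0; i < bytes.size(); ++i )
    {
        unsigned char const b = static_cast<unsigned char>( bytes[i] );
        if( i > 0 ) { result += ' '; }
        result += digits[b >> 4];
        result += digits[b & 0x0F];
    }
    return result;
}
```

Printing `hex_of( read_all_bytes( "spanish.txt" ) )` with std::cout emits only ASCII characters, which any console displays correctly.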
The following UTF-8 → UTF-16 conversion function works nicely with Visual C++ 12.0:
#include <codecvt> // std::codecvt_utf8_utf16
#include <locale> // std::wstring_convert
#include <string> // std::wstring
auto wstring_from_utf8( char const* const utf8_string )
-> std::wstring
{
std::wstring_convert< std::codecvt_utf8_utf16< wchar_t > > converter;
return converter.from_bytes( utf8_string );
}
Unfortunately, even though it only uses standard C++11 functionality, it fails to compile with MinGW g++ 4.8.2, but hopefully you have Visual C++ (after all it's free).
As an alternative you can code up a conversion function using the Windows API MultiByteToWideChar.
For example, the following code works nicely with g++ 4.8.2 with -D USE_WINAPI:
#undef UNICODE
#define UNICODE
#include <windows.h>
#include <shellapi.h> // ShellAbout
#ifndef USE_WINAPI
# include <codecvt> // std::codecvt_utf8_utf16
# include <locale> // std::wstring_convert
#endif
#include <fstream> // std::ifstream
#include <iostream> // std::cerr, std::endl
#include <stdexcept> // std::runtime_error, std::exception
#include <stdlib.h> // EXIT_FAILURE, EXIT_SUCCESS
#include <string.h> // strlen
#include <string> // std::string, std::wstring
namespace my {
using std::ifstream;
using std::ios;
using std::runtime_error;
using std::string;
using std::wstring;
#ifndef USE_WINAPI
using std::codecvt_utf8_utf16;
using std::wstring_convert;
#endif
auto hopefully( bool const c ) -> bool { return c; }
auto fail( string const& s ) -> bool { throw runtime_error( s ); }
#ifdef USE_WINAPI
auto wstring_from_utf8( char const* const utf8_string )
-> wstring
{
if( *utf8_string == '\0' )
{
return L"";
}
size_t const n_bytes = strlen( utf8_string );
wstring result( n_bytes, L'#' ); // More than enough.
int const n_chars = MultiByteToWideChar(
CP_UTF8,
0, // Flags, only alternative is MB_ERR_INVALID_CHARS
utf8_string,
static_cast<int>( n_bytes ), // Explicit length ==> no terminator is written.
&result[0],
static_cast<int>( result.size() )
);
hopefully( n_chars > 0 )
|| fail( "MultiByteToWideChar" );
result.resize( n_chars );
return result;
}
#else
auto wstring_from_utf8( char const* const utf8_string )
-> wstring
{
wstring_convert< codecvt_utf8_utf16< wchar_t > > converter;
return converter.from_bytes( utf8_string );
}
#endif
auto text_of_file( string const& filename )
-> string
{
ifstream f( filename, ios::in | ios::binary );
hopefully( !f.fail() )
|| fail( "file open" );
string result;
string s;
while( getline( f, s ) )
{
result += s + '\n';
}
return result;
}
void cpp_main()
{
string const utf8_text = text_of_file( "spanish.txt" );
wstring const wide_text = wstring_from_utf8( utf8_text.c_str() );
//ShellAbout( 0, L"Spanish text", wide_text.c_str(), LoadIcon( 0, IDI_INFORMATION ) );
MessageBox(
0,
wide_text.c_str(),
L"Spanish text",
MB_ICONINFORMATION | MB_SETFOREGROUND
);
}
} // namespace my
auto main()
-> int
{
using namespace std;
try
{
my::cpp_main();
return EXIT_SUCCESS;
}
catch( exception const& x )
{
cerr << "!" << x.what() << endl;
}
return EXIT_FAILURE;
}
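Besides the two conversion routes shown above (std::wstring_convert and MultiByteToWideChar), the conversion can also be written by hand when neither is available. The following is only a sketch and not part of the program above: `utf16_from_utf8` is a hypothetical name, it throws on malformed input, and it does not reject overlong encodings. It stores UTF-16 code units in a std::wstring, which is what the wide Windows functions expect on a platform where wchar_t is 16 bits.

```cpp
#include <cstddef>    // std::size_t
#include <stdexcept>  // std::runtime_error
#include <string>     // std::string, std::wstring

// Hypothetical portable converter: decodes UTF-8 and emits UTF-16 code units.
// Assumption: overlong encodings are not detected; this is not a full validator.
std::wstring utf16_from_utf8( std::string const& utf8 )
{
    std::wstring result;
    std::size_t i = 0;
    while( i < utf8.size() )
    {
        unsigned char const lead = static_cast<unsigned char>( utf8[i] );
        unsigned long code_point = 0;
        std::size_t n_trailing = 0;

        // The lead byte determines how many continuation bytes follow.
        if(      lead < 0x80 )           { code_point = lead;        n_trailing = 0; }
        else if( (lead & 0xE0) == 0xC0 ) { code_point = lead & 0x1F; n_trailing = 1; }
        else if( (lead & 0xF0) == 0xE0 ) { code_point = lead & 0x0F; n_trailing = 2; }
        else if( (lead & 0xF8) == 0xF0 ) { code_point = lead & 0x07; n_trailing = 3; }
        else { throw std::runtime_error( "utf16_from_utf8: invalid lead byte" ); }

        if( i + n_trailing >= utf8.size() )
        {
            throw std::runtime_error( "utf16_from_utf8: truncated sequence" );
        }
        for( std::size_t k = 1; k <= n_trailing; ++k )
        {
            unsigned char const trail = static_cast<unsigned char>( utf8[i + k] );
            if( (trail & 0xC0) != 0x80 )
            {
                throw std::runtime_error( "utf16_from_utf8: invalid continuation byte" );
            }
            code_point = (code_point << 6) | (trail & 0x3F);
        }
        i += n_trailing + 1;

        if( code_point > 0x10FFFF )
        {
            throw std::runtime_error( "utf16_from_utf8: code point out of range" );
        }
        if( code_point < 0x10000 )
        {
            result += static_cast<wchar_t>( code_point );
        }
        else
        {
            // Code points beyond the BMP become a surrogate pair.
            code_point -= 0x10000;
            result += static_cast<wchar_t>( 0xD800 + (code_point >> 10) );
            result += static_cast<wchar_t>( 0xDC00 + (code_point & 0x3FF) );
        }
    }
    return result;
}
```

On Windows the result could be passed to MessageBoxW in the same way as wide_text in cpp_main.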