Read/Write file with unicode file name with plain C++/Boost
I want to read and write a file with a Unicode file name using Boost.Filesystem and Boost.Locale on Windows (MinGW). The solution should be platform-independent in the end.
This is my code:
#include <boost/locale.hpp>
#define BOOST_NO_CXX11_SCOPED_ENUMS
#include <boost/filesystem.hpp>
#include <boost/filesystem/fstream.hpp>
namespace fs = boost::filesystem;
#include <string>
#include <iostream>
int main() {
    std::locale::global(boost::locale::generator().generate(""));
    fs::path::imbue(std::locale());
    fs::path file("äöü.txt");
    if (!fs::exists(file)) {
        std::cout << "File does not exist" << std::endl;
    }
    fs::ofstream(file, std::ios_base::app) << "Test" << std::endl;
}
The fs::exists call really checks for a file with the name äöü.txt, but the written file has the name Ã¤Ã¶Ã¼.txt.
Reading gives the same problem. Using fs::wofstream doesn't help either, since it only handles wide input.
How can I fix this using C++11 and boost?
Edit: Bug report posted: https://svn.boost.org/trac/boost/ticket/9968
To clarify for the bounty: It is quite simple with Qt, but I would like a cross platform solution using just C++11 and Boost, no Qt and no ICU.
This can be complicated, for two reasons:
There's a non-ASCII string in your C++ source file. How this literal gets converted to the binary representation of a const char * depends on compiler settings and/or OS code page settings.
Windows only handles Unicode filenames through the UTF-16 encoding, while Unix-like systems conventionally use UTF-8 for Unicode filenames.
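To illustrate how these two points can combine into a garbled name, here is a minimal sketch. It assumes the literal ended up as UTF-8 bytes and that something later read those bytes one at a time in a Latin-1-like code page (Windows-1252 agrees with Latin-1 for the bytes involved). The function name latin1_bytes_to_utf8 is made up for this example, and this is only one plausible mechanism, not a confirmed diagnosis of the code in the question.

```cpp
#include <string>

// Treat every byte of `bytes` as one Latin-1 character (code point == byte
// value) and re-encode the result as UTF-8. Feeding UTF-8 text through this
// reproduces the classic "double encoding" mojibake.
std::string latin1_bytes_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            // ASCII is identical in Latin-1 and UTF-8.
            out.push_back(static_cast<char>(b));
        } else {
            // Code points U+0080..U+00FF need two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}
```

Calling latin1_bytes_to_utf8("\xC3\xA4\xC3\xB6\xC3\xBC.txt"), where the argument is äöü.txt spelled out as UTF-8 bytes, yields the UTF-8 encoding of Ã¤Ã¶Ã¼.txt.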
Constructing the path object
To get this working on Windows, you can try to change your literal to wide characters (UTF-16):
const wchar_t *name = L"\u00E4\u00F6\u00FC.txt";
fs::path file(name);
To get a full cross-platform solution, you'll have to start with either a UTF-8 or a UTF-16 string, then make sure it gets properly converted to the path::string_type class.
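One way to do that conversion without extra libraries is sketched below. It assumes the name is held as a UTF-8 std::string and produces wchar_t units: UTF-16 where wchar_t is 16 bits (as on Windows) and UTF-32 elsewhere. utf8_to_wide is a hypothetical helper written for this example, not a Boost function, and it does only minimal validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decode UTF-8 into a wide string. Rejects bad lead bytes, bad continuation
// bytes and truncated input, but does not check for overlong encodings.
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Fold the continuation bytes (10xxxxxx) into the code point.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t: encode as a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
    }
    return out;
}
```

Writing the name with explicit byte escapes, e.g. utf8_to_wide("\xC3\xA4\xC3\xB6\xC3\xBC.txt"), keeps the input independent of the source file encoding. The result can be handed to the fs::path constructor; on Windows it should already match path::string_type, while on POSIX systems Boost converts it back to narrow characters using the imbued locale.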
Opening the file stream
Unfortunately, the standard C++ (and thus Boost) ofstream API does not allow specifying wchar_t strings as the filename. This is the case for both the constructor and the open method.
You could try to make sure that the path object does not get immediately converted to const char * (by using the C++11 string API), but this probably won't help:
std::ofstream(file.native()) << "Test" << std::endl;
For this to work on Windows, you might have to call the Unicode-aware Windows API function CreateFileW, convert the HANDLE to a FILE *, then use the FILE * for the ofstream constructor. This is all described in another Stack Overflow answer, but I'm not sure if that ofstream constructor will exist on MinGW.
Unfortunately basic_ofstream doesn't seem to allow subclassing for custom basic_filebuf types, so the FILE * conversion might be the only (completely non-portable) option.
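A lighter-weight variant of the FILE * route is sketched below. Note that it swaps CreateFileW for _wfopen from the Microsoft C runtime and stops at the FILE *, without wrapping it in a stream. It assumes the file name is held as UTF-8. fopen_utf8 is a hypothetical helper name, and whether _wfopen is declared can depend on the MinGW version and the -std flags in use, so treat the Windows branch as untested.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <wchar.h>
#include <windows.h>
#endif

// Open a file whose name is given as UTF-8. Returns nullptr on failure.
std::FILE* fopen_utf8(const std::string& name, const char* mode) {
#ifdef _WIN32
    // Windows: widen name and mode to UTF-16, then use the wide C runtime call.
    auto widen = [](const std::string& s) -> std::wstring {
        const int n = MultiByteToWideChar(CP_UTF8, 0, s.c_str(), -1, nullptr, 0);
        if (n <= 0) return std::wstring();
        std::wstring w(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, s.c_str(), -1, &w[0], n);
        w.resize(static_cast<std::size_t>(n) - 1);  // drop the terminating NUL
        return w;
    };
    const std::wstring wname = widen(name);
    if (wname.empty()) return nullptr;
    return _wfopen(wname.c_str(), widen(mode).c_str());
#else
    // Elsewhere the narrow API takes the bytes as they are.
    return std::fopen(name.c_str(), mode);
#endif
}
```

Appending to the file from the question would then look like fopen_utf8("\xC3\xA4\xC3\xB6\xC3\xBC.txt", "ab"), followed by std::fputs and std::fclose on the returned pointer.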
An alternative: Memory-mapped files
Instead of using file streams, you can also write to files using memory-mapped I/O. Depending on how Boost implements this (it's not part of the C++ standard library), this method could work with Windows Unicode file names.
Here's a Boost example (taken from another answer) that uses a path object to open the file:
#include <boost/filesystem.hpp>
#include <boost/iostreams/device/mapped_file.hpp>
#include <iostream>
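int main()
{
    boost::filesystem::path p(L"b.cpp");
    boost::iostreams::mapped_file file(p); // or mapped_file_source
    // The mapping is not NUL-terminated, so write exactly file.size() bytes.
    std::cout.write(file.data(), static_cast<std::streamsize>(file.size()));
    std::cout << std::endl;
}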
I don't know how the answer here got accepted, since OP calls fs::path::imbue(std::locale()); precisely so as not to have to give a damn about the OS's code page, std::wstring and whatnot. Otherwise yeah, he'd just use plain old iconv, WinAPI calls or the other things suggested in the accepted answer. But that is not the point of using boost::locale here.
The real reason this doesn't work, even though OP does imbue() the current locale as instructed in Boost's documentation (see "Default Encoding under Microsoft Windows"), is Boost (or MinGW) bugs that have gone unresolved for at least a couple of years as of March 2015.
Unfortunately, mingw users seem to be left out in the cold.
Now, what boost developers should do in order to cover for these bugs is a whole different matter. It might turn out they need to implement precisely what Dan has stated.
Have you considered using only ASCII characters in your source code and using the message formatting capabilities of the Boost.Locale library to look up the desired string by an ASCII key? http://www.boost.org/doc/libs/1_55_0/libs/locale/doc/html/messages_formatting.html
Alternatively you can use the Boost.Locale library to generate a UTF-8 locale and then imbue boost::filesystem::path with that locale using boost::filesystem::path::imbue(). http://boost.2283326.n4.nabble.com/boost-filesystem-path-as-utf-8-td4320098.html
This may also be of use to you.
Default Encoding under Microsoft Windows http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/default_encoding_under_windows.html