gettext character encoding
I have the following gettext .po file, which has been translated from a .pot file. I am working on a Linux system (openSUSE if it matters), running gettext 0.17.
#
# <translate@transme.de>, 2011
# transer <translate@transme.de>, 2011
msgid ""
msgstr ""
"Project-Id-Version: transtest\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2011-05-24 22:47+0100\n"
"PO-Revision-Date: 2011-05-30 23:03+0100\n"
"Last-Translator: \n"
"Language-Team: German (Germany)\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Language: de_DE\n"
"Plural-Forms: nplurals=2; plural=(n != 1)\n"
#: transtest.cpp:12
msgid "Min Size"
msgstr "Min Größe"
Now I create the .mo file with:
msgfmt -c transtest_de_DE.po -o transtest.mo
The "file" command reports the .po file as UTF-8:
file --mime transtest_de_DE.po
transtest_de_DE.po: text/x-po; charset=utf-8
When I install the .mo file to my locale folder and run the program after exporting LANG and LC_CTYPE, I end up with garbage where the two non-ASCII characters should be.
If I set my terminal encoding to ISO-8859-2, rather than UTF-8, then I see the two characters correctly.
Looking inside the generated .mo file with a text editor the file appears to be in UTF-8 as well (I can see the symbols if I set my editor encoding to UTF-8).
The program is very simple, and it looks like so:
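#include <clocale>
#include <iostream>
#include <libintl.h>
const char *PROGRAM_NAME = "transtest";
using namespace std;
int main()
{
    // Adopt the locale from the environment (LANG, LC_ALL, LC_CTYPE).
    setlocale(LC_ALL, "");
    // Look for the message catalog under /usr/share/locale.
    bindtextdomain(PROGRAM_NAME, "/usr/share/locale");
    textdomain(PROGRAM_NAME);
    cerr << gettext("Min Size") << endl;
}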
I am installing the .mo file to /usr/share/locale/de_DE/LC_MESSAGES/transtest.mo, and I have exported LC_CTYPE and LANG as "de_DE".
$ echo $LC_CTYPE; echo $LANG
de_DE
de_DE
Where am I going wrong? Why is gettext giving me the wrong encoding (ISO-8859-2) for my strings, rather than the requested (in the .po file) UTF-8?
Edit:
The solution was in the Stack Overflow question "Can't make (UTF-8) traditional Chinese character to work in PHP gettext extension (.po and .mo files created in poEdit)". It appears that I needed to explicitly call
bind_textdomain_codeset(PROGRAM_NAME, "utf-8");
The final program looks like so:
#include <clocale>
#include <iostream>
#include <libintl.h>
const char *PROGRAM_NAME = "transtest";
using namespace std;
int main()
{
    // Adopt the locale from the environment (LANG, LC_ALL, LC_CTYPE).
    setlocale(LC_ALL, "");
    bindtextdomain(PROGRAM_NAME, "/usr/share/locale");
    // Force gettext to return strings from this domain as UTF-8.
    bind_textdomain_codeset(PROGRAM_NAME, "utf-8");
    textdomain(PROGRAM_NAME);
    cerr << gettext("Min Size") << endl;
}
No changes to any of my gettext files were needed.
If you have LC_CTYPE=de_DE (or LANG), programs are supposed to output ISO-8859-1 (note: 1, not 2). If your terminal is set to UTF-8 while that locale is active, the configuration is simply wrong. The correct locale for UTF-8 is de_DE.utf-8.
Using bind_textdomain_codeset is wrong in your case. bind_textdomain_codeset is for programs that work in a fixed encoding internally, as GNOME does, for example. Output should always be in the encoding the locale specifies, which you obtain by calling nl_langinfo(CODESET); that is also what gettext uses by default.