Python, Unicode, and the Windows console

2018-06-30 13:11:59

When I try to print a Unicode string in a Windows console, I get a UnicodeEncodeError: 'charmap' codec can't encode character .... error. I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this? Is there any way I can make Python automatically print a ? instead of failing in this situation?

Edit: I'm using Python 2.5.

Note: @LasseV.Karlsen answer with the checkmark is sort of outdated (from 2008). Please use the solutions/answers/suggestions below with care!!

@JFSebastian answer is more relevant as of today (6 Jan 2016).

Note: This answer is sort of outdated (from 2008). Please use the solution below with care!!

Here is a page that details the problem and a solution (search the page for the text Wrapping sys.stdout into an instance):

PrintFails - Python Wiki

Here's a code excerpt from that page:

$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; 
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); 
    line = u"u0411n"; print type(line), len(line); 
    sys.stdout.write(line); print line'
  UTF-8
  <type 'unicode'> 2
  Б
  Б

  $ python -c 'import sys, codecs, locale; print sys.stdout.encoding; 
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); 
    line = u"u0411n"; print type(line), len(line); 
    sys.stdout.write(line); print line' | cat
  None
  <type 'unicode'> 2
  Б
  Б

There's some more information on that page, well worth a read.

Update: Python 3.6 implements PEP 528: Change Windows console encoding to UTF-8: the default console on Windows will now accept all Unicode characters. Internally, it uses the same Unicode API as the win-unicode-console package mentioned below. print(unicode_string) should just work now.

I get a UnicodeEncodeError: 'charmap' codec can't encode character... error.

The error means that Unicode characters that you are trying to print can't be represented using the current ( chcp ) console character encoding. The codepage is often 8-bit encoding such as cp437 that can represent only ~0x100 characters from ~1M Unicode characters:

>>> u"N{EURO SIGN}".encode('cp437')
Traceback (most recent call last):
...
UnicodeEncodeError: 'charmap' codec can't encode character 'u20ac' in position 0:
character maps to

I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?

Windows console does accept Unicode characters and it can even display them (BMP only) if the corresponding font is configured . WriteConsoleW() API should be used as suggested in @Daira Hopwood's answer. It can be called transparently ie, you don't need to and should not modify your scripts if you use win-unicode-console package:

T:> py -mpip install win-unicode-console
T:> py -mrun your_script.py

See What's the deal with Python 3.4, Unicode, different languages and Windows?

Is there any way I can make Python automatically print a ? instead of failing in this situation?

If it is enough to replace all unencodable characters with ? in your case then you could set PYTHONIOENCODING envvar:

T:> set PYTHONIOENCODING=:replace
T:> python3 -c "print(u'[N{EURO SIGN}]')"
[?]

In Python 3.6+, the encoding specified by PYTHONIOENCODING envvar is ignored for interactive console buffers unless PYTHONLEGACYWINDOWSIOENCODING envvar is set to a non-empty string.

Despite the other plausible-sounding answers that suggest changing the code page to 65001, that does not work. (Also, changing the default encoding using sys.setdefaultencoding is not a good idea.)

See this question for details and code that does work.

链接地址: http://www.djcxy.com/p/85234.html

上一篇: 从IBAction获取按钮的文本

下一篇: Python，Unicode和Windows控制台