Python subprocess echo a unicode literal

I'm aware that questions like this have been asked before. But I'm not finding a solution.

I want to use a unicode literal, defined in my Python file, with the subprocess module, but I'm not getting the results that I need. For example, the following code

# -*- coding: utf-8 -*-
import sys
import codecs
import subprocess
cmd = ['echo', u'你好']
new_cmd = []
for c in cmd:
    if isinstance(c,unicode):
        c = c.encode('utf-8')
    new_cmd.append(c)
subprocess.call(new_cmd)

prints out

ä½ å¥½

If I change the code to

# -*- coding: utf-8 -*-
import sys
import codecs
import subprocess
cmd = ['echo', u'你好']
new_cmd = []
for c in cmd:
    if isinstance(c,unicode):
        c = c.encode(sys.getfilesystemencoding())
    new_cmd.append(c)
subprocess.call(new_cmd)

I get the following

??

At this stage I can only assume that I keep making the same simple mistake, but I'm having a hard time figuring out what it is. How can I get echo to print out the following when invoked via Python's subprocess module?

你好

Edit:

The version of Python is 2.7. I'm running on Windows 8 but I'd like the solution to be platform independent.


Conclusion: Pay attention to character encodings (there are three different character encodings here). Use Python 3 if you want portable Unicode support (pass arguments as Unicode, don't encode them) or make sure that the data can be represented using current character encodings from the environment (encode using sys.getfilesystemencoding() on Python 2 as you do in the 2nd code example).
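To see which encodings are in play on a given machine, a small diagnostic along these lines may help. It is only a sketch: `report_encodings` is a hypothetical helper name, and what each value corresponds to (ANSI code page, OEM code page, UTF-8) varies with the platform and the Python version, so treat the output as a hint rather than a definition.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python reports for this process."""
    return {
        # Encoding Python uses for file names.
        'filesystem': sys.getfilesystemencoding(),
        # Encoding the locale prefers for text data.
        'preferred': locale.getpreferredencoding(False),
        # Encoding of standard output; may be None if stdout was replaced.
        'stdout': getattr(sys.stdout, 'encoding', None),
    }


if __name__ == '__main__':
    for name, value in sorted(report_encodings().items()):
        print('%s: %s' % (name, value))
```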


The first code example is incorrect. The effect is the same as (run it in IDLE -- py -3 -midlelib ):

>>> print(u'你好'.encode('utf-8').decode('mbcs')) #XXX DON'T DO IT!
ä½ å¥½

Here the mbcs codec uses your Windows ANSI code page (typically cp1252; it may be different, e.g. cp1251 on Russian Windows).

Python 2 starts a subprocess through the CreateProcess macro, which in Python 2 is equivalent to the CreateProcessA function. CreateProcessA interprets the input bytes as being encoded using your Windows ANSI encoding. That encoding is unrelated to the Python source code encoding (utf-8 in your case).

It is expected that you get mojibake if you use a wrong encoding.
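The mismatch can be reproduced on any platform by naming the code page explicitly. This is a Python 3 sketch that assumes cp1252 as the ANSI code page (the mbcs codec exists only on Windows); `misread_as_ansi` is a hypothetical helper name.

```python
def misread_as_ansi(text, ansi_codec='cp1252'):
    """Encode text as UTF-8, then decode the bytes with another codec."""
    utf8_bytes = text.encode('utf-8')     # what the first code example passes along
    return utf8_bytes.decode(ansi_codec)  # how an ANSI API would read those bytes


garbled = misread_as_ansi(u'\u4f60\u597d')
print(ascii(garbled))

# As long as every byte maps to a character in the code page,
# the damage can be undone by reversing the two steps.
restored = garbled.encode('cp1252').decode('utf-8')
print(restored == u'\u4f60\u597d')
```

The reversal works here only because all six UTF-8 bytes happen to be defined in cp1252; with another code page or other characters it may fail.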


Your second code example should work if two conditions hold. First, the input characters must be representable in your Windows ANSI code page, such as cp1252, so that the Unicode string can be encoded to bytes. Second, either echo prints to the Windows console through a Unicode API such as WriteConsoleW() (see the Python 3 package win-unicode-console: it enables print(u'你好') whatever your chcp ("OEM") code page is, as long as the console font supports the characters), or the characters must be representable in the OEM code page used by cmd.exe, such as cp437 (run chcp to find out yours). The ?? question marks indicate that 你好 can't be represented using your console encoding.
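Whether a string survives a given code page can be checked without spawning anything. The following is a minimal sketch; `can_encode` is a hypothetical helper, and cp1252 and cp437 are only the example code pages named above, not necessarily the ones on your machine.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


text = u'\u4f60\u597d'
for codec in ('cp1252', 'cp437', 'utf-8'):
    print(codec, can_encode(text, codec))

# With errors='replace', each unencodable character becomes a question mark,
# which is one way output such as ?? can arise.
print(text.encode('cp1252', errors='replace'))
```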

To support arbitrary Unicode arguments (including characters that can't be represented using either Windows ("ANSI") or MS-DOS (OEM) code pages), you need CreateProcessW function (that is used by Python 3). See Unicode filenames on Windows with Python & subprocess.Popen().
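On Python 3 the argument list can be passed as text, with no manual encoding step. The sketch below assumes Python 3 and, on POSIX, a locale that can encode the characters (for example UTF-8). `echo_via_child` is a hypothetical helper that uses a child Python process instead of echo so that it behaves the same on every platform.

```python
import subprocess
import sys


def echo_via_child(text):
    """Pass text as an argument to a child Python process and return what it received."""
    # The child prints an ASCII-only repr so the result does not depend on
    # the console or pipe encoding.
    child_code = 'import sys; print(ascii(sys.argv[1]))'
    output = subprocess.check_output([sys.executable, '-c', child_code, text])
    return output.decode('ascii').strip()


print(echo_via_child(u'\u4f60\u597d'))
```

If the child reports the same two code points that were sent, the argument arrived intact; whether a console can then display them is a separate question.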


Your first try was the best.

You actually converted the 2 unicode characters u'你好' (or u'\u4f60\u597d') to UTF-8, which gives b'\xe4\xbd\xa0\xe5\xa5\xbd'.

You can check this in IDLE, which fully supports Unicode: there, b'\xe4\xbd\xa0\xe5\xa5\xbd'.decode('utf-8') gives back 你好. Another way to check it is to redirect the script output to a file and open it with a UTF-8 compatible editor: there again you will see what you want.
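The file-based check can also be done from Python itself. This is a sketch under the assumption that a temporary file is an acceptable stand-in for the redirected output; `roundtrip_through_file` is a hypothetical helper name.

```python
import io
import os
import tempfile


def roundtrip_through_file(text):
    """Write text to a UTF-8 file, then return the raw bytes and the decoded text."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with io.open(path, 'w', encoding='utf-8') as handle:
            handle.write(text)
        with io.open(path, 'rb') as handle:
            raw = handle.read()
        return raw, raw.decode('utf-8')
    finally:
        os.remove(path)


raw, decoded = roundtrip_through_file(u'\u4f60\u597d')
print(raw)
print(decoded == u'\u4f60\u597d')
```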

But the problem is that the Windows console does not fully support Unicode. It depends on:

  • the code page installed - I do not know about Windows 8, but previous versions had poor Unicode support and could display only 256 characters
  • the font used in the console - not all fonts have glyphs for all characters.
  • If you know a code page that contains your characters (I don't), you can try to select it in the console with chcp and explicitly encode your unicode string to that code page. But on my French machine, I do not know how to do that ... except by going through a text file!

    As you mentioned ConEmu, I gave it a try ... and it works fine with it, with Python 3.4!

    chcp 65001
    py -3
    import subprocess
    cmd = ['cmd', '/c', 'echo', u'\u4f60\u597d']
    subprocess.call(cmd)
    

    gives:

    你好  
    0
    

    The problem is only in the cmd.exe window!
