Python subprocess losing 10% of a program's stdout

I have a program that needs to be called as a subprocess from Python. The program is written in Java. Yeah, I know...

Anyway, I need to capture all of the output from said program.

Unfortunately, when I call popen2.popen2 or subprocess.Popen with communicate()[0], I'm losing around 10% of the output data, both when I assign subprocess.PIPE to stdout and when I assign a file object (the return value of open) to stdout.

The documentation in subprocess is pretty explicit that using subprocess.PIPE is volatile if you're trying to capture all of the output from a child process.

I'm currently using pexpect to dump the output into a tmp file, but that's taking forever for obvious reasons.

I'd like to keep all the data in memory to avoid disk writes.

Any recommendations are welcome! Thanks!

import subprocess

cmd = 'java -Xmx2048m -cp "/home/usr/javalibs/class:/home/usr/javalibs/libs/dependency.jar" --data data --input input'

# doesn't get all the data
#
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
output = p.communicate()[0]

OR
# doesn't get all the data
#
fd = open("outputfile",'w')
p = subprocess.Popen(cmd, stdout=fd, shell=True)
p.communicate()
fd.close() # tried to use fd.flush() too.

# also tried
# p.wait() instead of p.communicate(), but wait doesn't really wait for the java program to finish running - it doesn't block

OR
# also fails to get all the data
#
import popen2
(rstdout, rstdin) = popen2.popen2(cmd)

Expected output is a series of ASCII lines (a couple thousand). Each line contains a number and an end-of-line character:

0\n
1\n
4\n
0\n
...

I have used subprocess with much larger output on stdout and haven't seen such a problem. It's hard to conclude what the root cause is from what you've shown. I would check the following:

Since p.wait() didn't work for you, it could be that your Java program is still busy printing the last 10% when you read the PIPE. Get p.wait() straight first:

  • Insert a large enough wait (say 30 seconds) before you read the PIPE. Does your 10% show up?
  • It's doubtful that p.wait() doesn't block on your Java program. Does your Java program spawn another program as a further subprocess?
  • Check the return value of p.wait(). Did your Java program terminate normally?

If the problem does not lie in your concurrency model, then check whether you are printing correctly in your Java program:

  • Which function do you use in your Java program to print to stdout? Is it prone to throwing or ignoring IOException?
  • Did you flush the stream correctly? The last 10% could still be sitting unflushed in your buffer when your Java program terminates.
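
A rough way to run the checks above from the Python side is to capture stderr alongside stdout and look at the exit status. This is only a sketch, written for Python 3: run_and_check is a made-up helper name, and the child command is a hypothetical stand-in for the Java program that writes to both streams and exits with a non-zero status.

```python
import subprocess
import sys


def run_and_check(cmd):
    """Run cmd and return (returncode, stdout_text, stderr_text)."""
    p = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str, normalize newlines
    )
    # communicate() reads both pipes to EOF and then waits for the child.
    out, err = p.communicate()
    return p.returncode, out, err


# Hypothetical stand-in for the Java program (assumption, not the real command):
# prints one line to stdout, one to stderr, and exits with status 3.
child = [
    sys.executable,
    "-c",
    "import sys; print('0'); sys.stderr.write('oops\\n'); sys.exit(3)",
]

code, out, err = run_and_check(child)
print("exit status:", code)
print("stdout:", repr(out))
print("stderr:", repr(err))
```

If the exit status is non-zero, or the "missing" lines turn up in stderr, the problem is on the child's side and not in how the pipe is being read.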

It must be something related to the process you are actually calling. You can verify this by doing a simple test with another Python script that echoes out lines:

    out.py

    import sys
    
    for i in xrange(5000):
        print "%d\n" % i
    
    sys.exit(0)
    

    test.py

    import subprocess
    
    cmd = "python out.py"
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
    output = p.communicate()[0]
    
    print output
    

    So you can verify that it's not the size of the data that is the issue, but rather the communication with the process you are calling.
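
To make that comparison concrete, it may help to count the captured lines and not just print them. Below is a self-contained Python 3 variant of the same test. It is a sketch, not the original scripts: the child is an inline one-liner run through sys.executable, and N and child_code are illustrative names.

```python
import subprocess
import sys

N = 5000  # number of lines the child is expected to print

# Inline stand-in for out.py: prints 0 .. N-1, one number per line.
child_code = "for i in range(%d): print(i)" % N

p = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    universal_newlines=True,  # return str and normalize newlines
)
output = p.communicate()[0]  # read everything, then wait for exit

lines = output.splitlines()
print("expected %d lines, captured %d, exit status %d"
      % (N, len(lines), p.returncode))
```

If the counts match here but not for the Java program, that points at the child and away from Popen.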

    You should also confirm the version of Python you are running, as I have read about past issues concerning the internal buffer of Popen (though using a separate file handle, as you have suggested, normally fixed that for me).

    It would be a buffer issue if the subprocess call was hanging indefinitely. But if the process is completing, just lacking lines, then Popen is doing its job.
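
For reference, the hang mentioned above usually comes from calling wait() while the child is still blocked writing to a full pipe. A minimal Python 3 sketch of the safe ordering, assuming a stand-in child that prints more than a typical pipe buffer holds, is to drain stdout first and wait afterwards:

```python
import subprocess
import sys

# Stand-in child (assumption): prints 100000 short lines, several hundred KB
# in total, which is more than a typical OS pipe buffer can hold.
child = [sys.executable, "-c", "for i in range(100000): print(i)"]

p = subprocess.Popen(child, stdout=subprocess.PIPE, universal_newlines=True)

lines = []
for line in p.stdout:  # drain the pipe while the child is still writing
    lines.append(line.rstrip("\n"))
p.stdout.close()

status = p.wait()  # safe now: the child has nothing left to block on
print(len(lines), status)
```

communicate() does the same read-then-wait internally, so it is normally the simpler choice when the whole output fits in memory.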
