Python subprocess losing 10% of a program's stdout
I have a program that needs to be called as a subprocess from Python. The program is written in Java. Yeah, I know...
anyway, I need to capture all of the output from said program.
Unfortunately, when I call popen2.popen2 or subprocess.Popen with communicate()[0], I'm losing around 10% of the output data - both when stdout is a subprocess.PIPE and when it's a file object (the return value of open()).
The subprocess documentation is pretty explicit that using subprocess.PIPE can be unreliable if you're trying to capture all of the output from a child process.
I'm currently using pexpect to dump the output into a temp file, but that's taking forever for obvious reasons.
I'd like to keep all the data in memory to avoid disk writes.
any recommendations are welcome! thanks!
import subprocess
cmd = 'java -Xmx2048m -cp "/home/usr/javalibs/class:/home/usr/javalibs/libs/dependency.jar" --data data --input input'
# doesn't get all the data
#
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
output = p.communicate()[0]
OR
# doesn't get all the data
#
fd = open("outputfile",'w')
p = subprocess.Popen(cmd, stdout=fd, shell=True)
p.communicate()
fd.close() # tried to use fd.flush() too.
# also tried
# p.wait() instead of p.communicate(), but wait doesn't really wait for the java program to finish running - it doesn't block
OR
# also fails to get all the data
#
import popen2
(rstdout, rstdin) = popen2.popen2(cmd)
Expected output is a series of ASCII lines (a couple thousand). Each line contains a number and an end-of-line character:
0\n
1\n
4\n
0\n
...
I have used subprocess with much larger output on stdout but haven't seen such a problem. It's hard to tell the root cause from what you've shown, so I would check the following.

Since p.wait() didn't work for you, it could be that while you are reading your PIPE, your java program is still busy printing the last 10%. Get p.wait() straight first: without PIPE, does your missing 10% show up? p.wait() doesn't block on your java program - does your java program spawn a further subprocess of its own? p.wait() only waits for the direct child. Did your java program terminate normally?

If the problem doesn't lie in your concurrency model, then check whether you are printing correctly in your java program: does it flush stdout? Is it prone to, or silently ignoring, IOException?
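If the race is the culprit, draining the pipe to EOF before calling p.wait() rules it out: the parent reads everything the child will ever write, so nothing is lost and wait() can't deadlock. A rough sketch, with a throwaway python child standing in for your java command (I don't have your real classpath):

```python
import subprocess
import sys

# Stand-in child process - substitute your real java invocation here.
# An argv list also sidesteps the shell quoting issues of shell=True.
cmd = [sys.executable, "-c", "for i in range(5000): print(i)"]

p = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# Drain the pipe to EOF first: the child can keep writing without
# ever blocking on a full pipe buffer, and no output is dropped.
lines = []
for line in p.stdout:
    lines.append(line)
p.stdout.close()

rc = p.wait()  # safe now - the pipe is already empty
```

If len(lines) comes back complete here but not with your java child, the problem is on the java side (buffering or flushing), not in subprocess.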
It must be something related to the process you are actually calling. You can verify this by doing a simple test with another python script that echoes out lines:
out.py
import sys
for i in xrange(5000):
    print "%d\n" % i
sys.exit(0)
test.py
import subprocess
cmd = "python out.py"
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
output = p.communicate()[0]
print output
So you can verify that it's not the size of the data that is the issue, but rather the communication with the process you are calling.
You should also confirm the version of Python you are running, as I have read about past issues concerning Popen's internal buffer (though using a separate file handle, as you tried, normally fixed that for me).
It would be a buffer issue if the subprocess call were hanging indefinitely. But if the process completes and is merely missing lines, then Popen is doing its job.
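To push the size question further, the same check can be scaled up - say, a million lines through communicate() - and the capture counted afterwards. A quick sketch along those lines (inlining the child via -c rather than a separate out.py file):

```python
import subprocess
import sys

# Child emits one million numbered lines; if communicate() were
# silently dropping output at this volume, the count would come up short.
child = "import sys\nfor i in range(1000000): sys.stdout.write('%d\\n' % i)"
p = subprocess.Popen([sys.executable, "-c", child],
                     stdout=subprocess.PIPE)
output = p.communicate()[0]

captured = output.splitlines()
print(len(captured))
```

If this count is exact while the java child's isn't, the loss is happening inside the java program, not in Python.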