spawn performance when capturing process output

I am trying to use posix_spawn instead of fork/exec to get some performance gain. My current project is written in Python, so I used this Python binding. Also I tried some of its forks, and after that I wrote my own posix_spawn binding in Cython (to get rid of some dependencies), but obtained almost the same results.

There is indeed a significant speed-up when I just need to run processes without capturing stdout/stderr. But when I do need it (for my project it is necessary), the posix_spawn call becomes about as slow as fork/exec call. Moreover, it depends on the amount of allocated memory the same way as fork/exec does. It happens even if the process does not actually produce any output - I checked on /bin/true. I still can't explain such behaviour. For fork/exec (via subprocess module) there is no significant difference whether we read process output or not as long as the output is not too large.

Here is my test script (imports and the code for profiling are omitted)

 # test.py
 def spawn_no_out(args):
    command = args[0]
    pid = posix_spawn(command, args)
    status, rusage = exits(pid)

 def spawn(args):

    # Prepare pipes to capture stdout and stderr
    stdout_read, stdout_write = os.pipe()
    stderr_read, stderr_write = os.pipe()
    fa = FileActions()
    fa.add_dup2(stdout_write, 1)
    fa.add_close(stdout_read)
    fa.add_dup2(stderr_write, 2)
    fa.add_close(stderr_read)

    # Spawn the process
    command = args[0]
    pid = posix_spawn(command, args, file_actions=fa)

    # Read and close file descriptors
    os.close(stdout_write)
    os.close(stderr_write)
    status, rusage = exits(pid)
    out = os.fdopen(stdout_read)
    err = os.fdopen(stderr_read)
    return out, err

 def fork(args):
    return Popen(args, stdout=PIPE, stderr=PIPE).communicate()

 def fork_no_out(args):
    return subprocess.call(args)

 def run_benchmark(func, args, count):
    for _ in xrange(count):
       func(args)
    print "%s: %ds" % (func.__name__, time.time() - start)

 def main():
    # Reads from stdout the number of process spawns and size of allocated memory
    args = ["/bin/true"]
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    mem_size = int(sys.argv[2]) if len(sys.argv) > 2 else 10000000
    some_allocated_memory = range(mem_size)
    for func in [spawn, spawn_no_out, fork, fork_no_out]:
        run_benchmark(func, args, count)


 if __name__ == "__main__":
    main()

Test output without allocating additional memory:

./test.py 10000 1
spawn: 34s
         3754834 function calls (3754776 primitive calls) in 33.517 seconds

   Ordered by: internal time, cumulative time
   List reduced from 144 to 5 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000   15.475    0.002   15.475    0.002 {posix.wait4}
    10000    5.850    0.001    5.850    0.001 {_cffi__x2c5d2681xf492c09f.posix_spawn}
    10000    3.217    0.000   12.750    0.001 /usr/local/lib/python2.7/dist-packages/posix_spawn-0.1-py2.7.egg/posix_spawn/_impl.py:75(posix_spawn)
    10000    2.242    0.000   33.280    0.003 ./test.py:23(spawn)
   660000    1.777    0.000    3.159    0.000 /usr/local/lib/python2.7/dist-packages/cffi/api.py:212(new)



spawn_no_out: 14s
         3340013 function calls in 14.631 seconds

   Ordered by: internal time, cumulative time
   List reduced from 25 to 5 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    7.466    0.001    7.466    0.001 {posix.wait4}
    10000    2.012    0.000    2.012    0.000 {_cffi__x2c5d2681xf492c09f.posix_spawn}
    10000    1.658    0.000    6.994    0.001 /usr/local/lib/python2.7/dist-packages/posix_spawn-0.1-py2.7.egg/posix_spawn/_impl.py:75(posix_spawn)
   650000    1.640    0.000    2.919    0.000 /usr/local/lib/python2.7/dist-packages/cffi/api.py:212(new)
   650000    0.496    0.000    0.496    0.000 {_cffi_backend.newp}



fork: 40s
         840094 function calls in 40.745 seconds

   Ordered by: internal time, cumulative time
   List reduced from 53 to 5 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000   19.460    0.002   19.460    0.002 {posix.read}
    10000    6.505    0.001    6.505    0.001 {posix.fork}
    10081    4.667    0.000    4.667    0.000 {built-in method poll}
    10000    2.773    0.000   30.190    0.003 /usr/lib/python2.7/subprocess.py:1187(_execute_child)
    10000    0.814    0.000   32.996    0.003 /usr/lib/python2.7/subprocess.py:650(__init__)



fork_no_out: 38s
         330013 function calls in 38.488 seconds

   Ordered by: internal time, cumulative time
   List reduced from 36 to 5 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000   18.179    0.002   18.179    0.002 {posix.read}
    10000    6.904    0.001    6.904    0.001 {posix.waitpid}
    10000    6.613    0.001    6.613    0.001 {posix.fork}
    10000    2.633    0.000   28.976    0.003 /usr/lib/python2.7/subprocess.py:1187(_execute_child)
    10000    0.880    0.000   30.070    0.003 /usr/lib/python2.7/subprocess.py:650(__init__)

Test output with allocated memory for list of 10000000 integers (had to decrease the number of calls):

./test.py 1000 10000000
spawn: 20s
         379834 function calls (379776 primitive calls) in 20.022 seconds

   Ordered by: internal time, cumulative time
   List reduced from 144 to 5 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000   10.022    0.010   10.022    0.010 {posix.wait4}
     1000    8.705    0.009    8.705    0.009 {_cffi__x2c5d2681xf492c09f.posix_spawn}
     1000    0.334    0.000    9.412    0.009 /usr/local/lib/python2.7/dist-packages/posix_spawn-0.1-py2.7.egg/posix_spawn/_impl.py:75(posix_spawn)
     1000    0.269    0.000   19.998    0.020 ./test.py:18(spawn)
    66000    0.174    0.000    0.318    0.000 /usr/local/lib/python2.7/dist-packages/cffi/api.py:212(new)



spawn_no_out: 1s
         334013 function calls in 1.480 seconds

   Ordered by: internal time, cumulative time
   List reduced from 25 to 5 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.755    0.001    0.755    0.001 {posix.wait4}
     1000    0.198    0.000    0.198    0.000 {_cffi__x2c5d2681xf492c09f.posix_spawn}
     1000    0.171    0.000    0.708    0.001 /usr/local/lib/python2.7/dist-packages/posix_spawn-0.1-py2.7.egg/posix_spawn/_impl.py:75(posix_spawn)
    65000    0.167    0.000    0.298    0.000 /usr/local/lib/python2.7/dist-packages/cffi/api.py:212(new)
    65000    0.050    0.000    0.050    0.000 {_cffi_backend.newp}



fork: 18s
         84067 function calls in 18.554 seconds

   Ordered by: internal time, cumulative time
   List reduced from 53 to 5 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    9.399    0.009    9.399    0.009 {posix.read}
     1000    7.815    0.008    7.815    0.008 {posix.fork}
     1054    0.414    0.000    0.414    0.000 {built-in method poll}
     1000    0.274    0.000   17.626    0.018 /usr/lib/python2.7/subprocess.py:1187(_execute_child)
     1000    0.078    0.000   17.871    0.018 /usr/lib/python2.7/subprocess.py:650(__init__)



fork_no_out: 18s
         33013 function calls in 18.732 seconds

   Ordered by: internal time, cumulative time
   List reduced from 36 to 5 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    9.467    0.009    9.467    0.009 {posix.read}
     1000    8.020    0.008    8.020    0.008 {posix.fork}
     1000    0.603    0.001    0.603    0.001 {posix.waitpid}
     1000    0.280    0.000   17.910    0.018 /usr/lib/python2.7/subprocess.py:1187(_execute_child)
     1000    0.072    0.000   18.000    0.018 /usr/lib/python2.7/subprocess.py:650(__init__)

Without profiling the results are the same. As we can see, there is a huge difference (1.4s vs 20s!) in performance for the cases when we call posix_spawn with and without capturing process output. There is no additional heavy calls - posix.wait4 just takes more time. What could I have done wrong here? Does someone have an idea why it happens and how to get better performance for posix_spawn?

PS Tested on Linux Mint 17 and CentOS 6.5 - same results.

UPDATE : The same performance degradation happens even if we pass empty FileActions object to posix_spawn, without actually reading stdout/stderr:

def spawn(args):
    command = args[0]
    pid = posix_spawn(command, args, file_actions=FileActions())
    status, rusage = exits(pid)

好吧,对于后代 - 看起来如果file_actions被设置并且某些标志没有被设置,posix_spawn只是使用fork:https://sourceware.org/git/?p = glibc.git; a = bllob; f = sysdeps /posix/spawni.c;h=2d3ae941dd19f0348ed95c0b957c68c3c0e9815d;hb=c758a6861537815c759cba2018a3b1abb1943842#l97


from the source code of the function, linked in alexgorin answer, it seems to me this flag rules over the other arguments :

POSIX_SPAWN_USEVFORK

try it and let me know

链接地址: http://www.djcxy.com/p/86472.html

上一篇: Django 1.9.4中的/ admin /导入错误

下一篇: 捕获过程输出时产生性能