How to avoid enormous additional memory consumption when using numpy vectorize?

This code below best illustrates my problem:

The output to the console (NB it takes ~8 minutes to run even the first test) shows the 512x512x512x16-bit array allocations consuming no more than expected (256MByte for each one), and looking at "top" the process generally remains sub-600MByte as expected.

However, while the vectorized version of the function is being called, the process expands to an enormous size (over 7GByte!). Even the most obvious explanation I can think of, that vectorize is converting the inputs and outputs to float64 internally, could only account for a couple of gigabytes, and in any case the vectorized function returns an int16 and the returned array is certainly an int16. Is there some way to avoid this? Am I using or misunderstanding vectorize's otypes argument?

import numpy as np
import subprocess

def logmem():
    # Print the system's free memory (Linux only)
    subprocess.call('cat /proc/meminfo | grep MemFree',shell=True)

def fn(x):
    return np.int16(x*x)

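def test_plain(v):
    print "Explicit looping:"
    # Preallocate the int16 result, then fill it one element at a time
    r=np.zeros(v.shape,dtype=np.int16)
    for z in xrange(v.shape[0]):
        for y in xrange(v.shape[1]):
            for x in xrange(v.shape[2]):
                r[z,y,x]=fn(v[z,y,x])
    print type(r[0,0,0])
    return r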


def test_vectorize(v):
    print "Vectorize:"
    # otypes requests an int16 result array
    vfn=np.vectorize(fn,otypes=[np.int16])
    r=vfn(v)
    print type(r[0,0,0])
    return r


I'm using whichever versions of Python/numpy are current on an amd64 Debian Squeeze system (Python 2.6.6, numpy 1.4.1).

You can read the source code of vectorize(). It converts the array's dtype to object and calls np.frompyfunc() to create a ufunc from your Python function. That ufunc returns an object array, and finally vectorize() converts the object array to an int16 array.

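An array uses a lot of memory when its dtype is object: each element is a pointer to a separately allocated Python object rather than a 2-byte integer stored inline.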
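The steps described above can be reproduced by hand on a small array to see where the memory goes. This is only a sketch of the idea, not the actual vectorize() source; the 32x32x32 shape and the variable names are made up for illustration, and it uses print() as a function so it runs on current Python rather than the 2.6 in the question.

```python
import sys
import numpy as np

def fn(x):
    return np.int16(x * x)

# Small stand-in for the 512x512x512 array in the question.
v = np.ones((32, 32, 32), dtype=np.int16)

# Step 1: the input becomes an object array, i.e. one pointer per
# element instead of one 2-byte integer.
v_obj = v.astype(object)

# Step 2: frompyfunc wraps the Python function; its result is an
# object array as well, holding one boxed scalar per element.
r_obj = np.frompyfunc(fn, 1, 1)(v_obj)

# Step 3: only at the end is the result converted to the requested dtype.
r = r_obj.astype(np.int16)

ptr_bytes = np.dtype(object).itemsize      # size of one pointer
scalar_bytes = sys.getsizeof(np.int16(1))  # size of one boxed np.int16

print("int16 array bytes:        ", v.nbytes)
print("object array bytes:       ", v_obj.nbytes)
print("boxed result bytes (est.):", r_obj.size * scalar_bytes)
```

Scaling the printed numbers up to 512x512x512 elements gives a rough idea of how two object arrays plus one boxed scalar per output element can reach several gigabytes.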

Using a Python function to do element-wise calculation is slow, even when it is converted to a ufunc by frompyfunc().
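For a function as simple as the fn in the question, one way around both the speed and the memory cost is to express it with NumPy's own compiled ufuncs instead of wrapping a Python function. A minimal sketch, assuming the values are small enough that x*x fits in an int16 (the overflow behaviour of the two versions may differ otherwise); the test array is made up for illustration:

```python
import numpy as np

def fn(x):
    return np.int16(x * x)

# Small test array whose squares all fit in an int16.
v = np.arange(27, dtype=np.int16).reshape(3, 3, 3)

# Element-wise Python call via vectorize (goes through object arrays).
r_vec = np.vectorize(fn, otypes=[np.int16])(v)

# Same arithmetic done by NumPy's compiled int16 multiply: no Python
# call per element and no object arrays.
r_direct = np.multiply(v, v)

print(r_direct.dtype, np.array_equal(r_vec, r_direct))
```

The only full-size allocation in the direct version is the int16 result itself.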

It is a basic problem of vectorisation that all intermediate values are also vectors. While this is a convenient way to get a decent speed enhancement, it can be very inefficient with memory usage, and will be constantly thrashing your CPU cache. To overcome this problem, you need to use an approach which has explicit loops running at compiled speed, not at Python speed. The best ways to do this are to use Cython, Fortran code wrapped with f2py, or numexpr. You can find a comparison of these approaches here, although this focuses more on speed than memory usage.
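The point about intermediate values can be illustrated within NumPy itself. The sketch below is not one of the approaches named above (Cython, f2py, numexpr); it only shows, under that caveat, how the out= argument of NumPy ufuncs lets every step reuse one preallocated buffer instead of allocating a new full-size array. The arrays a, b and c are made up for illustration.

```python
import numpy as np

a = np.arange(8, dtype=np.int16)
b = np.full(8, 3, dtype=np.int16)
c = np.ones(8, dtype=np.int16)

# Naive form: a * b is materialised as a temporary array, and the
# addition then allocates a second array for the final result.
naive = a * b + c

# Same result with a single preallocated buffer reused for each step.
result = np.empty_like(a)
np.multiply(a, b, out=result)
np.add(result, c, out=result)

print(np.array_equal(naive, result))
```

This keeps peak memory at the inputs plus one output buffer, but each step is still a separate pass over the data, which is the cache behaviour the compiled-loop approaches are meant to avoid.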

