Memory leak when calling a Cython function with large NumPy array parameters?

I'm trying to write Python code that calls the following Cython function test1:

def test1( np.ndarray[np.int32_t, ndim=2] ndk, 
           np.ndarray[np.int32_t, ndim=2] nkw, 
           np.ndarray[np.float64_t, ndim=2] phi):

    for _ in xrange(int(1e5)):
        test2(ndk, nkw, phi)


cdef int test2(np.ndarray[np.int32_t, ndim=2] ndk,
               np.ndarray[np.int32_t, ndim=2] nkw,
               np.ndarray[np.float64_t, ndim=2] phi):
    return 1

My pure Python code calls test1 and passes three NumPy arrays as parameters, each very large (about 10^4 * 10^3). test1 in turn calls test2, which is defined with the cdef keyword, and passes those arrays along. Since test1 needs to call test2 many times (about 10^5) before it returns, and test2 does not need to be called from outside the Cython code, I use cdef instead of def.

But the problem is that as test1 keeps calling test2, memory usage increases steadily. I've tried calling gc.collect() outside the Cython code, but it doesn't help. Eventually the program is killed by the system because it has used up all available memory. I noticed that this problem only occurs with cdef and cpdef functions; if I change test2 to def, it works fine.

I think test1 is supposed to pass references to these arrays to test2 rather than copies of the objects. But it seems as if it creates new objects for these arrays and passes them to test2, and these objects are never reclaimed by the Python gc afterwards.

Did I miss something?


I'm still confused about this problem, but I found a way to work around it: explicitly tell Cython to pass pointers, like this:

def test1( np.ndarray[np.int32_t, ndim=2] ndk, 
           np.ndarray[np.int32_t, ndim=2] nkw, 
           np.ndarray[np.float64_t, ndim=2] phi):

    for _ in xrange(int(1e5)):
        test2(&ndk[0,0], &nkw[0,0], &phi[0,0])


cdef int test2(np.int32_t* ndk,
               np.int32_t* nkw,
               np.float64_t* phi):
    return 1

However, you will then need to index the array manually, like this: ndk[i*row_len + j]. Note that this flat indexing assumes the arrays are C-contiguous (row-major). Details: https://github.com/cython/cython/wiki/tutorials-NumpyPointerToC
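
To make the i*row_len + j formula concrete, here is a minimal pure-Python sketch of row-major flat indexing. It is only an illustration, not Cython code: the names n_rows, n_cols, table, flat and flat_index are made up for this example, and a stdlib array stands in for the C buffer that the pointer would address.

```python
from array import array

# Hypothetical dimensions, chosen only for illustration.
n_rows, n_cols = 3, 4

# A 2-D table as nested lists; the value at (i, j) is 10*i + j.
table = [[10 * i + j for j in range(n_cols)] for i in range(n_rows)]

# The same data flattened row by row into one contiguous buffer of C ints,
# mimicking what a pointer to a C-contiguous array would see.
flat = array("i", [value for row in table for value in row])


def flat_index(i, j, row_len):
    """Offset of element (i, j) in a row-major buffer whose rows hold row_len items."""
    return i * row_len + j


# Every 2-D lookup matches the corresponding flat lookup.
for i in range(n_rows):
    for j in range(n_cols):
        assert flat[flat_index(i, j, n_cols)] == table[i][j]
```

Here row_len is the number of columns, i.e. the length of one row. If the array were not stored row by row (for example, a transposed or sliced view), this formula would address the wrong elements.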
