cython numpy accumulate function

I need to implement a function for summing the elements of an array with a variable section length. So,

a = np.arange(10)
section_lengths = np.array([3, 2, 4])
out = accumulate(a, section_lengths)
print out
array([  3.,   7.,  35.])

I attempted an implementation in cython here:

https://gist.github.com/2784725

for performance I am comparing to the pure numpy solution for the case where the section_lengths are all the same:

LEN = 10000
b = np.ones(LEN, dtype=np.int) * 2000
a = np.arange(np.sum(b), dtype=np.double)
out = np.zeros(LEN, dtype=np.double)

%timeit np.sum(a.reshape(-1,2000), axis=1)
10 loops, best of 3: 25.1 ms per loop

%timeit accumulate.accumulate(a, b, out)
10 loops, best of 3: 64.6 ms per loop

would you have any suggestion for improving performance?


You might try some of the following:

  • In addition to the @cython.boundscheck(False) compiler directive, also try adding @cython.wraparound(False)

  • In your setup.py script, try adding in some optimization flags:

    ext_modules = [Extension("accumulate", ["accumulate.pyx"], extra_compile_args=["-O3",])]

  • Take a look at the .html file generated by cython -a accumulate.pyx to see if there are sections that are missing static typing or relying heavily on Python C-API calls:

    http://docs.cython.org/src/quickstart/cythonize.html#determining-where-to-add-types

  • Add a return statement at the end of the method. Currently it is doing a bunch of unnecessary error checking in your tight loop at i_el += 1 .

  • Not sure if it will make a difference but I tend to make loop counters cdef unsigned int rather than just int

  • You also might compare your code to numpy when section_lengths are unequal, since it will probably require a bit more than just a simple sum .


    In the nest for loop update out[i_bas] is slow, you can create a temporary variable to do the accumerate, and update out[i_bas] when nest for loop finished. The following code will be as fast as numpy version:

    import numpy as np
    cimport numpy as np
    
    ctypedef np.int_t DTYPE_int_t
    ctypedef np.double_t DTYPE_double_t
    
    cimport cython
    @cython.boundscheck(False)
    @cython.wraparound(False)
    def accumulate(
           np.ndarray[DTYPE_double_t, ndim=1] a not None,
           np.ndarray[DTYPE_int_t, ndim=1] section_lengths not None,
           np.ndarray[DTYPE_double_t, ndim=1] out not None,
           ):
        cdef int i_el, i_bas, sec_length, lenout
        cdef double tmp
        lenout = out.shape[0]
        i_el = 0
        for i_bas in range(lenout):
            tmp = 0
            for sec_length in range(section_lengths[i_bas]):
                tmp += a[i_el]
                i_el+=1
            out[i_bas] = tmp
    
    链接地址: http://www.djcxy.com/p/10792.html

    上一篇: PerformanceCounter.NextValue()抛出InvalidOperationException

    下一篇: cython numpy累积功能