cython numpy accumulate function

2018-06-03 02:38:30

I need to implement a function for summing the elements of an array with a variable section length. So,

a = np.arange(10)
section_lengths = np.array([3, 2, 4])
out = accumulate(a, section_lengths)
print out
array([  3.,   7.,  35.])

I attempted an implementation in cython here:

https://gist.github.com/2784725

for performance I am comparing to the pure numpy solution for the case where the section_lengths are all the same:

LEN = 10000
b = np.ones(LEN, dtype=np.int) * 2000
a = np.arange(np.sum(b), dtype=np.double)
out = np.zeros(LEN, dtype=np.double)

%timeit np.sum(a.reshape(-1,2000), axis=1)
10 loops, best of 3: 25.1 ms per loop

%timeit accumulate.accumulate(a, b, out)
10 loops, best of 3: 64.6 ms per loop

would you have any suggestion for improving performance?

You might try some of the following:

In addition to the @cython.boundscheck(False) compiler directive, also try adding @cython.wraparound(False)

In your setup.py script, try adding in some optimization flags:

ext_modules = [Extension("accumulate", ["accumulate.pyx"], extra_compile_args=["-O3",])]

Take a look at the .html file generated by cython -a accumulate.pyx to see if there are sections that are missing static typing or relying heavily on Python C-API calls:

http://docs.cython.org/src/quickstart/cythonize.html#determining-where-to-add-types

Add a return statement at the end of the method. Currently it is doing a bunch of unnecessary error checking in your tight loop at i_el += 1 .

Not sure if it will make a difference but I tend to make loop counters cdef unsigned int rather than just int

You also might compare your code to numpy when section_lengths are unequal, since it will probably require a bit more than just a simple sum .

In the nest for loop update out[i_bas] is slow, you can create a temporary variable to do the accumerate, and update out[i_bas] when nest for loop finished. The following code will be as fast as numpy version:

import numpy as np
cimport numpy as np

ctypedef np.int_t DTYPE_int_t
ctypedef np.double_t DTYPE_double_t

cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def accumulate(
       np.ndarray[DTYPE_double_t, ndim=1] a not None,
       np.ndarray[DTYPE_int_t, ndim=1] section_lengths not None,
       np.ndarray[DTYPE_double_t, ndim=1] out not None,
       ):
    cdef int i_el, i_bas, sec_length, lenout
    cdef double tmp
    lenout = out.shape[0]
    i_el = 0
    for i_bas in range(lenout):
        tmp = 0
        for sec_length in range(section_lengths[i_bas]):
            tmp += a[i_el]
            i_el+=1
        out[i_bas] = tmp

链接地址: http://www.djcxy.com/p/10792.html

上一篇: PerformanceCounter.NextValue（）抛出InvalidOperationException

下一篇: cython numpy累积功能