Cython memoryview slower than expected
I've started using memoryviews in Cython to access NumPy arrays. One of their advantages is that they are considerably faster than the old NumPy buffer support: http://docs.cython.org/src/userguide/memoryviews.html#comparison-to-the-old-buffer-support
However, I have an example where the old NumPy buffer support is faster than memoryviews! How can this be? Am I using memoryviews correctly?
This is my test:
import numpy as np
cimport numpy as np
cimport cython


@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.ndarray[np.uint8_t, ndim=2] image_box1(
        np.ndarray[np.uint8_t, ndim=2] im,
        np.ndarray[np.float64_t, ndim=1] pd,
        int box_half_size):
    cdef unsigned int p0 = <int>(pd[0] + 0.5)
    cdef unsigned int p1 = <int>(pd[1] + 0.5)
    cdef unsigned int top = p1 - box_half_size
    cdef unsigned int left = p0 - box_half_size
    cdef unsigned int bottom = p1 + box_half_size
    cdef unsigned int right = p0 + box_half_size
    cdef np.ndarray[np.uint8_t, ndim=2] box = im[top:bottom, left:right]
    return box


@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.uint8_t[:, ::1] image_box2(
        np.uint8_t[:, ::1] im,
        np.float64_t[:] pd,
        int box_half_size):
    cdef unsigned int p0 = <int>(pd[0] + 0.5)
    cdef unsigned int p1 = <int>(pd[1] + 0.5)
    cdef unsigned int top = p1 - box_half_size
    cdef unsigned int left = p0 - box_half_size
    cdef unsigned int bottom = p1 + box_half_size
    cdef unsigned int right = p0 + box_half_size
    cdef np.uint8_t[:, ::1] box = im[top:bottom, left:right]
    return box
The timing results are:
image_box1: typed numpy: 100000 loops, best of 3: 11.2 us per loop
image_box2: memoryview: 100000 loops, best of 3: 18.1 us per loop
These measurements were made in IPython with %timeit image_box1(im, pd, box_half_size) (and likewise for image_box2).
Alright! I found the problem. As seberg pointed out, the memoryviews only appeared slower because the measurement included the automatic conversion from NumPy array to memoryview on every call.
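The same measurement pitfall can be reproduced by analogy in plain Python. This is a sketch under stated assumptions: it uses Python's built-in memoryview, not Cython's typed memoryviews, as a stand-in for acquiring a buffer from an object on each call, and the helper names best_per_call_us, slice_converting and slice_preconverted are hypothetical. It times the same slicing operation once with the conversion inside each call and once with the view created beforehand.

```python
import timeit


def best_per_call_us(func, *args, number=20_000, repeat=3):
    """Best-of-`repeat` time for a single func(*args) call, in microseconds."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


def slice_converting(buf, start, stop):
    # Hypothetical stand-in for passing a NumPy array to image_box2:
    # the buffer is wrapped in a new memoryview on every call.
    return memoryview(buf)[start:stop]


def slice_preconverted(view, start, stop):
    # Hypothetical stand-in for passing an existing memoryview:
    # only the slice itself is paid for inside the timed call.
    return view[start:stop]


data = bytearray(range(256))
view = memoryview(data)  # conversion done once, outside the timed region

t_converting = best_per_call_us(slice_converting, data, 10, 20)
t_preconverted = best_per_call_us(slice_preconverted, view, 10, 20)
print(f"conversion inside the call: {t_converting:.3f} us")
print(f"conversion hoisted out:     {t_preconverted:.3f} us")
```

The first figure is expected to be larger, but the absolute numbers say nothing about Cython itself; the point is only where the conversion sits relative to the timer, which is what the test() function below does with its cdef memoryview variables.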
I used the following function to measure the times from within the Cython module:
def test(params):
    import timeit
    im = params[0]
    pd = params[1]
    box_half_size = params[2]

    t1 = timeit.Timer(lambda: image_box1(im, pd, box_half_size))
    print('image_box1: typed numpy:')
    print(min(t1.repeat(3, 10)))

    # Convert the NumPy arrays to memoryviews once, outside the timed call.
    cdef np.uint8_t[:, ::1] im2 = im
    cdef np.float64_t[:] pd2 = pd
    t2 = timeit.Timer(lambda: image_box2(im2, pd2, box_half_size))
    print('image_box2: memoryview:')
    print(min(t2.repeat(3, 10)))
Result (best of 3 repetitions; each value is the total time in seconds for 10 calls):
image_box1: typed numpy: 9.07607864065e-05
image_box2: memoryview: 5.81799904467e-05
So memoryviews are indeed faster!
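Because t1.repeat(3, 10) times 10 calls per repetition, each figure above is a total for 10 calls. A quick conversion to per-call microseconds (plain arithmetic on the reported numbers) makes them comparable with the %timeit output in the question: roughly 9.1 us per call for image_box1 and 5.8 us for image_box2, i.e. about 1.56 times faster.

```python
# Each value is min(timer.repeat(3, 10)): the best total time, in seconds,
# for 10 consecutive calls.
CALLS_PER_REPETITION = 10

best_totals_s = {
    "image_box1 (typed numpy)": 9.07607864065e-05,
    "image_box2 (memoryview, pre-converted)": 5.81799904467e-05,
}

# Seconds for 10 calls -> microseconds for one call.
per_call_us = {
    name: total / CALLS_PER_REPETITION * 1e6 for name, total in best_totals_s.items()
}

for name, us in per_call_us.items():
    print(f"{name}: {us:.2f} us per call")

speedup = (per_call_us["image_box1 (typed numpy)"]
           / per_call_us["image_box2 (memoryview, pre-converted)"])
print(f"memoryview speedup: {speedup:.2f}x")
```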
Note that I converted im and pd to memoryviews before timing image_box2. If I skip this step and pass the NumPy arrays im and pd directly, image_box2 is slower:
image_box1: typed numpy: 9.12262257771e-05
image_box2: memoryview: 0.000185245087778
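Comparing this run with the previous one gives a rough estimate of what the automatic conversion costs. This assumes the two runs are otherwise comparable, which is only approximately true (the image_box1 figure differs slightly between them), and the estimate covers converting both arguments, im and pd. It comes out at about 12.7 us per call, more than twice the cost of the pre-converted memoryview call itself.

```python
CALLS_PER_REPETITION = 10  # t.repeat(3, 10): each total covers 10 calls

# Best totals (seconds per 10 calls) for image_box2, taken from the two runs above.
preconverted_total_s = 5.81799904467e-05   # im2, pd2 already memoryviews
unconverted_total_s = 0.000185245087778    # NumPy arrays im, pd passed directly

preconverted_us = preconverted_total_s / CALLS_PER_REPETITION * 1e6
unconverted_us = unconverted_total_s / CALLS_PER_REPETITION * 1e6

# Rough estimate: attributes the whole difference to the array -> memoryview
# conversion of the two arguments.
conversion_overhead_us = unconverted_us - preconverted_us

print(f"pre-converted: {preconverted_us:.2f} us, unconverted: {unconverted_us:.2f} us")
print(f"estimated conversion overhead: {conversion_overhead_us:.1f} us per call")
```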