Cython memoryviews slower than expected
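I have started using memoryviews in Cython to access NumPy arrays. One of their several advantages is that they are considerably faster than the old NumPy buffer support: http://docs.cython.org/src/userguide/memoryviews.html#comparison-to-the-old-buffer-support
However, I have an example where the old NumPy buffer support is faster than memoryviews! How can that be? I wonder whether I am using memoryviews correctly.
Here is my test: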
import numpy as np
cimport numpy as np
cimport cython

# Version 1: arguments and return value typed with the old NumPy buffer syntax
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.ndarray[np.uint8_t, ndim=2] image_box1(np.ndarray[np.uint8_t, ndim=2] im,
                                                np.ndarray[np.float64_t, ndim=1] pd,
                                                int box_half_size):
    # Round the point to the nearest pixel (pd[0] is the column, pd[1] the row)
    cdef unsigned int p0 = <int>(pd[0] + 0.5)
    cdef unsigned int p1 = <int>(pd[1] + 0.5)
    # Bounds of the box centered on (p0, p1)
    cdef unsigned int top = p1 - box_half_size
    cdef unsigned int left = p0 - box_half_size
    cdef unsigned int bottom = p1 + box_half_size
    cdef unsigned int right = p0 + box_half_size
    cdef np.ndarray[np.uint8_t, ndim=2] box = im[top:bottom, left:right]
    return box

# Version 2: the same logic, with arguments and return value typed as memoryviews
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.uint8_t[:, ::1] image_box2(np.uint8_t[:, ::1] im,
                                    np.float64_t[:] pd,
                                    int box_half_size):
    cdef unsigned int p0 = <int>(pd[0] + 0.5)
    cdef unsigned int p1 = <int>(pd[1] + 0.5)
    cdef unsigned int top = p1 - box_half_size
    cdef unsigned int left = p0 - box_half_size
    cdef unsigned int bottom = p1 + box_half_size
    cdef unsigned int right = p0 + box_half_size
    cdef np.uint8_t[:, ::1] box = im[top:bottom, left:right]
    return box
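For reference, the arithmetic that both functions perform can be sketched in plain Python. This is only an illustration: `box_bounds` and `image_box` are hypothetical helpers that are not part of the question, the image is a made-up list of rows rather than a NumPy array, and Python integers do not wrap around the way `unsigned int` would if the box extended past the top or left edge.

```python
def box_bounds(pd, box_half_size):
    """Return (top, bottom, left, right) for a box centered on point pd.

    pd[0] is treated as the column coordinate and pd[1] as the row
    coordinate, matching how the Cython functions use them.
    """
    # int(x + 0.5) rounds to the nearest integer for non-negative x
    p0 = int(pd[0] + 0.5)
    p1 = int(pd[1] + 0.5)
    top = p1 - box_half_size
    left = p0 - box_half_size
    bottom = p1 + box_half_size
    right = p0 + box_half_size
    return top, bottom, left, right


def image_box(im, pd, box_half_size):
    """Slice the box out of a list-of-rows image (this copies, unlike NumPy slicing)."""
    top, bottom, left, right = box_bounds(pd, box_half_size)
    return [row[left:right] for row in im[top:bottom]]


# Made-up example data: a 6x6 image whose pixel value encodes its position
im = [[10 * r + c for c in range(6)] for r in range(6)]
box = image_box(im, (2.6, 3.2), 1)
print(box)
```

The function does very little work per call, so the fixed cost of calling it (including any argument conversion) is a large share of the measured time.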
The timing results are:
image_box1: typed numpy: 100000 loops, best of 3: 11.2 us per loop
image_box2: memoryview: 100000 loops, best of 3: 18.1 us per loop
These measurements were made from IPython using %timeit image_box1(im, pd, box_half_size).
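OK! I found the problem. As seberg pointed out, the memoryviews appeared slower because the measurement included the automatic conversion from NumPy array to memoryview.
I used the following function to measure the time from inside the Cython module: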
def test(params):
    import timeit
    im = params[0]
    pd = params[1]
    box_half_size = params[2]

    # Time the version typed with the old NumPy buffer syntax
    t1 = timeit.Timer(lambda: image_box1(im, pd, box_half_size))
    print('image_box1: typed numpy:')
    print(min(t1.repeat(3, 10)))

    # Convert to memoryviews once, outside the timed call
    cdef np.uint8_t[:, ::1] im2 = im
    cdef np.float64_t[:] pd2 = pd
    t2 = timeit.Timer(lambda: image_box2(im2, pd2, box_half_size))
    print('image_box2: memoryview:')
    print(min(t2.repeat(3, 10)))
Result:
image_box1: typed numpy: 9.07607864065e-05
image_box2: memoryview: 5.81799904467e-05
So memoryviews are indeed faster!
Note that I converted im and pd to memoryviews before calling image_box2. If I skip that step and pass im and pd directly, image_box2 is slower:
image_box1: typed numpy: 9.12262257771e-05
image_box2: memoryview: 0.000185245087778
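The same measurement pitfall can be reproduced in plain Python as a rough analogy. The sketch below uses Python's built-in `memoryview` over a `bytearray`, not Cython typed memoryviews, and `first_byte` and `best_time` are hypothetical helpers introduced only for illustration. It times one call with the conversion inside the timed lambda and one with the conversion hoisted out; the absolute numbers will not match the ones above.

```python
import timeit


def first_byte(view):
    """Stand-in for a function that expects an already-converted view."""
    return view[0]


def best_time(fn, repeat=3, number=1000):
    """Return the best total time in seconds over `repeat` runs of `number` calls."""
    return min(timeit.Timer(fn).repeat(repeat, number))


data = bytearray(1024 * 1024)  # made-up buffer standing in for the image

# Conversion inside the timed call: every call pays for creating the view
t_with_conversion = best_time(lambda: first_byte(memoryview(data)))

# Conversion hoisted out: the view is created once, before timing starts
view = memoryview(data)
t_preconverted = best_time(lambda: first_byte(view))

print('conversion inside timed call:', t_with_conversion)
print('conversion hoisted out:      ', t_preconverted)
```

When the function body is as cheap as a single slice, whichever variant includes the conversion is mostly measuring the conversion, which is what happened with %timeit on image_box2.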