在cython中创建小型数组需要花费大量的时间
我正在为numpy编写一个新的随机数生成器,当我遇到这种非常奇怪的行为时,根据任意分布产生随机数:
这是test.pyx
#cython: boundscheck=False
#cython: wraparound=False
import numpy as np
cimport numpy as np
cimport cython
def BareBones(np.ndarray[double, ndim=1] a,np.ndarray[double, ndim=1] u,r):
return u
def UntypedWithLoop(a,u,r):
cdef int i,j=0
for i in range(u.shape[0]):
j+=i
return u,j
def BSReplacement(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u):
cdef np.ndarray[np.int_t, ndim=1] r=np.empty(u.shape[0],dtype=int)
cdef int i,j=0
for i in range(u.shape[0]):
j=i
return r
setup.py
from distutils.core import setup
from Cython.Build import cythonize
setup(name = "simple cython func",ext_modules = cythonize('test.pyx'),)
分析代码
#!/usr/bin/python
from __future__ import division
import subprocess
import timeit
#Compile the cython modules before importing them
subprocess.call(['python', 'setup.py', 'build_ext', '--inplace'])
sstr="""
import test
import numpy
u=numpy.random.random(10)
a=numpy.random.random(10)
a=numpy.cumsum(a)
a/=a[-1]
r=numpy.empty(10,int)
"""
print "binary search: creates an array[N] and performs N binary searches to fill it:n",timeit.timeit('numpy.searchsorted(a,u)',sstr)
print "Simple replacement for binary search:takes the same args as np.searchsorted and similarly returns a new array. this performs only one trivial operation per element:n",timeit.timeit('test.BSReplacement(a,u)',sstr)
print "barebones function doing nothing:",timeit.timeit('test.BareBones(a,u,r)',sstr)
print "Untyped inputs and doing N iterations:",timeit.timeit('test.UntypedWithLoop(a,u,r)',sstr)
print "time for just np.empty()",timeit.timeit('numpy.empty(10,int)',sstr)
二进制搜索实现按照len(u)*Log(len(a))
时间顺序执行。 微不足道的cython函数按len(u)
的顺序运行。 两者都返回一个len(u)的1D int数组。
然而,即使这样,没有计算微不足道的实现需要比numpy库中的全二值搜索更长的时间。 (它是用C编写的:https://github.com/numpy/numpy/blob/202e78d607515e0390cffb1898e11807f117b36a/numpy/core/src/multiarray/item_selection.c请参阅PyArray_SearchSorted)
结果是:
binary search: creates an array[N] and performs N binary searches to fill it:
1.15157485008
Simple replacement for binary search:takes the same args as np.searchsorted and similarly returns a new array. this performs only one trivial operation per element:
3.69442796707
barebones function doing nothing: 0.87496304512
Untyped inputs and doing N iterations: 0.244267940521
time for just np.empty() 1.0983929634
为什么np.empty()步骤需要花费太多时间? 我能做些什么来获得我可以返回的空数组?
C函数执行此操作,并运行一系列完整性检查,并在内部循环中使用更长的算法。 (我删除了所有的逻辑,除了循环本身从我的例子)
更新
原来有两个不同的问题:
np.ndarray[...]
也具有比接收无类型变量和迭代50次花费更多时间的大量开销。 50次迭代的结果:
binary search: 2.45336699486
Simple replacement:3.71126317978
barebones function doing nothing: 0.924916028976
Untyped inputs and doing N iterations: 0.316384077072
time for just np.empty() 1.04949498177
Cython列表中对此进行了讨论,可能会提供一些有用的建议:https://groups.google.com/forum/#!topic/cython-users/CwtU_jYADgM
通常,尽管我尝试在Cython之外分配小数组,请将它们传入并在随后的方法调用中重新使用它们。 我明白,这并不总是一种选择。
在Cython函数中创建np.empty
有一些开销,正如你已经看到的那样。 在这里,您将看到一个关于如何创建空数组并将其传递给Cython模块以便填充正确值的示例:
n=10
:
numpy.searchsorted: 1.30574745517
cython O(1): 3.28732016088
cython no array declaration 1.54710909596
n=100
:
numpy.searchsorted: 4.15200545373
cython O(1): 13.7273431067
cython no array declaration 11.4186086744
正如你已经指出的那样, numpy
版本比较好,因为它是O(len(u)*long(len(a)))
,这里的算法是O(len(u)*len(a))
...
我也尝试使用Memoryview,基本上通过double[:]
更改np.ndarray[double, ndim=1]
double[:]
,但在这种情况np.ndarray[double, ndim=1]
一个选项更快。
新的.pyx
文件是:
from __future__ import division
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def JustLoop(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u,
np.ndarray[int, ndim=1] r):
cdef int i,j
for j in range(u.shape[0]):
if u[j] < a[0]:
r[j] = 0
continue
if u[j] > a[a.shape[0]-1]:
r[j] = a.shape[0]-1
continue
for i in range(1, a.shape[0]):
if u[j] >= a[i-1] and u[j] < a[i]:
r[j] = i
break
@cython.boundscheck(False)
@cython.wraparound(False)
def WithArray(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u):
cdef np.ndarray[np.int_t, ndim=1] r=np.empty(u.shape[0],dtype=int)
cdef int i,j
for j in range(u.shape[0]):
if u[j] < a[0]:
r[j] = 0
continue
if u[j] > a[a.shape[0]-1]:
r[j] = a.shape[0]-1
continue
for i in range(1, a.shape[0]):
if u[j] >= a[i-1] and u[j] < a[i]:
r[j] = i
break
return r
新的.py
文件:
import numpy
import subprocess
import timeit
#Compile the cython modules before importing them
subprocess.call(['python', 'setup.py', 'build_ext', '--inplace'])
from test import *
sstr="""
import test
import numpy
u=numpy.random.random(10)
a=numpy.random.random(10)
a=numpy.cumsum(a)
a/=a[-1]
a.sort()
r = numpy.empty(u.shape[0], dtype=int)
"""
print "numpy.searchsorted:",timeit.timeit('numpy.searchsorted(a,u)',sstr)
print "cython O(1):",timeit.timeit('test.WithArray(a,u)',sstr)
print "cython no array declaration",timeit.timeit('test.JustLoop(a,u,r)',sstr)
链接地址: http://www.djcxy.com/p/15421.html
上一篇: creating small arrays in cython takes a humongous amount of time