加快Python / Cython循环。
我试图让python中的循环尽可能快地运行。 所以我已经潜入NumPy和Cython。 这是原始的Python代码:
def calculate_bsf_u_loop(uvel,dy,dz):
"""
Calculate barotropic stream function from zonal velocity
uvel (t,z,y,x)
dy (y,x)
dz (t,z,y,x)
bsf (t,y,x)
"""
nt = uvel.shape[0]
nz = uvel.shape[1]
ny = uvel.shape[2]
nx = uvel.shape[3]
bsf = np.zeros((nt,ny,nx))
for jn in range(0,nt):
for jk in range(0,nz):
for jj in range(0,ny):
for ji in range(0,nx):
bsf[jn,jj,ji] = bsf[jn,jj,ji] + uvel[jn,jk,jj,ji] * dz[jn,jk,jj,ji] * dy[jj,ji]
return bsf
这只是k指数的总和。 数组大小为nt = 12,nz = 75,ny = 559,nx = 1442,因此〜725百万个元素。 这花了68秒。 现在,我已经在cython中完成了
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False) # turn off negative index wrapping for entire function
## Use cpdef instead of def
## Define types for arrays
cpdef calculate_bsf_u_loop(np.ndarray[np.float64_t, ndim=4] uvel, np.ndarray[np.float64_t, ndim=2] dy, np.ndarray[np.float64_t, ndim=4] dz):
"""
Calculate barotropic stream function from zonal velocity
uvel (t,z,y,x)
dy (y,x)
dz (t,z,y,x)
bsf (t,y,x)
"""
## cdef the constants
cdef int nt = uvel.shape[0]
cdef int nz = uvel.shape[1]
cdef int ny = uvel.shape[2]
cdef int nx = uvel.shape[3]
## cdef loop indices
cdef ji,jj,jk,jn
## cdef. Note that the cdef is followed by cython type
## but the np.zeros function as python (numpy) type
cdef np.ndarray[np.float64_t, ndim=3] bsf = np.zeros([nt,ny,nx], dtype=np.float64)
for jn in xrange(0,nt):
for jk in xrange(0,nz):
for jj in xrange(0,ny):
for ji in xrange(0,nx):
bsf[jn,jj,ji] += uvel[jn,jk,jj,ji] * dz[jn,jk,jj,ji] * dy[jj,ji]
return bsf
那花了49秒。 然而,交换循环
for jn in range(0,nt):
for jk in range(0,nz):
bsf[jn,:,:] = bsf[jn,:,:] + uvel[jn,jk,:,:] * dz[jn,jk,:,:] * dy[:,:]
只需要0.29秒! 不幸的是,我不能在我的完整代码中做到这一点。
为什么NumPy比Cython循环更快地切片? 我认为NumPy速度很快,因为它是在引擎盖下的Cython。 所以他们不应该有相似的速度?
正如你所看到的,我已经禁用了cython中的边界检查,并且我还使用“快速数学”进行了编译。 但是,这只能提供很小的加速。 无论如何要得到一个与NumPy切片类似的循环,或者循环总是慢于切片?
任何帮助是极大的赞赏! /乔金 -
该代码尖叫numpy.einsum's
干预,因为您正在做4D
产品阵列的第二轴上的elementwise-multiplication
和sum-reduction
,这主要是numpy.einsum
以高效的方式进行的。 为了解决你的情况,你可以通过两种方式使用numpy.einsum
-
bsf = np.einsum('ijkl,ijkl,kl->ikl',uvel,dz,dy)
bsf = np.einsum('ijkl,ijkl->ikl',uvel,dz)*dy
运行时测试和验证输出 -
In [100]: # Take a (1/5)th of original input shapes
...: original_shape = [12,75, 559,1442]
...: m,n,p,q = (np.array(original_shape)/5).astype(int)
...:
...: # Generate random arrays with given shapes
...: uvel = np.random.rand(m,n,p,q)
...: dy = np.random.rand(p,q)
...: dz = np.random.rand(m,n,p,q)
...:
In [101]: bsf = calculate_bsf_u_loop(uvel,dy,dz)
In [102]: print(np.allclose(bsf,np.einsum('ijkl,ijkl,kl->ikl',uvel,dz,dy)))
True
In [103]: print(np.allclose(bsf,np.einsum('ijkl,ijkl->ikl',uvel,dz)*dy))
True
In [104]: %timeit calculate_bsf_u_loop(uvel,dy,dz)
1 loops, best of 3: 2.16 s per loop
In [105]: %timeit np.einsum('ijkl,ijkl,kl->ikl',uvel,dz,dy)
100 loops, best of 3: 3.94 ms per loop
In [106]: %timeit np.einsum('ijkl,ijkl->ikl',uvel,dz)*dy
100 loops, best of 3: 3.96 ms per loo
链接地址: http://www.djcxy.com/p/88965.html