Performance of numpy array masking in Cython

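As a follow-up to this question (thanks to MSeifert for the help), I ran into the problem that I have to mask a numpy array new_values with an index array new_vals_idx before passing the masked array on to update val_dict.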

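I tried to apply the array masking to the solutions MSeifert proposed in the old post, but the performance is not satisfying.
The arrays and the dict I used for the following examples are: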

import numpy as np
val_dict = {'a': 5.0, 'b': 18.8, 'c': -55/2}
for i in range(200):
    val_dict[str(i)] = i
    val_dict[i] = i**2

keys = ('b', 123, '89', 'c')  # dict keys to update
new_values = np.arange(1, 51, 1) / 1.0  # array with new values which has to be masked
new_vals_idx = np.array((0, 3, 5, -1))  # masking array
valarr = np.zeros((new_vals_idx.shape[0]))  # preallocation for masked array
length = new_vals_idx.shape[0]
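
To make the "masking" step concrete: it is NumPy integer-array indexing, not a boolean mask. The following minimal sketch reuses the arrays defined above and shows which values are selected; the name `selected` is introduced only for illustration.

```python
import numpy as np

new_values = np.arange(1, 51, 1) / 1.0    # 1.0, 2.0, ..., 50.0
new_vals_idx = np.array((0, 3, 5, -1))    # positions to pick; -1 refers to the last element

# Integer-array indexing returns a new array (a copy), not a view.
selected = new_values[new_vals_idx]
print(selected.tolist())                  # [1.0, 4.0, 6.0, 50.0]
```

So the four keys 'b', 123, '89' and 'c' are meant to receive 1.0, 4.0, 6.0 and 50.0 respectively.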

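To make my snippets easier to compare with my old question, I'll stick to the function naming from MSeifert's answer. These are my attempts to get the best performance out of python / cython (the other answers were left out because their performance was too poor):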

def old_for(val_dict, keys, new_values, new_vals_idx, length):
    for i in range(length):
        val_dict[keys[i]] = new_values[new_vals_idx[i]]
%timeit old_for(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.6 µs per loop

def old_for_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length):
    valarr = new_values[new_vals_idx]
    for i in range(length):
        val_dict[keys[i]] = valarr[i]
%timeit old_for_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length)
# 100000 loops, best of 3: 2.33 µs per loop

def new2_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length):
    valarr = new_values[new_vals_idx].tolist()
    for key, val in zip(keys, valarr):
        val_dict[key] = val
%timeit new2_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length)
# 100000 loops, best of 3: 2.01 µs per loop

Cython functions:

%load_ext cython
%%cython
import numpy as np
cimport numpy as np
cpdef new3_cy(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef Py_ssize_t i
    cdef double val  # this gives about 10 µs speed boost compared to directly assigning it to val_dict
    for i in range(length):
        val = new_values[new_vals_idx[i]]
        val_dict[keys[i]] = val
%timeit new3_cy(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.38 µs per loop

cpdef new3_cy_mview(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef Py_ssize_t i
    cdef int[:] mview_idx = new_vals_idx
    cdef double [:] mview_vals = new_values
    for i in range(length):
        val_dict[keys[i]] = mview_vals[mview_idx[i]]
%timeit new3_cy_mview(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.38 µs per loop

# NOT WORKING:
cpdef new2_cy_mview(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef double [new_vals_idx] masked_vals = new_values
    for key, val in zip(keys, masked_vals.tolist()):
        val_dict[key] = val

cpdef new2_cy_mask(dict val_dict, tuple keys, double[:] new_values, valarr, int[:] new_vals_idx, Py_ssize_t length):
    valarr = new_values[new_vals_idx]
    for key, val in zip(keys, valarr.tolist()):
        val_dict[key] = val

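The Cython functions new3_cy and new3_cy_mview do not seem to be considerably faster than old_for. Passing valarr to avoid array construction inside the function (as it is going to be called several million times) even seems to slow it down.
Masking in new2_cy_mask with the new_vals_idx array gives me the error: 'Invalid index for memoryview specified, type int[:]'. Is there a type like Py_ssize_t for arrays of indices?
Trying to create a masked memoryview in new2_cy_mview gives me the error 'Cannot assign type 'double[:]' to 'double[__pyx_v_new_vals_idx]''. Is there even something like a masked memoryview? I wasn't able to find information on this topic...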
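
On the two questions above, a hedged sketch in plain Python: Cython typed memoryviews accept only integers and slices as indices, so indexing one with an index array has to go through an ndarray again. `np.asarray` wraps a buffer without copying it. The built-in `memoryview` below is only a stand-in for a Cython typed memoryview, and the dtype mapping in the comments assumes a common platform where C `int` is 32 bits.

```python
import numpy as np

new_values = np.arange(1, 51, 1) / 1.0
new_vals_idx = np.array((0, 3, 5, -1))

view = memoryview(new_values)   # stand-in for a Cython `double[:]` argument
arr = np.asarray(view)          # wraps the same buffer, no data is copied
masked = arr[new_vals_idx]      # integer-array indexing is an ndarray feature

# Index arrays whose dtype matches the declared memoryview element type:
idx_c_int = new_vals_idx.astype(np.intc)   # fits a Cython `int[:]` argument
idx_ssize = new_vals_idx.astype(np.intp)   # fits a Cython `Py_ssize_t[:]` argument
```

Inside a Cython function the same idea would read `np.asarray(new_values)[np.asarray(new_vals_idx)]`. This still goes through NumPy's indexing machinery, so it is not expected to beat the explicit loop for four elements.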

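Comparing the timing results with those from my old question, I guess that the array masking is the step that takes up most of the time. As it is most likely already highly optimized in numpy, there is probably not much left to do. But the slowdown is so large that there must (hopefully) be a better way to do it.
Any help is appreciated! Thanks in advance!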


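One thing you can do with your current construction is turn off bounds checking (if it is safe!). It won't make a huge difference, but it gives some incremental performance.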

%%cython
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef new4_cy(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef Py_ssize_t i
    cdef double val  # this gives about 10 µs speed boost compared to directly assigning it to val_dict
    for i in range(length):
        val = new_values[new_vals_idx[i]]
        val_dict[keys[i]] = val

In [36]: %timeit new3_cy(val_dict, keys, new_values, new_vals_idx, length)
1.76 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [37]: %timeit new4_cy(val_dict, keys, new_values, new_vals_idx, length)
1.45 µs ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
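
A note on the "if it is safe" condition: with `wraparound(False)` Cython no longer translates negative indices, and with `boundscheck(False)` it no longer validates them, so the `-1` in the example `new_vals_idx` would read outside the buffer. A minimal sketch of one way to prepare the index array once, before the hot loop; `normalize_indices` is a hypothetical helper, not part of NumPy or Cython.

```python
import numpy as np

def normalize_indices(idx, n):
    """Map negative indices to their non-negative equivalents for an axis of length n."""
    idx = np.asarray(idx)
    # Validate once here, since the Cython loop will no longer check bounds.
    if ((idx < -n) | (idx >= n)).any():
        raise IndexError("index out of range for axis of length %d" % n)
    # np.intc corresponds to C int, matching the `int[:]` argument type.
    return np.where(idx < 0, idx + n, idx).astype(np.intc)

new_values = np.arange(1, 51, 1) / 1.0
safe_idx = normalize_indices(np.array((0, 3, 5, -1)), new_values.shape[0])
print(safe_idx.tolist())  # [0, 3, 5, 49]
```

Passing `safe_idx` instead of `new_vals_idx` keeps the result identical while making the unchecked access in new4_cy well defined.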