Resize a 2D numpy array excluding NaN

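I am trying to resize a 2D numpy array by a given factor, obtaining a smaller array as output.

The array is read from an image file, and some of its values should be NaN (Not a Number, np.nan in numpy): it is the result of remote sensing measurements from a satellite, and some pixels simply were not measured.

The most suitable function I found is scipy.misc.imresize, but every pixel of the output array whose source pixels include a NaN is set to NaN, even if there is valid data among the original pixels interpolated together.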
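
The effect can be reproduced without any image library: an ordinary mean over a block that contains a single NaN is NaN, while a NaN-aware mean keeps the valid data. A minimal sketch, assuming NumPy 1.8 or later for `np.nanmean`; the 2x2 `block` is made-up data:

```python
import numpy as np

# Made-up 2x2 block: three valid measurements and one missing pixel
block = np.array([[1.0, 2.0],
                  [3.0, np.nan]])

plain_mean = np.mean(block)    # the NaN propagates through the average
nan_aware = np.nanmean(block)  # averages only the three valid pixels

print(plain_mean, nan_aware)
```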

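My solution is appended below. What I have done is essentially:

  • create a new array based on the original array shape and the desired reduction factor
  • create an index array to address all the pixels of the original array that are averaged into each pixel of the new array
  • loop through the pixels of the new array and average all the non-NaN pixels to obtain the new pixel value; if there are only NaN, the output will be NaN.
  • I am planning to add a keyword to choose between different outputs (average, median, standard deviation of the input pixels, and so on).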

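It works as expected, but it takes about 3 seconds on a ~1 Mpx image. Given my lack of experience with Python, I am looking for improvements.

Does anyone have a suggestion for how to do this better and more efficiently?

Does anyone know of a library that already implements all of this?

Thanks.

Here is an example output for a random pixel input, generated with the code below:

[Figure: example output for random pixel input (see code)]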

    import numpy as np
    import pylab as plt
    from scipy import misc
    
    def resize_2d_nonan(array,factor):
        """
        Resize a 2D array by different factors on the two axes, skipping NaN values.
        If a new pixel contains only NaN, it will be set to NaN
    
    
        Parameters
        ----------
    
        array : 2D np array
    
        factor : int or tuple. If int, the x and y factors will be the same
    
        Returns
        -------
        array : 2D np array scaled by factor
    
        Created on Mon Jan 27 15:21:25 2014
    
        @author: damo_ma
        """
        xsize, ysize = array.shape
    
        if isinstance(factor,int):
            factor_x = factor
            factor_y = factor
        elif isinstance(factor,tuple):
            factor_x , factor_y = factor[0], factor[1]
        else:
            raise NameError('Factor must be a tuple (x,y) or an integer')
    
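        # both dimensions must be exact multiples of their factor
        if xsize % factor_x or ysize % factor_y:
            raise NameError('Array shape must be an integer multiple of the factors')
    
        # integer division: array shapes must be ints
        new_xsize, new_ysize = xsize // factor_x, ysize // factor_y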
    
        new_array = np.empty([new_xsize, new_ysize])
        new_array[:] = np.nan # this saves us an assignment in the loop below
    
        # submatrix indexes : is the average box on the original matrix
        subrow, subcol  = np.indices((factor_x, factor_y))
    
        # new matrix indexes
        row, col  = np.indices((new_xsize, new_ysize))
    
        # some output for testing
        #for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
        #    print '----------------------------------------------'
        #    print 'i: %i, j: %i, ind: %i ' % (i, j, ind)    
        #    print 'subrow+i*new_ysize, subcol+j*new_xsize :'    
        #    print i,'*',new_xsize,'=',i*factor_x
        #    print j,'*',new_ysize,'=',j*factor_y
        #    print subrow+i*factor_x,subcol+j*factor_y
        #    print '---'
        #    print 'array[subrow+i*factor_x,subcol+j*factor_y] : '    
        #    print array[subrow+i*factor_x,subcol+j*factor_y]
    
        for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
            # define the small sub_matrix (fancy indexing copies the input subset)
            sub_matrix = array[subrow+i*factor_x,subcol+j*factor_y]
            # modified from any(a) and all(a) to a.any() and a.all()
            # see https://stackoverflow.com/a/10063039/1435167
            if not (np.isnan(sub_matrix)).all(): # if we haven't all NaN
                if (np.isnan(sub_matrix)).any(): # if we have some NaN
                    msub_matrix = np.ma.masked_array(sub_matrix,np.isnan(sub_matrix))
                    (new_array.reshape(-1))[ind] = np.mean(msub_matrix)
                else: # if we have no NaN at all
                    (new_array.reshape(-1))[ind] = np.mean(sub_matrix)
            # the case "assign NaN if we have all NaN" is missing because
            # new_array is already initialised to NaN
    
        return new_array
    
    
    row , cols = 6, 4
    
    a = 10*np.random.random_sample((row , cols))
    a[0:3,0:2] = np.nan
    a[0,2] = np.nan
    
    factor_x = 2
    factor_y = 2
    # note: scipy.misc.imresize was removed in SciPy 1.3; this call needs an older SciPy
    a_misc = misc.imresize(a, .5, interp='nearest', mode='F')
    a_2d_nonan = resize_2d_nonan(a,(factor_x,factor_y))
    
    print(a)
    print()
    print(a_misc)
    print()
    print(a_2d_nonan)
    
    plt.subplot(131)
    plt.imshow(a,interpolation='nearest')
    plt.title('original')
    plt.xticks(np.arange(a.shape[1]))
    plt.yticks(np.arange(a.shape[0]))
    plt.subplot(132)
    plt.imshow(a_misc,interpolation='nearest')
    plt.title('scipy.misc')
    plt.xticks(np.arange(a_misc.shape[1]))
    plt.yticks(np.arange(a_misc.shape[0]))
    plt.subplot(133)
    plt.imshow(a_2d_nonan,interpolation='nearest')
    plt.title('my.func')
    plt.xticks(np.arange(a_2d_nonan.shape[1]))
    plt.yticks(np.arange(a_2d_nonan.shape[0]))
    

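EDIT

I added some modifications to address ChrisProsser's comment.

If I substitute the NaN with some other value, say the average of the non-NaN pixels, it affects all the subsequent calculations: the difference between the resampled original array and the resampled array with the NaN substituted shows that 2 pixels changed their values.

My goal is simply to skip all the NaN pixels.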

    # substitute NaN with the average value 
    
    ind_nonan , ind_nan = np.where(~np.isnan(a)), np.where(np.isnan(a))
    a_substitute = np.copy(a)
    
    a_substitute[ind_nan] = np.mean(a_substitute[ind_nonan]) # substitute the NaN with average on the not-Nan
    
    a_substitute_misc = misc.imresize(a_substitute, .5, interp='nearest', mode='F')
    a_substitute_2d_nonan = resize_2d_nonan(a_substitute,(factor_x,factor_y))
    
    print(a_2d_nonan-a_substitute_2d_nonan)
    
    [[        nan -0.02296697]
     [ 0.23143208  0.        ]
     [ 0.          0.        ]]
    


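**SECOND EDIT**

To address Hooked's answer I added some additional code. It is an interesting idea; sadly, it interpolates new values over pixels that should be "empty" (NaN), and for my small example it generates more NaN than good values.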

    X , Y  = np.indices((row , cols))
    X_new , Y_new  = np.indices((row//factor_x , cols//factor_y))
    
    from scipy.interpolate import CloughTocher2DInterpolator as intp
    C = intp((X[ind_nonan],Y[ind_nonan]),a[ind_nonan])
    
    a_interp = C(X_new , Y_new)
    
    print(a)
    print()
    print(a_interp)
    
    [[        nan,         nan],
     [        nan,         nan],
     [        nan,  6.32826577]])
    



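You are operating on small windows of the array. Instead of looping through the array to make the windows, you can restructure the array efficiently by manipulating its strides. The numpy library provides the as_strided() function to help with that. The SciPy CookBook entry on stride tricks gives an example for the Game of Life.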
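
To make the stride idea concrete, here is a minimal sketch of viewing a 2D array as non-overlapping blocks with as_strided(). The helper name `block_view` and the 6x4 test array are assumptions made for illustration, and the sketch assumes the array shape is an exact multiple of the factors:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def block_view(array, factor_x, factor_y):
    """View a 2D array as non-overlapping (factor_x, factor_y) blocks.

    Hypothetical helper for illustration only.
    """
    rows, cols = array.shape
    if rows % factor_x or cols % factor_y:
        raise ValueError('array shape must be a multiple of the factors')
    row_stride, col_stride = array.strides
    return as_strided(
        array,
        # one block per output pixel, then the block itself
        shape=(rows // factor_x, cols // factor_y, factor_x, factor_y),
        # jump a whole block at a time, then step inside the block
        strides=(row_stride * factor_x, col_stride * factor_y,
                 row_stride, col_stride),
        writeable=False,  # read-only view, no data is copied
    )

a = np.arange(24, dtype=float).reshape(6, 4)
blocks = block_view(a, 2, 2)
print(blocks.shape)  # (3, 2, 2, 2)
print(blocks[1, 0])  # the block covering rows 2-3, columns 0-1
```

The result is a view on the original data, so building it costs almost nothing regardless of the image size.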

The following uses a generalized sliding window function found in Efficient Overlapping Windows with Numpy; I include it at the end.

Determine the shape of the new array:

    rows, cols = a.shape
    new_shape = rows // 2, cols // 2
    

Restructure the array into the windows you need, and create an indexing array that identifies the NaNs:

    # 2x2 windows of the original array
    windows = sliding_window(a, (2,2))
    # make a windowed boolean array for indexing
    notNan = sliding_window(np.logical_not(np.isnan(a)), (2,2))
    

The new array can be made with a list comprehension or a generator expression.

    # using a list comprehension
    # make a list of the means of the windows, disregarding the Nan's
    means = [window[index].mean() for window, index in zip(windows, notNan)]
    new_array = np.array(means).reshape(new_shape)
    
    # generator expression
    # produces the means of the windows, disregarding the Nan's
    means = (window[index].mean() for window, index in zip(windows, notNan))
    new_array = np.fromiter(means, dtype = np.float32).reshape(new_shape)
    

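A generator expression should conserve memory. If memory is a problem, using itertools.izip() instead of zip should also help on Python 2 (on Python 3, zip is already lazy). I just used the list comprehension for your solution.

Your function: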

    def resize_2d_nonan(array,factor):
        """
        Resize a 2D array by different factor on two axis skipping NaN values.
        If a new pixel contains only NaN, it will be set to NaN
    
        Parameters
        ----------
        array : 2D np array
    
        factor : int or tuple. If int, the x and y factors will be the same
    
        Returns
        -------
        array : 2D np array scaled by factor
    
        Created on Mon Jan 27 15:21:25 2014
    
        @author: damo_ma
        """
        xsize, ysize = array.shape
    
        if isinstance(factor,int):
            factor_x = factor
            factor_y = factor
            window_size = factor, factor
        elif isinstance(factor,tuple):
            factor_x , factor_y = factor
            window_size = factor
        else:
            raise NameError('Factor must be a tuple (x,y) or an integer')
    
        if (xsize % factor_x or ysize % factor_y) :
            raise NameError('Factors must be integer multiple of array shape')
    
        new_shape = xsize // factor_x, ysize // factor_y
    
        # non-overlapping windows of the original array
        windows = sliding_window(array, window_size)
        # windowed boolean array for indexing
        notNan = sliding_window(np.logical_not(np.isnan(array)), window_size)
    
        #list of the means of the windows, disregarding the Nan's
        means = [window[index].mean() for window, index in zip(windows, notNan)]
        # new array
        new_array = np.array(means).reshape(new_shape)
    
        return new_array
    

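I haven't done any timing comparisons with the original function, but it should be faster.

Many of the solutions I have seen here vectorize the operations to increase speed and efficiency. I don't quite have a handle on that and don't know whether it can be applied to your problem. Searching for window, array, moving average, vectorize, and numpy should turn up similar questions and answers for reference.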
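
For non-overlapping blocks, one possible vectorized approach is to reshape the array so that each block gets its own pair of axes, then sum the valid pixels and divide by their count. This is only a sketch and has not been timed against either version above; `block_nanmean` is a hypothetical name, and the code assumes a C-contiguous float array whose shape is an exact multiple of the factors:

```python
import numpy as np

def block_nanmean(array, factor):
    """Mean of each non-overlapping block, ignoring NaN (hypothetical helper).

    Blocks that contain only NaN yield NaN.
    """
    factor_x, factor_y = (factor, factor) if isinstance(factor, int) else factor
    rows, cols = array.shape
    if rows % factor_x or cols % factor_y:
        raise ValueError('array shape must be a multiple of the factors')

    # axes: (block row, row inside block, block column, column inside block)
    blocks = array.reshape(rows // factor_x, factor_x,
                           cols // factor_y, factor_y)
    valid = ~np.isnan(blocks)
    counts = valid.sum(axis=(1, 3))                        # valid pixels per block
    sums = np.where(valid, blocks, 0.0).sum(axis=(1, 3))   # NaN counted as 0

    # start from NaN and divide only where at least one pixel is valid
    out = np.full(counts.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

nan = np.nan
a = np.array([[nan, nan, 1.0, 3.0],
              [nan, nan, 5.0, 7.0],
              [2.0, nan, 0.0, 0.0],
              [4.0, 6.0, 0.0, 0.0]])
result = block_nanmean(a, 2)
print(result)  # block means: [[nan, 4.], [4., 0.]]
```

Working with sums and counts avoids both the Python-level loop and the warning that np.nanmean emits for all-NaN slices.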

sliding_window() from Efficient Overlapping Windows with Numpy:

    import numpy as np
    from numpy.lib.stride_tricks import as_strided as ast
    from itertools import product
    
    def norm_shape(shape):
        '''
        Normalize numpy array shapes so they're always expressed as a tuple, 
        even for one-dimensional shapes.
    
        Parameters
            shape - an int, or a tuple of ints
    
        Returns
            a shape tuple
        '''
        try:
            i = int(shape)
            return (i,)
        except TypeError:
            # shape was not a number
            pass
    
        try:
            t = tuple(shape)
            return t
        except TypeError:
            # shape was not iterable
            pass
    
        raise TypeError('shape must be an int, or a tuple of ints')
    
    
    def sliding_window(a,ws,ss = None,flatten = True):
        '''
        Return a sliding window over a in any number of dimensions
    
        Parameters:
            a  - an n-dimensional numpy array
            ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size 
                 of each dimension of the window
            ss - an int (a is 1D) or tuple (a is 2D or greater) representing the 
                 amount to slide the window in each dimension. If not specified, it
                 defaults to ws.
            flatten - if True, all slices are flattened, otherwise, there is an 
                      extra dimension for each dimension of the input.
    
        Returns
            an array containing each n-dimensional window from a
        '''
    
        if None is ss:
            # ss was not provided. the windows will not overlap in any direction.
            ss = ws
        ws = norm_shape(ws)
        ss = norm_shape(ss)
    
        # convert ws, ss, and a.shape to numpy arrays so that we can do math in every 
        # dimension at once.
        ws = np.array(ws)
        ss = np.array(ss)
        shape = np.array(a.shape)
    
    
        # ensure that ws, ss, and a.shape all have the same number of dimensions
        ls = [len(shape),len(ws),len(ss)]
        if 1 != len(set(ls)):
            raise ValueError(
            'a.shape, ws and ss must all have the same length. They were %s' % str(ls))
    
        # ensure that ws is smaller than a in every dimension
        if np.any(ws > shape):
            raise ValueError(
            'ws cannot be larger than a in any dimension.\n'
            'a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
    
        # how many slices will there be in each dimension?
        newshape = norm_shape(((shape - ws) // ss) + 1)
        # the shape of the strided array will be the number of slices in each dimension
        # plus the shape of the window (tuple addition)
        newshape += norm_shape(ws)
        # the strides tuple will be the array's strides multiplied by step size, plus
        # the array's strides (tuple addition)
        newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
        strided = ast(a,shape = newshape,strides = newstrides)
        if not flatten:
            return strided
    
        # Collapse strided so that it has one more dimension than the window.  I.e.,
        # the new array is a flat list of slices.
        meat = len(ws) if ws.shape else 0
        firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
        dim = firstdim + (newshape[-meat:])
        # remove any dimensions with size 1
        # (tuple() is needed on Python 3, where filter returns an iterator)
        dim = tuple(filter(lambda i : i != 1,dim))
        return strided.reshape(dim)
    

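Interpolate the points onto a different grid using scipy.interpolate. Below I show a cubic interpolator, which is slower but likely more accurate. You will notice that the corner pixels are missing with this function; you could then use linear or nearest-neighbor interpolation to handle those last values.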


    import numpy as np
    import pylab as plt
    
    # Test data
    row = np.linspace(-3,3,50)
    X,Y = np.meshgrid(row,row)
    Z = np.sqrt(X**2+Y**2) + np.cos(Y) 
    
    # Make some dead pixels, favor an edge
    dead = np.random.random(Z.shape)
    dead = (dead*X>.7)
    Z[dead] =np.nan
    
    from scipy.interpolate import CloughTocher2DInterpolator as intp
    C = intp((X[~dead],Y[~dead]),Z[~dead])
    
    new_row = np.linspace(-3,3,25)
    xi,yi   = np.meshgrid(new_row,new_row)
    zi = C(xi,yi)
    
    plt.subplot(121)
    plt.title("Original signal 50x50")
    plt.imshow(Z,interpolation='nearest')
    
    plt.subplot(122)
    plt.title("Interpolated signal 25x25")
    plt.imshow(zi,interpolation='nearest')
    
    plt.show()
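
One way to handle the pixels the cubic interpolator leaves as NaN is to fill only those gaps with a nearest-neighbor interpolator built from the same valid points. This is a sketch under stated assumptions: the fixed random seed and the name `zi_filled` are illustrative, and the same test signal as above is rebuilt so that the block runs on its own:

```python
import numpy as np
from scipy.interpolate import CloughTocher2DInterpolator, NearestNDInterpolator

# Same kind of test data as above, with a fixed seed for repeatability
rng = np.random.RandomState(0)
row = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(row, row)
Z = np.sqrt(X**2 + Y**2) + np.cos(Y)
dead = rng.random_sample(Z.shape) * X > 0.7
Z[dead] = np.nan

# Valid points as an (n, 2) coordinate array with their values
points = np.column_stack((X[~dead], Y[~dead]))
values = Z[~dead]

new_row = np.linspace(-3, 3, 25)
xi, yi = np.meshgrid(new_row, new_row)

# Cubic interpolation: NaN outside the convex hull of the valid points
zi = CloughTocher2DInterpolator(points, values)(xi, yi)

# Nearest-neighbor interpolation is defined everywhere, so use it only
# where the cubic result is NaN
gaps = np.isnan(zi)
zi_filled = zi.copy()
if gaps.any():
    nearest = NearestNDInterpolator(points, values)
    zi_filled[gaps] = nearest(xi[gaps], yi[gaps])
```

Note that this fills every gap, including pixels that were never measured, so it only suits cases where an extrapolated value is acceptable.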
    