resize a 2D numpy array excluding NaN

I'm trying to resize a 2D numpy array by a given factor, obtaining a smaller array as output.

The array is read from an image file and some of the values are NaN (Not a Number, np.nan in numpy): the data come from satellite remote sensing measurements, and some pixels simply weren't measured.

The most suitable function I found for this is scipy.misc.imresize, but each output pixel whose source pixels include a NaN is set to NaN, even if some of the original pixels interpolated together hold valid data.

My solution is appended below. What I've done is essentially:

  • create a new array based on the original array shape and the desired reduction factor
  • create an index array to address all the pixels of the original array to be averaged for each pixel in the new one
  • cycle through the new array pixels and average all the non-NaN pixels to obtain the new pixel value; if there are only NaN, the output will be NaN.
  • I'm planning to add a keyword to choose between different outputs (average, median, standard deviation of the input pixels and so on).

    It works as expected, but on a ~1 Mpx image it takes around 3 seconds. Because of my lack of experience in Python, I'm looking for improvements.

    Does anyone have a suggestion for how to do it better and more efficiently?

    Does anyone know a library that already implements all of this?

    Thanks.

    Here is an example output for a random pixel input, generated with the code below:

    [Image: example output for random pixel input (see code)]

    import numpy as np
    import pylab as plt
    from scipy import misc
    
    def resize_2d_nonan(array,factor):
        """
        Resize a 2D array by different factors on the two axes, skipping NaN values.
        If a new pixel contains only NaN, it will be set to NaN
    
    
        Parameters
        ----------
    
        array : 2D np array
    
        factor : int or tuple. If int, x and y factors will be the same
    
        Returns
        -------
        array : 2D np array scaled by factor
    
        Created on Mon Jan 27 15:21:25 2014
    
        @author: damo_ma
        """
        xsize, ysize = array.shape
    
        if isinstance(factor,int):
            factor_x = factor
            factor_y = factor
        elif isinstance(factor,tuple):
            factor_x , factor_y = factor[0], factor[1]
        else:
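            raise TypeError('Factor must be a tuple (x,y) or an integer')
    
        # both dimensions must divide evenly (the original test used "or" inside
        # "not", which let one non-divisible dimension slip through)
        if xsize % factor_x != 0 or ysize % factor_y != 0:
            raise ValueError('Array shape must be an integer multiple of the factors')
    
        # integer division, so the sizes stay ints on Python 3 as well
        new_xsize, new_ysize = xsize // factor_x, ysize // factor_y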
    
        new_array = np.empty([new_xsize, new_ysize])
        new_array[:] = np.nan # this saves us an assignment in the loop below
    
        # submatrix indexes : is the average box on the original matrix
        subrow, subcol  = np.indices((factor_x, factor_y))
    
         # new matrix indexs
        row, col  = np.indices((new_xsize, new_ysize))
    
        # some output for testing
        #for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
        #    print '----------------------------------------------'
        #    print 'i: %i, j: %i, ind: %i ' % (i, j, ind)    
        #    print 'subrow+i*new_ysize, subcol+j*new_xsize :'    
        #    print i,'*',new_xsize,'=',i*factor_x
        #    print j,'*',new_ysize,'=',j*factor_y
        #    print subrow+i*factor_x,subcol+j*factor_y
        #    print '---'
        #    print 'array[subrow+i*factor_x,subcol+j*factor_y] : '    
        #    print array[subrow+i*factor_x,subcol+j*factor_y]
    
        for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
            # define the small sub_matrix as view of input matrix subset
            sub_matrix = array[subrow+i*factor_x,subcol+j*factor_y]
            # modified from any(a) and all(a) to a.any() and a.all()
            # see https://stackoverflow.com/a/10063039/1435167
            if not (np.isnan(sub_matrix)).all(): # at least one valid pixel
                if (np.isnan(sub_matrix)).any(): # some NaN: average only the valid pixels
                    msub_matrix = np.ma.masked_array(sub_matrix,np.isnan(sub_matrix))
                    (new_array.reshape(-1))[ind] = np.mean(msub_matrix)
                else: # no NaN at all: plain average
                    (new_array.reshape(-1))[ind] = np.mean(sub_matrix)
            # no branch is needed for the all-NaN case: new_array was
            # initialised to NaN above
    
        return new_array
    
    
    row , cols = 6, 4
    
    a = 10*np.random.random_sample((row , cols))
    a[0:3,0:2] = np.nan
    a[0,2] = np.nan
    
    factor_x = 2
    factor_y = 2
    a_misc = misc.imresize(a, .5, interp='nearest', mode='F')
    a_2d_nonan = resize_2d_nonan(a,(factor_x,factor_y))
    
    print a
    print
    print a_misc
    print
    print a_2d_nonan
    
    plt.subplot(131)
    plt.imshow(a,interpolation='nearest')
    plt.title('original')
    plt.xticks(np.arange(a.shape[1]))
    plt.yticks(np.arange(a.shape[0]))
    plt.subplot(132)
    plt.imshow(a_misc,interpolation='nearest')
    plt.title('scipy.misc')
    plt.xticks(np.arange(a_misc.shape[1]))
    plt.yticks(np.arange(a_misc.shape[0]))
    plt.subplot(133)
    plt.imshow(a_2d_nonan,interpolation='nearest')
    plt.title('my.func')
    plt.xticks(np.arange(a_2d_nonan.shape[1]))
    plt.yticks(np.arange(a_2d_nonan.shape[0]))
    

    EDIT

    I added some modifications to address ChrisProsser's comment.

    If I substitute the NaN with some other value, say the average of the non-NaN pixels, it will affect all the subsequent calculations: the difference between the resampled original array and the resampled array with NaN substituted shows that 2 pixels changed their values.

    My goal is simply to skip all the NaN pixels.
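To make the bias concrete, here is a minimal sketch on a single hypothetical 2x2 block. The pixel values and the "global average" of 5.0 are made up for illustration; np.nanmean is numpy's NaN-skipping mean.

```python
import numpy as np

# One hypothetical 2x2 block with a single missing pixel.
block = np.array([[1.0, 2.0],
                  [3.0, np.nan]])

# Skipping the NaN: mean of the three valid pixels, (1 + 2 + 3) / 3.
skip_mean = np.nanmean(block)

# Substituting the NaN with a global average (assumed to be 5.0 here, as if
# computed from the rest of the image) pulls the block toward that value:
# (1 + 2 + 3 + 5) / 4.
global_mean = 5.0
filled = np.where(np.isnan(block), global_mean, block)
fill_mean = filled.mean()

print(skip_mean, fill_mean)
```

Any block that mixes valid and missing pixels is shifted this way, which is why the substituted result differs from the NaN-skipping one.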

    # substitute NaN with the average value 
    
    ind_nonan , ind_nan = np.where(~np.isnan(a)), np.where(np.isnan(a))
    a_substitute = np.copy(a)
    
    a_substitute[ind_nan] = np.mean(a_substitute[ind_nonan]) # substitute the NaN with average on the not-Nan
    
    a_substitute_misc = misc.imresize(a_substitute, .5, interp='nearest', mode='F')
    a_substitute_2d_nonan = resize_2d_nonan(a_substitute,(factor_x,factor_y))
    
    print a_2d_nonan-a_substitute_2d_nonan
    
    [[        nan -0.02296697]
     [ 0.23143208  0.        ]
     [ 0.          0.        ]]
    


    2nd EDIT

    To address Hooked's answer I added some more code. It is an interesting idea, but sadly it interpolates new values over pixels that should be "empty" (NaN), and for my small example it generates more NaN than good values.

    X , Y  = np.indices((row , cols))
    X_new , Y_new  = np.indices((row/factor_x , cols/factor_y))
    
    from scipy.interpolate import CloughTocher2DInterpolator as intp
    C = intp((X[ind_nonan],Y[ind_nonan]),a[ind_nonan])
    
    a_interp = C(X_new , Y_new)
    
    print a
    print
    print a_interp
    
    [[        nan,         nan],
     [        nan,         nan],
     [        nan,  6.32826577]])
    



    You are operating on small windows of the array. Instead of looping through the array to make the windows, the array can be efficiently restructured by manipulating its strides. The numpy library provides the as_strided() function to help with that. An example is provided in the SciPy CookBook Stride tricks for the Game of Life.
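As a rough illustration of the stride idea (a sketch, not the cookbook code), the snippet below views a small made-up array as non-overlapping 2x2 windows. The array `a`, the window sizes `fy` and `fx`, and the 4D layout are assumptions chosen for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(24, dtype=float).reshape(6, 4)   # made-up test array
fy, fx = 2, 2                                  # window size along rows, columns
rows, cols = a.shape
s0, s1 = a.strides                             # bytes to step one row / one column

# Axes 0 and 1 select a window (stepping a whole window at a time),
# axes 2 and 3 index the pixels inside that window.
windows = as_strided(
    a,
    shape=(rows // fy, cols // fx, fy, fx),
    strides=(s0 * fy, s1 * fx, s0, s1),
    writeable=False,                           # the view shares memory with a
)

print(windows.shape)
print(windows[2, 1])
```

No data is copied here: `windows[i, j]` is the same memory as `a[i*fy:(i+1)*fy, j*fx:(j+1)*fx]`.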

    The following will use a generalized sliding window function found at Efficient Overlapping Windows with Numpy - I will include it at the end.

    Determine the shape of the new array:

    rows, cols = a.shape
    new_shape = rows // 2, cols // 2  # integer division keeps the shape integral
    

    Restructure the array into the windows you need, and create an indexing array identifying NaNs:

    # 2x2 windows of the original array
    windows = sliding_window(a, (2,2))
    # make a windowed boolean array for indexing
    notNan = sliding_window(np.logical_not(np.isnan(a)), (2,2))
    

    The new array can be made using a list comprehension or a generator expression.

    # using a list comprehension
    # make a list of the means of the windows, disregarding the Nan's
    means = [window[index].mean() for window, index in zip(windows, notNan)]
    new_array = np.array(means).reshape(new_shape)
    
    # generator expression
    # produces the means of the windows, disregarding the Nan's
    means = (window[index].mean() for window, index in zip(windows, notNan))
    new_array = np.fromiter(means, dtype = np.float32).reshape(new_shape)
    

    The generator expression should conserve memory. Using itertools.izip() instead of zip() should also help if memory is a problem (in Python 3, zip() is already lazy). I just used the list comprehension for your solution.

    Your function:

    def resize_2d_nonan(array,factor):
        """
        Resize a 2D array by different factor on two axis skipping NaN values.
        If a new pixel contains only NaN, it will be set to NaN
    
        Parameters
        ----------
        array : 2D np array
    
        factor : int or tuple. If int, x and y factors will be the same
    
        Returns
        -------
        array : 2D np array scaled by factor
    
        Created on Mon Jan 27 15:21:25 2014
    
        @author: damo_ma
        """
        xsize, ysize = array.shape
    
        if isinstance(factor,int):
            factor_x = factor
            factor_y = factor
            window_size = factor, factor
        elif isinstance(factor,tuple):
            factor_x , factor_y = factor
            window_size = factor
        else:
            raise TypeError('Factor must be a tuple (x,y) or an integer')
    
        if (xsize % factor_x or ysize % factor_y) :
            raise ValueError('Array shape must be an integer multiple of the factors')
    
        new_shape = xsize // factor_x, ysize // factor_y
    
        # non-overlapping windows of the input array
        # (use the function argument, not the global a)
        windows = sliding_window(array, window_size)
        # windowed boolean array for indexing
        notNan = sliding_window(np.logical_not(np.isnan(array)), window_size)
    
        #list of the means of the windows, disregarding the Nan's
        means = [window[index].mean() for window, index in zip(windows, notNan)]
        # new array
        new_array = np.array(means).reshape(new_shape)
    
        return new_array
    

    I haven't done any time comparisons with your original function, but it should be faster.

    Many solutions I've seen here on SO vectorize the operations to increase speed/efficiency - I don't quite have a handle on that and don't know if it can be applied to your problem. Searching SO for window, array, moving average, vectorize, and numpy should produce similar questions and answers for reference.
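For what it's worth, one possible vectorized version is sketched below. It is not taken from the question or from the linked article: the name `block_reduce_nan` and the `stat` keyword are made up here, and it relies on numpy's NaN-aware reducers (np.nanmean, np.nanmedian, np.nanstd) applied to a reshaped array.

```python
import warnings
import numpy as np

def block_reduce_nan(array, factor, stat="mean"):
    """Downsample a 2D array over non-overlapping blocks, ignoring NaN.

    Hypothetical helper: factor is an int or a (rows, cols) tuple and
    stat is one of "mean", "median", "std".
    """
    fy, fx = (factor, factor) if isinstance(factor, int) else factor
    rows, cols = array.shape
    if rows % fy or cols % fx:
        raise ValueError("array shape must be an integer multiple of factor")
    reducers = {"mean": np.nanmean, "median": np.nanmedian, "std": np.nanstd}
    # (rows, cols) -> (rows/fy, fy, cols/fx, fx): axes 1 and 3 run inside a block
    blocks = array.reshape(rows // fy, fy, cols // fx, fx)
    with warnings.catch_warnings():
        # all-NaN blocks raise a RuntimeWarning and return NaN, which is the
        # result wanted here, so the warning is silenced
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return reducers[stat](blocks, axis=(1, 3))

a = np.array([[np.nan, np.nan, 1.0, 2.0],
              [np.nan, np.nan, 3.0, np.nan],
              [4.0, 5.0, 6.0, 7.0],
              [8.0, 9.0, 10.0, 11.0]])
out = block_reduce_nan(a, 2)
print(out)
```

The reshape trick only works when the array shape is an exact multiple of the factor, which is the same restriction the original function already imposes.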

    sliding_window() from Efficient Overlapping Windows with Numpy :

    import numpy as np
    from numpy.lib.stride_tricks import as_strided as ast
    from itertools import product
    
    def norm_shape(shape):
        '''
        Normalize numpy array shapes so they're always expressed as a tuple, 
        even for one-dimensional shapes.
    
        Parameters
            shape - an int, or a tuple of ints
    
        Returns
            a shape tuple
        '''
        try:
            i = int(shape)
            return (i,)
        except TypeError:
            # shape was not a number
            pass
    
        try:
            t = tuple(shape)
            return t
        except TypeError:
            # shape was not iterable
            pass
    
        raise TypeError('shape must be an int, or a tuple of ints')
    
    
    def sliding_window(a,ws,ss = None,flatten = True):
        '''
        Return a sliding window over a in any number of dimensions
    
        Parameters:
            a  - an n-dimensional numpy array
            ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size 
                 of each dimension of the window
            ss - an int (a is 1D) or tuple (a is 2D or greater) representing the 
                 amount to slide the window in each dimension. If not specified, it
                 defaults to ws.
            flatten - if True, all slices are flattened, otherwise, there is an 
                      extra dimension for each dimension of the input.
    
        Returns
            an array containing each n-dimensional window from a
        '''
    
        if None is ss:
            # ss was not provided. the windows will not overlap in any direction.
            ss = ws
        ws = norm_shape(ws)
        ss = norm_shape(ss)
    
        # convert ws, ss, and a.shape to numpy arrays so that we can do math in every 
        # dimension at once.
        ws = np.array(ws)
        ss = np.array(ss)
        shape = np.array(a.shape)
    
    
        # ensure that ws, ss, and a.shape all have the same number of dimensions
        ls = [len(shape),len(ws),len(ss)]
        if 1 != len(set(ls)):
            raise ValueError(
            'a.shape, ws and ss must all have the same length. They were %s' % str(ls))
    
        # ensure that ws is smaller than a in every dimension
        if np.any(ws > shape):
            raise ValueError(
            'ws cannot be larger than a in any dimension.\n a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
    
        # how many slices will there be in each dimension?
        newshape = norm_shape(((shape - ws) // ss) + 1)
        # the shape of the strided array will be the number of slices in each dimension
        # plus the shape of the window (tuple addition)
        newshape += norm_shape(ws)
        # the strides tuple will be the array's strides multiplied by step size, plus
        # the array's strides (tuple addition)
        newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
        strided = ast(a,shape = newshape,strides = newstrides)
        if not flatten:
            return strided
    
        # Collapse strided so that it has one more dimension than the window.  I.e.,
        # the new array is a flat list of slices.
        meat = len(ws) if ws.shape else 0
        firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
        dim = firstdim + (newshape[-meat:])
        # remove any dimensions with size 1 (build a tuple so reshape also
        # works on Python 3, where filter() returns an iterator)
        dim = tuple(i for i in dim if i != 1)
        return strided.reshape(dim)
    

    Interpolate the points onto a different grid using scipy.interpolate. Below I've shown a cubic interpolator, which is slower but probably more accurate. You'll notice that the corner pixels are missing with this function; you could then use a linear or nearest-neighbor interpolation to handle those last values.


    import numpy as np
    import pylab as plt
    
    # Test data
    row = np.linspace(-3,3,50)
    X,Y = np.meshgrid(row,row)
    Z = np.sqrt(X**2+Y**2) + np.cos(Y) 
    
    # Make some dead pixels, favor an edge
    dead = np.random.random(Z.shape)
    dead = (dead*X>.7)
    Z[dead] =np.nan
    
    from scipy.interpolate import CloughTocher2DInterpolator as intp
    C = intp((X[~dead],Y[~dead]),Z[~dead])
    
    new_row = np.linspace(-3,3,25)
    xi,yi   = np.meshgrid(new_row,new_row)
    zi = C(xi,yi)
    
    plt.subplot(121)
    plt.title("Original signal 50x50")
    plt.imshow(Z,interpolation='nearest')
    
    plt.subplot(122)
    plt.title("Interpolated signal 25x25")
    plt.imshow(zi,interpolation='nearest')
    
    plt.show()
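The nearest-neighbor fallback for the missing corner pixels could look roughly like the sketch below. It swaps in scipy.interpolate.griddata (cubic first, then nearest for whatever is still NaN) rather than calling CloughTocher2DInterpolator directly, and the fixed random seed is only there to make the example repeatable.

```python
import numpy as np
from scipy.interpolate import griddata

# Same kind of test data as above, with a seeded generator (an assumption
# made for reproducibility).
row = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(row, row)
Z = np.sqrt(X**2 + Y**2) + np.cos(Y)
rng = np.random.default_rng(0)
dead = rng.random(Z.shape) * X > 0.7
Z[dead] = np.nan

# Scattered valid samples and the coarser target grid.
points = np.column_stack((X[~dead], Y[~dead]))
values = Z[~dead]
new_row = np.linspace(-3, 3, 25)
xi, yi = np.meshgrid(new_row, new_row)

# Cubic pass: points outside the convex hull of the samples come back as NaN.
zi = griddata(points, values, (xi, yi), method="cubic")

# Nearest-neighbor pass only for the pixels the cubic pass could not fill.
holes = np.isnan(zi)
if holes.any():
    zi[holes] = griddata(points, values, (xi[holes], yi[holes]), method="nearest")
```

Note that this fills every output pixel, so it suits cases where gaps should be interpolated over, not cases where empty regions must stay NaN.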
    