Fast searching for the lowest value greater than x in a sorted vector

2018-06-27 05:03:27

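Fast here means better than O(N), which is the best find() can do. I know about ismembc and ismembc2, but I don't think either is what I'm looking for. From the documentation, they appear to search for a member equal to x, whereas I want the index of the first value greater than x.

If either of these functions can do this, could someone give an example? I can't figure it out.

Ideal behavior: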

first_greater_than([0, 3, 3, 4, 7], 1)

returns 2, the index of the first value greater than 1, though obviously the input array would be much larger.

Of course, a binary search isn't hard to implement, but if MATLAB has already done it, I'd rather use their method.
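For reference, the desired semantics can be written in linear time with find. This is only a sketch of the expected behavior, not a faster method. The function name first_greater_than_linear is made up here, and returning an empty array when no element qualifies is an assumption rather than something the question specifies.

```matlab
function idx = first_greater_than_linear(v, x)
% Hypothetical O(N) reference: 1-based index of the first element of the
% sorted vector v that is strictly greater than x. Returns [] if there is
% no such element (an assumed convention, not fixed by the question).
idx = find(v > x, 1, 'first');
end
```

With this convention, first_greater_than_linear([0, 3, 3, 4, 7], 1) gives 2, matching the ideal behavior above.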

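Since the input is already sorted, a custom binary search should work (you may need to add some handling for edge cases, e.g. when the requested value is smaller than every element of the array):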

function [result, res2] = binarySearchExample(val) 

    %// Generate example data and sort it
    N = 100000000;
    a = rand(N, 1);
    a = sort(a);

    %// Run the algorithm
    tic % start timing of the binary search algorithm
    div = 1;
    idx = floor(N/div);
    while(1)
        div = div * 2;

        %// If a(idx) is not greater than val, check whether the next element is
        if a(idx) <= val,
            if a(idx + 1) > val,
                result = a(idx + 1);
                break
            else %// Move up by at least one element
                idx = idx + max([floor(N / div), 1]);
            end
        end
        if a(idx) > val, %// Move down by at least one element so the loop cannot stall
            idx = idx - max([floor(N / div), 1]);
        end
    end % end the while loop
    toc % end timing of the binary search algorithm

    %% ------------------------
    %% compare to MATLAB find
    tic % start timing of a matlab find
    j = find(a > val, 1);
    res2 = a(j);
    toc % end timing of a matlab find

%// Benchmark
>> [res1, res2] = binarySearchExample(0.556)

Elapsed time is 0.000093 seconds.
Elapsed time is 0.327183 seconds.

res1 =
   0.5560

res2 =
   0.5560
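The edge cases mentioned above (val below every element, or no element greater than val) can be handled by tracking explicit lower and upper bounds instead of a shrinking step size. The following is a minimal sketch under assumptions of my own: it uses MATLAB's 1-based indexing, the name first_greater_than is borrowed from the question, and returning [] when nothing qualifies is an assumed convention. It returns the index rather than the value.

```matlab
function idx = first_greater_than(v, x)
% Hypothetical sketch: 1-based index of the first element of the sorted
% vector v that is strictly greater than x, or [] if none exists.
lo = 1;              % lowest candidate index
hi = numel(v) + 1;   % one past the end stands for "no such element"
while lo < hi
    mid = lo + floor((hi - lo) / 2);   % lo <= mid < hi, so v(mid) is valid
    if v(mid) > x
        hi = mid;        % v(mid) qualifies; the answer is at mid or earlier
    else
        lo = mid + 1;    % v(mid) <= x; the answer lies after mid
    end
end
if lo > numel(v)
    idx = [];            % every element is <= x (also covers empty v)
else
    idx = lo;
end
end
```

Because the interval [lo, hi) shrinks on every pass, the loop runs O(log N) times and never indexes outside v.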

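Here is my implementation. It's not really the answer I was looking for, but for now I'll have to assume that what I'm after isn't implemented in MATLAB.

A note on indices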

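All MATLAB indices are done wrong, in that they start at 1 rather than 0. I still index from 0 in spite of this. So throughout, you'll see indexing that looks like this: array(1+i) accesses element i, where i is in [0, N). Also, all MATLAB ranges are done wrong. Their convention is [a, b] rather than [a, b). So you'll see ranges that look like this throughout: 0:N-1 is the half-open range [0, N), i.e. the numbers 0 through N-1 (commonly the indices of an N-element array). When arrays are indexed with ranges, both corrections must be made at the same time: 1 is added to both the bottom and the top boundary, and 1 is subtracted from the top. This is the result: array(1+a:b) accesses the elements in [a, b), where 0 <= a < b <= N. I should really just be using Python and SciPy instead, but it's too late for that. Next project...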
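To make the convention concrete, here is a small illustration. The vector and variable names are made up for this example. Note that + binds more tightly than : in MATLAB, so 1+a:b is parsed as (1+a):b.

```matlab
% Illustration of the 0-based convention described above (example data).
v = [10 20 30 40 50];   % N = 5 elements, 0-based indices 0..4
N = numel(v);

i = 2;
elem = v(1+i);          % element i = 2 in 0-based terms, i.e. v(3)

a = 1; b = 4;
slice = v(1+a:b);       % 0-based half-open range [1, 4), i.e. v(2:4)

all_idx = 0:N-1;        % every 0-based index of v
```

Here elem is 30 and slice is [20 30 40].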

binary_search.m: in my opinion it's neater than @ljk07's implementation, but of course they still get the accept. Thanks, @ljk07.

function i = binary_search(v, x)
%binary_search finds the first element in v greater than x
% v is a vector and x is a double. Returns the index of the desired element
% as an int64 or -1 if it doesn't exist.

% We'll call the first element of v greater than x v_f.

% Is v_f the zeroth element? This is technically covered by the algorithm,
% but is such a common case that it should be addressed immediately. It
% would otherwise take the same amount of time as the rest of them. This
% will add a check to each of the others, though, so it's a toss-up to an
% extent.
if v(1+0) > x
    i = 0;
    return;
end

% MATLAB foolishly returns the number of elements as a floating point
% constant. Thank you very much, MATLAB.
b = int64(numel(v));

% If v_f doesn't exist, return -1. This is also needed to ensure the
% algorithm later on terminates, which makes sense.
if v(1+b-1) <= x
    i = -1;
    return;
end

a = int64(0);

% There is now guaranteed to be more than one element, since if there
% wasn't, one of the above would have matched. So we split the [a, b) range
% at the top of the loop.

% The number of elements in the interval. Calculated once per loop. It is
% recalculated at the bottom of the loop, so it needs to be calculated just
% once before the loop can begin.
n = b;
while true
    % MATLAB's / operator foolishly rounds to nearest instead of flooring
    % when both inputs are integers. Thank you very much, MATLAB.
    p = a + idivide(n, int64(2));

    % Is v_f in [a, p) or [p, b)?
    if v(1+p-1) > x
        % v_f is in [a, p).
        b = p;
    else
        % v_f is in [p, b).
        a = p;
    end

    n = b - a;
    if n == 1
        i = a;
        return;
    end
end
end

binary_search_test.m:

% Some simple tests. These had better pass...
assert(binary_search([0], 0) == -1);
assert(binary_search([0], -1) == 0);

assert(binary_search([0 1], 0.5) == 1);
assert(binary_search([0 1 1], 0.5) == 1);
assert(binary_search([0 1 2], 0.5) == 1);
assert(binary_search([0 1 2], 1.5) == 2);

% Compare the algorithm to internal find.
for n = [1 1:8]
    n
    v = sort(rand(10^n, 1));
    x = 0.5;
    %%
    tic;
    ifind = find(v > x, 1,'first') - 1;
    toc;
    % repeat. The second time is faster usually. Some kind of JIT
    % optimisation...
    tic;
    ifind = find(v > x, 1,'first') - 1;
    toc;
    tic;
    ibs = binary_search(v, x);
    toc;
    tic;
    ibs = binary_search(v, x);
    toc;
    assert(ifind == ibs);
end

Output of binary_search_test.m (on my computer):

n =

     1

Elapsed time is 0.000054 seconds.
Elapsed time is 0.000021 seconds.
Elapsed time is 0.001273 seconds.
Elapsed time is 0.001135 seconds.

n =

     2

Elapsed time is 0.000050 seconds.
Elapsed time is 0.000018 seconds.
Elapsed time is 0.001571 seconds.
Elapsed time is 0.001494 seconds.

n =

     3

Elapsed time is 0.000034 seconds.
Elapsed time is 0.000025 seconds.
Elapsed time is 0.002344 seconds.
Elapsed time is 0.002193 seconds.

n =

     4

Elapsed time is 0.000057 seconds.
Elapsed time is 0.000044 seconds.
Elapsed time is 0.003131 seconds.
Elapsed time is 0.003031 seconds.

n =

     5

Elapsed time is 0.000473 seconds.
Elapsed time is 0.000333 seconds.
Elapsed time is 0.003620 seconds.
Elapsed time is 0.003161 seconds.

n =

     6

Elapsed time is 0.003984 seconds.
Elapsed time is 0.003635 seconds.
Elapsed time is 0.004209 seconds.
Elapsed time is 0.003825 seconds.

n =

     7

Elapsed time is 0.034811 seconds.
Elapsed time is 0.039106 seconds.
Elapsed time is 0.005089 seconds.
Elapsed time is 0.004867 seconds.

n =

     8

Elapsed time is 0.322853 seconds.
Elapsed time is 0.323777 seconds.
Elapsed time is 0.005969 seconds.
Elapsed time is 0.005487 seconds.

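There is definitely a speedup. On my computer, you can see it kick in at around a million elements. So unless binary_search is implemented in C, or you have a vector with about a million elements or more, find is still faster, even though it uses a silly algorithm. I was expecting the threshold to be lower than that. My guess is that this is because find is mostly implemented internally in C. Not fair :( Nevertheless, for my particular application the vectors only have about a thousand elements, so after all that, find really is faster for me. At least until the day I implement binary_search in C with a mex file, or switch to SciPy, whichever comes first. I'm tired of MATLAB's little inconveniences, as you can tell from the comments in my code.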
