Fast searching for the lowest value greater than x in a sorted vector
Fast means better than O(N), which is as good as find() is capable of. I know there is ismembc
and ismembc2
, but I don't think either of them are what I am looking for. I read the documentation and it seems they search for a member equal to x, but I want the index of first value greater than x.
Now if either of these functions is capable of doing this, could somebody please give an example, because I can't figure it out.
Ideal behaviour:
first_greater_than([0, 3, 3, 4, 7], 1)
returns 2, the index of the first value greater than 1, though obviously the input array would be vastly larger.
Of course, a binary search isn't too difficult to implement, but if MATLAB has already done it, I would rather use their method.
由于输入已经排序,所以自定义的二进制搜索应该可以工作(您可能需要对边缘情况进行一些更新,即请求的值小于数组的所有元素):
function [result, res2] = binarySearchExample(val)
%// Generate example data and sort it
N = 100000000;
a = rand(N, 1);
a = sort(a);
%// Run the algorithm
tic % start timing of the binary search algorithm
div = 1;
idx = floor(N/div);
while(1)
div = div * 2;
%// Check if less than val check if the next is greater
if a(idx) <= val,
if a(idx + 1) > val,
result = a(idx + 1);
break
else %// Get bigger
idx = idx + max([floor(N / div), 1]);
end
end
if a(idx) > val, % get smaller
idx = idx - floor(N / div);
end
end % end the while loop
toc % end timing of the binary search algorithm
%% ------------------------
%% compare to MATLAB find
tic % start timing of a matlab find
j = find(a > val, 1);
res2 = a(j);
toc % end timing of a matlab find
%// Benchmark
>> [res1, res2] = binarySearchExample(0.556)
Elapsed time is 0.000093 seconds.
Elapsed time is 0.327183 seconds.
res1 =
0.5560
res2 =
0.5560
Here's my implementation. This is not quite the answer I was looking for, but for now, I will have to assume what I am after is not implemented in MATLAB.
A Note on Indices
All MATLAB indices are done wrong, in that they start from 1 and not 0. I still index from 0, though, in spite of this. So throughout, you will see indexing that looks like this: array(1+i)
accesses element i, where i is in [0, N). Also, all MATLAB ranges are done wrong. Their convention is [a, b], instead of [a, b). So you will see ranges that look like this throughout: 0:N-1 is the range of numbers (often indices of an N-dimensional array) from 0 to N. When arrays are indexed with a range, both corrections have to be made at the same time. 1 is added to the top and bottom boundary and 1 is subtracted from the top. This is the result: array(1+a:b) accesses elements in [a, b), where a and b are in [0, N) and b > a. I should really just be using python and scipy instead, but it's too late for that. Next project...
binary_search.m: It is much tidier than @ljk07's implementation in my opinion, but they still get an accept, of course. Thanks, @ljk07.
function i = binary_search(v, x)
%binary_search finds the first element in v greater than x
% v is a vector and x is a double. Returns the index of the desired element
% as an int64 or -1 if it doesn't exist.
% We'll call the first element of v greater than x v_f.
% Is v_f the zeroth element? This is technically covered by the algorithm,
% but is such a common case that it should be addressed immediately. It
% would otherwise take the same amount of time as the rest of them. This
% will add a check to each of the others, though, so it's a toss-up to an
% extent.
if v(1+0) > x
i = 0;
return;
end
% MATLAB foolishly returns the number of elements as a floating point
% constant. Thank you very much, MATLAB.
b = int64(numel(v));
% If v_f doesn't exist, return -1. This is also needed to ensure the
% algorithm later on terminates, which makes sense.
if v(1+b-1) <= x
i = -1;
return;
end
a = int64(0);
% There is now guaranteed to be more than one element, since if there
% wasn't, one of the above would have matched. So we split the [a, b) range
% at the top of the loop.
% The number of elements in the interval. Calculated once per loop. It is
% recalculated at the bottom of the loop, so it needs to be calculated just
% once before the loop can begin.
n = b;
while true
% MATLAB's / operator foolishly rounds to nearest instead of flooring
% when both inputs are integers. Thank you very much, MATLAB.
p = a + idivide(n, int64(2));
% Is v_f in [a, p) or [p, b)?
if v(1+p-1) > x
% v_f is in [a, p).
b = p;
else
% v_f is in [p, b).
a = p;
end
n = b - a;
if n == 1
i = a;
return;
end
end
end
binary_search_test.m:
% Some simple tests. These had better pass...
assert(binary_search([0], 0) == -1);
assert(binary_search([0], -1) == 0);
assert(binary_search([0 1], 0.5) == 1);
assert(binary_search([0 1 1], 0.5) == 1);
assert(binary_search([0 1 2], 0.5) == 1);
assert(binary_search([0 1 2], 1.5) == 2);
% Compare the algorithm to internal find.
for n = [1 1:8]
n
v = sort(rand(10^n, 1));
x = 0.5;
%%
tic;
ifind = find(v > x, 1,'first') - 1;
toc;
% repeat. The second time is faster usually. Some kind of JIT
% optimisation...
tic;
ifind = find(v > x, 1,'first') - 1;
toc;
tic;
ibs = binary_search(v, x);
toc;
tic;
ibs = binary_search(v, x);
toc;
assert(ifind == ibs);
end
Output of binary_search_test.m (on my computer):
n =
1
Elapsed time is 0.000054 seconds.
Elapsed time is 0.000021 seconds.
Elapsed time is 0.001273 seconds.
Elapsed time is 0.001135 seconds.
n =
2
Elapsed time is 0.000050 seconds.
Elapsed time is 0.000018 seconds.
Elapsed time is 0.001571 seconds.
Elapsed time is 0.001494 seconds.
n =
3
Elapsed time is 0.000034 seconds.
Elapsed time is 0.000025 seconds.
Elapsed time is 0.002344 seconds.
Elapsed time is 0.002193 seconds.
n =
4
Elapsed time is 0.000057 seconds.
Elapsed time is 0.000044 seconds.
Elapsed time is 0.003131 seconds.
Elapsed time is 0.003031 seconds.
n =
5
Elapsed time is 0.000473 seconds.
Elapsed time is 0.000333 seconds.
Elapsed time is 0.003620 seconds.
Elapsed time is 0.003161 seconds.
n =
6
Elapsed time is 0.003984 seconds.
Elapsed time is 0.003635 seconds.
Elapsed time is 0.004209 seconds.
Elapsed time is 0.003825 seconds.
n =
7
Elapsed time is 0.034811 seconds.
Elapsed time is 0.039106 seconds.
Elapsed time is 0.005089 seconds.
Elapsed time is 0.004867 seconds.
n =
8
Elapsed time is 0.322853 seconds.
Elapsed time is 0.323777 seconds.
Elapsed time is 0.005969 seconds.
Elapsed time is 0.005487 seconds.
There is definitely a speedup. On my computer you can see the speedup is reached in around the one million element mark. So unless binary_search is implemented in C or you have a vector with about million elements, find is still faster, even though it is using a stupid algorithm. I was expecting the threshold to be waaay lower than that. My guess is because find is mostly internally implemented in C. Not fair :( But nevertheless, for my particular application, I have vector sizes of only about a thousand, so after all that, find really is faster for me. At least until the day I implement binary_search in C with a mex file or switch to scipy, whichever happens first. I'm kind of getting tired of MATLAB's little inconvenient switchups. You can tell by reading the comments in my code.
链接地址: http://www.djcxy.com/p/76074.html上一篇: Doxygen枚举没有显示
下一篇: 快速搜索排序向量中大于x的最小值