Is "==" in sorted array not faster than unsorted array?
This question already has an answer here:
One thing that immediately comes to mind is CPU's branch prediction algorithm.
In case of >
comparison, in sorted array the branching behavior is very consistent: first, the if
condition is consistently false, then it is consistently true. This aligns very well with even the simplest branch prediction.
In unsorted array the result of >
condition is essentially random, thus thwarting any branch prediction.
This is what makes the sorted version faster.
In case of ==
comparison, most of the time the condition is false and only very rarely it is true. This works well with branch prediction regardless of whether the array is sorted or not. The timings are essentially the same.
NB I'm making this an answer since it's too long for a comment.
The effect here is exactly what is already explained in great detail in the plentiful answers to this question. Processing a sorted array was faster in that case because of branch prediction.
Here, the culprit is again branch prediction. The ==
test is very seldom true, so branch prediction works about the same for both. When you changed it to >
, then you got the behavior explained in that question, and for the same reason.
Moral:
I believe "processing" a sorted array should be faster than [an ]unsorted array.
You need to know why. This isn't some magical rule, and it isn't always true.
The comparison ==
has less of a relation to ordering than >
does. Correctly or incorrectly predicting ==
would only depend on the number of duplicate values and their distribution.
You can use perf stat
to view the performance counters...
jason@io /tmp $ lz4 -d ints | perf stat ./proc-eq >/dev/null
Successfully decoded 104824717 bytes
Performance counter stats for './proc-eq':
5226.932577 task-clock (msec) # 0.953 CPUs utilized
31 context-switches # 0.006 K/sec
24 cpu-migrations # 0.005 K/sec
3,479 page-faults # 0.666 K/sec
15,763,486,767 cycles # 3.016 GHz
4,238,973,549 stalled-cycles-frontend # 26.89% frontend cycles idle
<not supported> stalled-cycles-backend
31,522,072,416 instructions # 2.00 insns per cycle
# 0.13 stalled cycles per insn
8,515,545,178 branches # 1629.167 M/sec
10,261,743 branch-misses # 0.12% of all branches
5.483071045 seconds time elapsed
jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-eq >/dev/null
Successfully decoded 104824717 bytes
Performance counter stats for './proc-eq':
5536.031410 task-clock (msec) # 0.348 CPUs utilized
198 context-switches # 0.036 K/sec
21 cpu-migrations # 0.004 K/sec
3,604 page-faults # 0.651 K/sec
16,870,541,124 cycles # 3.047 GHz
5,300,218,855 stalled-cycles-frontend # 31.42% frontend cycles idle
<not supported> stalled-cycles-backend
31,526,006,118 instructions # 1.87 insns per cycle
# 0.17 stalled cycles per insn
8,516,336,829 branches # 1538.347 M/sec
10,980,571 branch-misses # 0.13% of all branches
jason@io /tmp $ lz4 -d ints | perf stat ./proc-gt >/dev/null
Successfully decoded 104824717 bytes
Performance counter stats for './proc-gt':
5293.065703 task-clock (msec) # 0.957 CPUs utilized
38 context-switches # 0.007 K/sec
50 cpu-migrations # 0.009 K/sec
3,466 page-faults # 0.655 K/sec
15,972,451,322 cycles # 3.018 GHz
4,350,726,606 stalled-cycles-frontend # 27.24% frontend cycles idle
<not supported> stalled-cycles-backend
31,537,365,299 instructions # 1.97 insns per cycle
# 0.14 stalled cycles per insn
8,515,606,640 branches # 1608.823 M/sec
15,241,198 branch-misses # 0.18% of all branches
5.532285374 seconds time elapsed
jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-gt >/dev/null
15.930144154 seconds time elapsed
Performance counter stats for './proc-gt':
5203.873321 task-clock (msec) # 0.339 CPUs utilized
7 context-switches # 0.001 K/sec
22 cpu-migrations # 0.004 K/sec
3,459 page-faults # 0.665 K/sec
15,830,273,846 cycles # 3.042 GHz
4,456,369,958 stalled-cycles-frontend # 28.15% frontend cycles idle
<not supported> stalled-cycles-backend
31,540,409,224 instructions # 1.99 insns per cycle
# 0.14 stalled cycles per insn
8,516,186,042 branches # 1636.509 M/sec
10,205,058 branch-misses # 0.12% of all branches
15.365528326 seconds time elapsed
链接地址: http://www.djcxy.com/p/5370.html
上一篇: 比较运算符的复杂性
下一篇: 排序数组中的“==”不比未排序数组快吗?