Is "==" in sorted array not faster than unsorted array?

2018-05-30 20:39:06

This question already has an answer here:

Why is it faster to process a sorted array than an unsorted array? 21 answers

One thing that immediately comes to mind is CPU's branch prediction algorithm.

In case of > comparison, in sorted array the branching behavior is very consistent: first, the if condition is consistently false, then it is consistently true. This aligns very well with even the simplest branch prediction.

In unsorted array the result of > condition is essentially random, thus thwarting any branch prediction.

This is what makes the sorted version faster.

In case of == comparison, most of the time the condition is false and only very rarely it is true. This works well with branch prediction regardless of whether the array is sorted or not. The timings are essentially the same.

NB I'm making this an answer since it's too long for a comment.

The effect here is exactly what is already explained in great detail in the plentiful answers to this question. Processing a sorted array was faster in that case because of branch prediction.

Here, the culprit is again branch prediction. The == test is very seldom true, so branch prediction works about the same for both. When you changed it to > , then you got the behavior explained in that question, and for the same reason.

Moral:

I believe "processing" a sorted array should be faster than [an ]unsorted array.

You need to know why. This isn't some magical rule, and it isn't always true.

The comparison == has less of a relation to ordering than > does. Correctly or incorrectly predicting == would only depend on the number of duplicate values and their distribution.

You can use perf stat to view the performance counters...

jason@io /tmp $ lz4 -d ints | perf stat ./proc-eq >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-eq':

       5226.932577      task-clock (msec)         #    0.953 CPUs utilized
                31      context-switches          #    0.006 K/sec
                24      cpu-migrations            #    0.005 K/sec
             3,479      page-faults               #    0.666 K/sec
    15,763,486,767      cycles                    #    3.016 GHz
     4,238,973,549      stalled-cycles-frontend   #   26.89% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,522,072,416      instructions              #    2.00  insns per cycle
                                                  #    0.13  stalled cycles per insn
     8,515,545,178      branches                  # 1629.167 M/sec
        10,261,743      branch-misses             #    0.12% of all branches

       5.483071045 seconds time elapsed

jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-eq >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-eq':

       5536.031410      task-clock (msec)         #    0.348 CPUs utilized
               198      context-switches          #    0.036 K/sec
                21      cpu-migrations            #    0.004 K/sec
             3,604      page-faults               #    0.651 K/sec
    16,870,541,124      cycles                    #    3.047 GHz
     5,300,218,855      stalled-cycles-frontend   #   31.42% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,526,006,118      instructions              #    1.87  insns per cycle
                                                  #    0.17  stalled cycles per insn
     8,516,336,829      branches                  # 1538.347 M/sec
        10,980,571      branch-misses             #    0.13% of all branches

jason@io /tmp $ lz4 -d ints | perf stat ./proc-gt >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-gt':

       5293.065703      task-clock (msec)         #    0.957 CPUs utilized
                38      context-switches          #    0.007 K/sec
                50      cpu-migrations            #    0.009 K/sec
             3,466      page-faults               #    0.655 K/sec
    15,972,451,322      cycles                    #    3.018 GHz
     4,350,726,606      stalled-cycles-frontend   #   27.24% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,537,365,299      instructions              #    1.97  insns per cycle
                                                  #    0.14  stalled cycles per insn
     8,515,606,640      branches                  # 1608.823 M/sec
        15,241,198      branch-misses             #    0.18% of all branches

       5.532285374 seconds time elapsed

jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-gt >/dev/null

      15.930144154 seconds time elapsed

 Performance counter stats for './proc-gt':

       5203.873321      task-clock (msec)         #    0.339 CPUs utilized
                 7      context-switches          #    0.001 K/sec
                22      cpu-migrations            #    0.004 K/sec
             3,459      page-faults               #    0.665 K/sec
    15,830,273,846      cycles                    #    3.042 GHz
     4,456,369,958      stalled-cycles-frontend   #   28.15% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,540,409,224      instructions              #    1.99  insns per cycle
                                                  #    0.14  stalled cycles per insn
     8,516,186,042      branches                  # 1636.509 M/sec
        10,205,058      branch-misses             #    0.12% of all branches

      15.365528326 seconds time elapsed

链接地址: http://www.djcxy.com/p/5370.html

上一篇: 比较运算符的复杂性

下一篇: 排序数组中的“==”不比未排序数组快吗？