Is "==" in sorted array not faster than unsorted array?

This question already has an answer here:

  • Why is it faster to process a sorted array than an unsorted array? 21 answers

  • One thing that immediately comes to mind is CPU's branch prediction algorithm.

    In case of > comparison, in sorted array the branching behavior is very consistent: first, the if condition is consistently false, then it is consistently true. This aligns very well with even the simplest branch prediction.

    In unsorted array the result of > condition is essentially random, thus thwarting any branch prediction.

    This is what makes the sorted version faster.

    In case of == comparison, most of the time the condition is false and only very rarely it is true. This works well with branch prediction regardless of whether the array is sorted or not. The timings are essentially the same.


    NB I'm making this an answer since it's too long for a comment.

    The effect here is exactly what is already explained in great detail in the plentiful answers to this question. Processing a sorted array was faster in that case because of branch prediction.

    Here, the culprit is again branch prediction. The == test is very seldom true, so branch prediction works about the same for both. When you changed it to > , then you got the behavior explained in that question, and for the same reason.


    Moral:

    I believe "processing" a sorted array should be faster than [an ]unsorted array.

    You need to know why. This isn't some magical rule, and it isn't always true.


    The comparison == has less of a relation to ordering than > does. Correctly or incorrectly predicting == would only depend on the number of duplicate values and their distribution.

    You can use perf stat to view the performance counters...

    jason@io /tmp $ lz4 -d ints | perf stat ./proc-eq >/dev/null
    Successfully decoded 104824717 bytes
    
     Performance counter stats for './proc-eq':
    
           5226.932577      task-clock (msec)         #    0.953 CPUs utilized
                    31      context-switches          #    0.006 K/sec
                    24      cpu-migrations            #    0.005 K/sec
                 3,479      page-faults               #    0.666 K/sec
        15,763,486,767      cycles                    #    3.016 GHz
         4,238,973,549      stalled-cycles-frontend   #   26.89% frontend cycles idle
       <not supported>      stalled-cycles-backend
        31,522,072,416      instructions              #    2.00  insns per cycle
                                                      #    0.13  stalled cycles per insn
         8,515,545,178      branches                  # 1629.167 M/sec
            10,261,743      branch-misses             #    0.12% of all branches
    
           5.483071045 seconds time elapsed
    
    jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-eq >/dev/null
    Successfully decoded 104824717 bytes
    
     Performance counter stats for './proc-eq':
    
           5536.031410      task-clock (msec)         #    0.348 CPUs utilized
                   198      context-switches          #    0.036 K/sec
                    21      cpu-migrations            #    0.004 K/sec
                 3,604      page-faults               #    0.651 K/sec
        16,870,541,124      cycles                    #    3.047 GHz
         5,300,218,855      stalled-cycles-frontend   #   31.42% frontend cycles idle
       <not supported>      stalled-cycles-backend
        31,526,006,118      instructions              #    1.87  insns per cycle
                                                      #    0.17  stalled cycles per insn
         8,516,336,829      branches                  # 1538.347 M/sec
            10,980,571      branch-misses             #    0.13% of all branches
    
    jason@io /tmp $ lz4 -d ints | perf stat ./proc-gt >/dev/null
    Successfully decoded 104824717 bytes
    
     Performance counter stats for './proc-gt':
    
           5293.065703      task-clock (msec)         #    0.957 CPUs utilized
                    38      context-switches          #    0.007 K/sec
                    50      cpu-migrations            #    0.009 K/sec
                 3,466      page-faults               #    0.655 K/sec
        15,972,451,322      cycles                    #    3.018 GHz
         4,350,726,606      stalled-cycles-frontend   #   27.24% frontend cycles idle
       <not supported>      stalled-cycles-backend
        31,537,365,299      instructions              #    1.97  insns per cycle
                                                      #    0.14  stalled cycles per insn
         8,515,606,640      branches                  # 1608.823 M/sec
            15,241,198      branch-misses             #    0.18% of all branches
    
           5.532285374 seconds time elapsed
    
    jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-gt >/dev/null
    
          15.930144154 seconds time elapsed
    
     Performance counter stats for './proc-gt':
    
           5203.873321      task-clock (msec)         #    0.339 CPUs utilized
                     7      context-switches          #    0.001 K/sec
                    22      cpu-migrations            #    0.004 K/sec
                 3,459      page-faults               #    0.665 K/sec
        15,830,273,846      cycles                    #    3.042 GHz
         4,456,369,958      stalled-cycles-frontend   #   28.15% frontend cycles idle
       <not supported>      stalled-cycles-backend
        31,540,409,224      instructions              #    1.99  insns per cycle
                                                      #    0.14  stalled cycles per insn
         8,516,186,042      branches                  # 1636.509 M/sec
            10,205,058      branch-misses             #    0.12% of all branches
    
          15.365528326 seconds time elapsed
    
    链接地址: http://www.djcxy.com/p/5370.html

    上一篇: 比较运算符的复杂性

    下一篇: 排序数组中的“==”不比未排序数组快吗?