排序数组中的“==”不比未排序数组快吗？

2018-05-30 20:39:05

这个问题在这里已经有了答案：

为什么处理排序后的数组比未排序的数组更快？ 21个答案

立即想到的一件事是CPU的分支预测算法。

在以下情况下>比较，排序数组的分支行为是非常一致的：第一， if条件是一致的假的，那么它是一贯正确的。即使是最简单的分支预测也能很好地匹配。

在未排序的数组中， >条件的结果本质上是随机的，因此阻碍了任何分支预测。

这是使排序版本更快的原因。

在==比较的情况下，大多数情况下这种情况是错误的，而且很少情况是真的。无论数组是否排序，这都可以在分支预测中很好地工作。时间基本相同。

NB我正在做出这个答案，因为评论太长。

这个效果正是在这个问题的丰富答案中已经详细解释过的。由于分支预测，处理排序数组的速度更快。

在这里，罪魁祸首再次是分支预测。 ==测试非常少见，因此分支预测对两者都有同样的作用。当你将它改为> ，你就会得到在该问题中解释的行为，并且出于同样的原因。

道德：

我相信“处理”一个排序的数组应该比未排序的数组快。

你需要知道为什么。这不是一个神奇的规则，它并不总是如此。

比较==与排序的关系不如> 。正确或不正确地预测==只取决于重复值的数量及其分布。

您可以使用perf stat来查看性能计数器...

jason@io /tmp $ lz4 -d ints | perf stat ./proc-eq >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-eq':

       5226.932577      task-clock (msec)         #    0.953 CPUs utilized
                31      context-switches          #    0.006 K/sec
                24      cpu-migrations            #    0.005 K/sec
             3,479      page-faults               #    0.666 K/sec
    15,763,486,767      cycles                    #    3.016 GHz
     4,238,973,549      stalled-cycles-frontend   #   26.89% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,522,072,416      instructions              #    2.00  insns per cycle
                                                  #    0.13  stalled cycles per insn
     8,515,545,178      branches                  # 1629.167 M/sec
        10,261,743      branch-misses             #    0.12% of all branches

       5.483071045 seconds time elapsed

jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-eq >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-eq':

       5536.031410      task-clock (msec)         #    0.348 CPUs utilized
               198      context-switches          #    0.036 K/sec
                21      cpu-migrations            #    0.004 K/sec
             3,604      page-faults               #    0.651 K/sec
    16,870,541,124      cycles                    #    3.047 GHz
     5,300,218,855      stalled-cycles-frontend   #   31.42% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,526,006,118      instructions              #    1.87  insns per cycle
                                                  #    0.17  stalled cycles per insn
     8,516,336,829      branches                  # 1538.347 M/sec
        10,980,571      branch-misses             #    0.13% of all branches

jason@io /tmp $ lz4 -d ints | perf stat ./proc-gt >/dev/null
Successfully decoded 104824717 bytes

 Performance counter stats for './proc-gt':

       5293.065703      task-clock (msec)         #    0.957 CPUs utilized
                38      context-switches          #    0.007 K/sec
                50      cpu-migrations            #    0.009 K/sec
             3,466      page-faults               #    0.655 K/sec
    15,972,451,322      cycles                    #    3.018 GHz
     4,350,726,606      stalled-cycles-frontend   #   27.24% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,537,365,299      instructions              #    1.97  insns per cycle
                                                  #    0.14  stalled cycles per insn
     8,515,606,640      branches                  # 1608.823 M/sec
        15,241,198      branch-misses             #    0.18% of all branches

       5.532285374 seconds time elapsed

jason@io /tmp $ lz4 -d ints | sort -n | perf stat ./proc-gt >/dev/null

      15.930144154 seconds time elapsed

 Performance counter stats for './proc-gt':

       5203.873321      task-clock (msec)         #    0.339 CPUs utilized
                 7      context-switches          #    0.001 K/sec
                22      cpu-migrations            #    0.004 K/sec
             3,459      page-faults               #    0.665 K/sec
    15,830,273,846      cycles                    #    3.042 GHz
     4,456,369,958      stalled-cycles-frontend   #   28.15% frontend cycles idle
   <not supported>      stalled-cycles-backend
    31,540,409,224      instructions              #    1.99  insns per cycle
                                                  #    0.14  stalled cycles per insn
     8,516,186,042      branches                  # 1636.509 M/sec
        10,205,058      branch-misses             #    0.12% of all branches

      15.365528326 seconds time elapsed

链接地址: http://www.djcxy.com/p/5369.html

上一篇: Is "==" in sorted array not faster than unsorted array?

下一篇: Can't use numpy.sign(), but the book can used, I don't know why