Denormalized floating point in Objective-C

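Is the Stack Overflow question/answer "Why does changing 0.1f to 0 slow down performance by 10x?" relevant to Objective-C? If it is relevant, how should this change my coding habits? Is there some way to turn off denormalized floating point on Mac OS X?

It seems like this is completely irrelevant to iOS. Is that correct?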


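As I said in response to your comment there:

It is more a CPU issue than a language issue, so it probably is relevant to Objective-C on x86. (The iPhone's ARMv7 does not seem to support denormalized floats, at least with the default runtime/build settings.)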

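Update

I just tested. On Mac OS X on x86 the slowdown is observed; on iOS on ARMv7 it is not (default build settings).

And as expected, when running in the iOS simulator (on x86), denormalized floats appear again.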
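For the x86 side of the question (turning denormals off on Mac OS X), one commonly used approach is to set the flush-to-zero (FTZ) and denormals-are-zero (DAZ) bits in the SSE control register MXCSR. The following is only a sketch under assumptions: the function name and the two macro names are made up for illustration, it assumes floating-point math is compiled to SSE instructions (the default for 64-bit x86) and that the CPU supports DAZ, and it affects only the calling thread. `_mm_getcsr` and `_mm_setcsr` come from `<xmmintrin.h>`.

```c
#include <stdbool.h>
#if defined(__SSE2__)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#endif

/* Illustrative names for two MXCSR control bits. */
#define MXCSR_FTZ 0x8000u  /* bit 15: flush denormal results to zero */
#define MXCSR_DAZ 0x0040u  /* bit 6: treat denormal inputs as zero   */

/* Turn flushing of denormals on or off for the current thread.
   Returns false when the build does not target SSE2, in which case
   nothing is changed. */
bool x86_set_flush_denormals(bool on) {
#if defined(__SSE2__)
    unsigned int csr = _mm_getcsr();
    if (on)
        csr |= (MXCSR_FTZ | MXCSR_DAZ);
    else
        csr &= ~(MXCSR_FTZ | MXCSR_DAZ);
    _mm_setcsr(csr);
    return true;
#else
    (void)on;   /* not an SSE2 build: nothing to configure here */
    return false;
#endif
}
```

Flushing trades IEEE 754 gradual underflow for speed, so it is probably best limited to code (audio or graphics loops, for instance) where values that small carry no meaning.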

Interestingly, FLT_MIN and DBL_MIN are each defined as the smallest non-denormalized number (on iOS, Mac OS X, and Linux). Strange things happen when you use

DBL_MIN/2.0

in your code; the compiler happily sets a denormalized constant, but as soon as the (ARM) CPU touches it, it is set to zero:

double test = DBL_MIN/2.0;
printf("test      == 0.0 %d\n",test==0.0);
printf("DBL_MIN/2 == 0.0 %d\n",DBL_MIN/2.0==0.0);

Output:

test      == 0.0 1  // computer says YES
DBL_MIN/2 == 0.0 0  // compiler says NO

So a quick runtime check for whether denormalization is supported can be:

#define SUPPORT_DENORMALIZATION ({volatile double t=DBL_MIN/2.0;t!=0.0;})

("given without even the implied warranty of fitness for any purpose")
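The macro above relies on the GCC/Clang statement-expression extension `({ ... })`. As a rough alternative in standard C99, the same idea can be written as a function. This is a sketch, not the author's code: the name `denormals_supported` is made up, and it performs the division at run time rather than letting the compiler fold the constant.

```c
#include <float.h>
#include <stdbool.h>

/* Returns true if halving the smallest normal double at run time yields a
   nonzero (denormalized) value, false if the FPU flushes it to zero. */
bool denormals_supported(void) {
    volatile double t = DBL_MIN;  /* volatile forces a real load, divide and store */
    t = t / 2.0;
    return t != 0.0;
}
```

The result reflects the floating-point mode of the calling thread at the time of the call, so it can change if flush-to-zero is switched on or off later.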

This is what ARM has to say about flush-to-zero mode: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204h/Bcfheche.html

Update << 1

This is how you disable flush-to-zero mode on ARMv7:

int x;
asm volatile(
    "vmrs %[result],FPSCR \r\n"                 /* read FPSCR */
    "bic %[result],%[result],#16777216 \r\n"    /* clear bit 24 (FZ, flush-to-zero) */
    "vmsr FPSCR,%[result]"                      /* write it back */
    :[result] "=r" (x) : :
);
printf("ARM FPSCR: %08x\n",x);
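The constant 16777216 in the `bic` instruction is `1 << 24`, the FZ (flush-to-zero) bit of FPSCR. A small sketch for checking that bit is given below. The names `FPSCR_FZ_BIT`, `fpscr_without_fz`, and `arm_flush_to_zero_enabled` are illustrative, and the sketch assumes a 32-bit ARM build with hardware floating point where the compiler defines `__ARM_FP`; on any other target the query simply reports that it cannot tell.

```c
#include <stdint.h>

/* Bit 24 of FPSCR: the flush-to-zero control bit (equals 16777216). */
#define FPSCR_FZ_BIT (UINT32_C(1) << 24)

/* Pure helper: the given FPSCR value with flush-to-zero cleared,
   which is what the bic instruction computes. */
uint32_t fpscr_without_fz(uint32_t fpscr) {
    return fpscr & ~FPSCR_FZ_BIT;
}

/* Returns 1 if flush-to-zero is enabled, 0 if it is disabled, and -1 if
   FPSCR cannot be read on this target (not a 32-bit ARM hard-float build). */
int arm_flush_to_zero_enabled(void) {
#if defined(__arm__) && defined(__ARM_FP)
    uint32_t fpscr;
    __asm__ volatile("vmrs %0, fpscr" : "=r"(fpscr));
    return (fpscr & FPSCR_FZ_BIT) ? 1 : 0;
#else
    return -1;
#endif
}
```

Calling `arm_flush_to_zero_enabled()` before and after the inline assembly would be one way to confirm that the mode actually changed.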

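With the following surprising result.

  • Column 1: a float, divided by 2 on every iteration
  • Column 2: the binary representation of this float
  • Column 3: the time taken to sum this float 1e7 times

You can clearly see that denormalization comes at zero cost. (That is on an iPad 2. On an iPhone 4, it comes at a small cost of a 10% slowdown.)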

    0.000000000000000000000000000000000100000004670110: 10111100001101110010000011100000 110 ms
    0.000000000000000000000000000000000050000002335055: 10111100001101110010000101100000 110 ms
    0.000000000000000000000000000000000025000001167528: 10111100001101110010000001100000 110 ms
    0.000000000000000000000000000000000012500000583764: 10111100001101110010000110100000 110 ms
    0.000000000000000000000000000000000006250000291882: 10111100001101110010000010100000 111 ms
    0.000000000000000000000000000000000003125000145941: 10111100001101110010000100100000 110 ms
    0.000000000000000000000000000000000001562500072970: 10111100001101110010000000100000 110 ms
    0.000000000000000000000000000000000000781250036485: 10111100001101110010000111000000 110 ms
    0.000000000000000000000000000000000000390625018243: 10111100001101110010000011000000 110 ms
    0.000000000000000000000000000000000000195312509121: 10111100001101110010000101000000 110 ms
    0.000000000000000000000000000000000000097656254561: 10111100001101110010000001000000 110 ms
    0.000000000000000000000000000000000000048828127280: 10111100001101110010000110000000 110 ms
    0.000000000000000000000000000000000000024414063640: 10111100001101110010000010000000 110 ms
    0.000000000000000000000000000000000000012207031820: 10111100001101110010000100000000 111 ms
    0.000000000000000000000000000000000000006103515209: 01111000011011100100001000000000 110 ms
    0.000000000000000000000000000000000000003051757605: 11110000110111001000010000000000 110 ms
    0.000000000000000000000000000000000000001525879503: 00010001101110010000100000000000 110 ms
    0.000000000000000000000000000000000000000762939751: 00100011011100100001000000000000 110 ms
    0.000000000000000000000000000000000000000381469876: 01000110111001000010000000000000 112 ms
    0.000000000000000000000000000000000000000190734938: 10001101110010000100000000000000 110 ms
    0.000000000000000000000000000000000000000095366768: 00011011100100001000000000000000 110 ms
    0.000000000000000000000000000000000000000047683384: 00110111001000010000000000000000 110 ms
    0.000000000000000000000000000000000000000023841692: 01101110010000100000000000000000 111 ms
    0.000000000000000000000000000000000000000011920846: 11011100100001000000000000000000 110 ms
    0.000000000000000000000000000000000000000005961124: 01111001000010000000000000000000 110 ms
    0.000000000000000000000000000000000000000002980562: 11110010000100000000000000000000 110 ms
    0.000000000000000000000000000000000000000001490982: 00010100001000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000745491: 00101000010000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000372745: 01010000100000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000186373: 10100001000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000092486: 01000010000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000046243: 10000100000000000000000000000000 111 ms
    0.000000000000000000000000000000000000000000022421: 00001000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000011210: 00010000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000005605: 00100000000000000000000000000000 111 ms
    0.000000000000000000000000000000000000000000002803: 01000000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000001401: 10000000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000000000: 00000000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000000000: 00000000000000000000000000000000 110 ms
    0.000000000000000000000000000000000000000000000000: 00000000000000000000000000000000 110 ms
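The benchmark code itself is not shown in the answer. A rough sketch of how a table like this could be produced follows. All function names are made up, `clock()` is used as a stand-in timer, and `float` is assumed to be 32 bits wide. The bits are printed least-significant first because that is the order the table appears to use (the smallest denormal, 0x00000001, shows up as a leading 1).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Render the 32 bits of f into out, least-significant bit first.
   out must have room for 33 characters. */
void float_bits_lsb_first(float f, char out[33]) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);      /* well-defined way to read the bit pattern */
    for (int i = 0; i < 32; i++)
        out[i] = ((u >> i) & 1u) ? '1' : '0';
    out[32] = '\0';
}

/* Add f to an accumulator n times and return the elapsed CPU time in
   milliseconds. The final sum is stored in *sum. */
double time_sum_ms(float f, long n, float *sum) {
    assert(n >= 0 && sum != NULL);
    volatile float acc = 0.0f;     /* volatile keeps the loop from being optimized away */
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        acc += f;
    clock_t stop = clock();
    *sum = acc;
    return 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Print `rows` lines in the layout of the table, halving the value
   after each line. */
void print_halving_table(float first, int rows, long n) {
    float f = first;
    for (int r = 0; r < rows; r++) {
        char bits[33];
        float sum;
        float_bits_lsb_first(f, bits);
        double ms = time_sum_ms(f, n, &sum);
        printf("%.48f: %s %.0f ms\n", f, bits, ms);
        f /= 2.0f;
    }
}
```

A call such as `print_halving_table(1e-34f, 40, 10000000L)` would roughly mirror the run above; the timings depend on the device, the compiler flags, and the current flush-to-zero setting.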
    