Bitwise operators, not vs xor use in branching

After asking this SO question, I received a very interesting comment from @AndonM.Coleman that I had to verify.

Since your disassembled code is written for x86, it is worth pointing out that XOR will set/clear the Zero Flag whereas NOT will not (sometimes useful if you want to perform a bitwise operation without affecting jump conditions that rely on flags from previous operations). Now, considering you're not writing assembly directly, you really have no access to this flag in a meaningful way so I doubt this is the reason for favoring one over the other.

His comment got me curious if the following code would produce the same assembly instructions

#include <iostream>

int main()
{
    unsigned int val = 0;

    std::cout << "Enter a numeric value: ";
    std::cin >> val;

    if ( (val ^ ~0U) == 0)
    {
        std::cout << "Value inverted is zero" << std::endl;
    } else
    {
        std::cout << "Value inverted is not zero" << std::endl;
    }

    if ( (~val) == 0)
    {
        std::cout << "Value inverted is zero" << std::endl;
    } else
    {
        std::cout << "Value inverted is not zero" << std::endl;
    }

    return 0;
}

For the following two operations

if ( (val ^ ~0U) == 0 )

and

if ( (~val) == 0 )

The not optimized build in Visual Studio 2010 gives the following disassembly:

    if ( (val ^ ~0U) == 0)
00AD1501  mov         eax,dword ptr [val]  
00AD1504  xor         eax,0FFFFFFFFh  
00AD1507  jne         main+86h (0AD1536h)  


    if ( (~val) == 0)
00AD1561  mov         eax,dword ptr [val]  
00AD1564  not         eax  
00AD1566  test        eax,eax  
00AD1568  jne         main+0E7h (0AD1597h)  

My question regards optimisation. Is it better to write

if ( (val ^ ~0U) == 0)

or

if ( (~val) == 0)

This depends on a lot of things, but mostly what (if anything) you tell the compiler to optimize for.

If the compiler is set to optimize for size (smallest bytecode), then sometimes it will use XOR in seemingly strange places. For instance, the variable length encoding scheme X86 uses can set a register to 0 by XOR 'ing itself in fewer bytes of code than would be required using the MOV instruction.

Consider the code that uses XOR :

if ( (val ^ ~0U) == 0 )  /* 3-bytes to negate and test (x86) */

XOR eax,0FFFFFFFFh requires 3-bytes AND sets/clears the Zero Flag (ZF)

Now, consider the code that uses NOT :

if ( (~val) == 0)        /* 4-bytes to negate and test (x86) */

NOT eax is encoded into a 2-byte instruction, but does not affect CPU flags.

TEST eax,eax adds an additional 2-bytes, and is necessary to set/clear the Zero Flag (ZF)

NOT is also a simple instruction, but since it does not affect any CPU flags, you must issue a TEST instruction afterwards to use it for branching as seen in your code. This actually produces larger bytecode, so a smart compiler set to optimize for size would probably try to avoid using NOT . How many cycles both of these instructions together take to complete varies between CPU generation, and a smart compiler would also factor this into its decision making when told to optimize for speed.


If you are not writing hand-tuned assembly, it is best to use whatever is clearest to a human and hope that the compiler is smart enough to choose different instructions/scheduling/etc. to optimize for size/speed as requested at compile-time. Compilers have a smart set of heuristics they use to choose and schedule instructions, they know more about the target CPU architecture than the average coder.

If you find out later that this branch really is a bottleneck and there is no higher-level way around the problem, then you could do some low-level tuning. However, this is such a trivial thing to focus on these days unless you are targeting something like a low-power embedded CPU or memory limited device. The only places I have ever squeezed out enough performance by hand-tuning to make it worthwhile were in algorithms that benefited from data parallelism and where the compiler was not smart enough to effectively utilize specialized instruction sets like MMX/SSE.

链接地址: http://www.djcxy.com/p/58244.html

上一篇: 全8位加法器,不合逻辑的输出

下一篇: 按位运算符,而不是xor用于分支