Does the C++ volatile keyword introduce a memory fence?

I understand that volatile informs the compiler that the value may be changed, but in order to accomplish this functionality, does the compiler need to introduce a memory fence to make it work?

From my understanding, the sequence of operations on volatile objects cannot be reordered and must be preserved. This seems to imply some memory fences are necessary and that there isn't really a way around this. Am I correct in saying this?
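
To make the question concrete, here is the kind of minimal test I have in mind (the names are only illustrative). Compiling it and inspecting the generated assembly should show whether the compiler emits any fence instruction between the two volatile accesses:

    // Two volatile objects written in sequence. The compiler may not reorder
    // these two accesses relative to each other, but does it also have to
    // emit a hardware fence so that the CPU keeps them in that order?
    volatile int data  = 0;
    volatile int ready = 0;

    void publish(int value)
    {
        data  = value;  // volatile write #1
        ready = 1;      // volatile write #2
    }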


There is an interesting discussion at this related question.

Jonathan Wakely writes:

... Accesses to distinct volatile variables cannot be reordered by the compiler as long as they occur in separate full expressions ... right that volatile is useless for thread-safety, but not for the reasons he gives. It's not because the compiler might reorder accesses to volatile objects, but because the CPU might reorder them. Atomic operations and memory barriers prevent the compiler and the CPU from reordering

To which David Schwartz replies in the comments:

... There's no difference, from the point of view of the C++ standard, between the compiler doing something and the compiler emitting instructions that cause the hardware to do something. If the CPU may reorder accesses to volatiles, then the standard doesn't require that their order be preserved. ...

... The C++ standard doesn't make any distinction about what does the reordering. And you can't argue that the CPU can reorder them with no observable effect so that's okay -- the C++ standard defines their order as observable. A compiler is compliant with the C++ standard on a platform if it generates code that makes the platform do what the standard requires. If the standard requires accesses to volatiles not be reordered, then a platform that reorders them isn't compliant. ...

My point is that if the C++ standard prohibits the compiler from reordering accesses to distinct volatiles, on the theory that the order of such accesses is part of the program's observable behavior, then it also requires the compiler to emit code that prohibits the CPU from doing so. The standard does not differentiate between what the compiler does and what the compiler's generated code makes the CPU do.

Which does yield two questions: Is either of them "right"? What do actual implementations really do?
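
For reference, my understanding of the mechanism the quoted comments do credit with ordering guarantees, namely atomic operations and explicit barriers, is roughly the following sketch (the names are mine, not from the discussion above):

    #include <atomic>

    int payload = 0;
    std::atomic<int> flag{0};

    void writer()
    {
        payload = 42;
        // Release fence: earlier writes may not be reordered past it,
        // by the compiler or by the CPU.
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(1, std::memory_order_relaxed);
    }

    void reader()
    {
        if (flag.load(std::memory_order_relaxed) == 1) {
            // Acquire fence: pairs with the release fence above once the
            // load has seen the store, so payload == 42 is guaranteed here.
            std::atomic_thread_fence(std::memory_order_acquire);
            int x = payload;
            (void)x;
        }
    }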


Rather than explaining what volatile does, allow me to explain when you should use volatile.

  • When inside a signal handler, because writing to a volatile std::sig_atomic_t variable is pretty much the only thing the standard allows you to do from within a signal handler. Since C++11 you can also use std::atomic for that purpose, but only if the atomic is lock-free. (A minimal sketch follows at the end of this answer.)
  • When dealing with setjmp according to Intel.
  • When dealing directly with hardware and you want to ensure that the compiler does not optimize your reads or writes away. For example:

    volatile int *foo = some_memory_mapped_device;
    while (*foo)
        ; // wait until *foo turns false
    

    Without the volatile qualifier, the compiler is allowed to hoist the read out of the loop, or even to remove the loop entirely. The volatile qualifier tells the compiler that it may not assume that two subsequent reads return the same value.

    Note that volatile has nothing to do with threads. The above example does not work if a different thread is writing to *foo, because there is no acquire operation involved.
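
    If the writer really were another thread rather than a device, a minimal sketch of the portable alternative (assuming the flag is an ordinary object rather than a memory-mapped register) would use std::atomic with an acquire load instead of volatile:

        #include <atomic>

        std::atomic<int> foo{1};

        void wait_for_other_thread()
        {
            // The acquire load prevents the compiler from caching the value
            // and also orders subsequent reads after the load on the CPU.
            while (foo.load(std::memory_order_acquire))
                ; // wait until another thread stores 0 to foo
        }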

In all other cases, usage of volatile should be considered non-portable and should not pass code review anymore, except when dealing with pre-C++11 compilers and compiler extensions (such as MSVC's /volatile:ms switch, which is enabled by default under X86/I64).
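
Coming back to the first bullet, here is a minimal sketch of the signal-handler case (the handler and flag names are only illustrative). Pre-C++11 the flag must be a volatile std::sig_atomic_t; since C++11 a lock-free std::atomic would also do:

    #include <csignal>

    // Writing to a volatile std::sig_atomic_t is the classic portable thing
    // a signal handler is allowed to do. No fence is implied or required:
    // in this single-threaded sketch the handler runs on the interrupted
    // thread itself, not on another thread.
    volatile std::sig_atomic_t got_sigint = 0;

    extern "C" void handle_sigint(int)
    {
        got_sigint = 1;
    }

    int main()
    {
        std::signal(SIGINT, handle_sigint);
        while (!got_sigint)
            ; // spin until the handler has run
    }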


Does the C++ volatile keyword introduce a memory fence?

A C++ compiler which conforms to the specification is not required to introduce a memory fence. Your particular compiler might; direct your question to the authors of your compiler.

The function of "volatile" in C++ has nothing to do with threading. Remember, the purpose of "volatile" is to disable compiler optimizations so that reading from a register that is changing due to exogenous conditions is not optimized away. Is a memory address that is being written to by a different thread on a different CPU a register that is changing due to exogenous conditions? No. Again, if some compiler authors have chosen to treat memory addresses being written to by different threads on different CPUs as though they were registers changing due to exogenous conditions, that's their business; they are not required to do so. Nor are they required, even if they do introduce a memory fence, to ensure, for instance, that every thread sees a consistent ordering of volatile reads and writes.

In fact, volatile is pretty much useless for threading in C/C++. Best practice is to avoid it.

Moreover: memory fences are an implementation detail of particular processor architectures. In C#, where volatile is explicitly designed for multithreading, the specification does not say that half fences will be introduced, because the program might be running on an architecture that doesn't have fences in the first place. Rather, again, the specification makes certain (extremely weak) guarantees about what optimizations will be eschewed by the compiler, runtime and CPU to put certain (extremely weak) constraints on how some side effects will be ordered. In practice these optimizations are eliminated by use of half fences, but that's an implementation detail subject to change in the future.

The fact that you care about the semantics of volatile in any language as they pertain to multithreading indicates that you're thinking about sharing memory across threads. Consider simply not doing that. It makes your program far harder to understand and far more likely to contain subtle, impossible-to-reproduce bugs.


What David is overlooking is the fact that the C++ standard specifies the behavior of multiple interacting threads only in specific situations; everything else results in undefined behavior. A data race, that is, unsynchronized concurrent access involving at least one write, is undefined behavior if you don't use atomic variables.

Consequently, the compiler is perfectly within its rights to forgo any synchronization instructions, since your CPU will only notice the difference in a program that exhibits undefined behavior due to missing synchronization.
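
As a concrete sketch of what that means (the names here are mine): the unsynchronized increments of the plain int below constitute a data race and therefore undefined behavior, so the compiler may translate each one as an ordinary load, add and store with no fences or locked instructions; the std::atomic counter, by contrast, is well defined:

    #include <atomic>
    #include <thread>

    int plain_counter = 0;               // unsynchronized concurrent writes: data race, UB
    std::atomic<int> atomic_counter{0};  // concurrent access is well defined

    void work()
    {
        for (int i = 0; i < 100000; ++i) {
            ++plain_counter;                                         // race: no synchronization required of the compiler
            atomic_counter.fetch_add(1, std::memory_order_relaxed);  // atomic read-modify-write
        }
    }

    int main()
    {
        std::thread t1(work), t2(work);
        t1.join();
        t2.join();
        // atomic_counter is guaranteed to be 200000; plain_counter is not.
    }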
