Where is the load barrier for the volatile statement?

I wrote this simple Java program:

package com.salil.threads;

public class IncrementClass {

    static volatile int j = 0;
    static int i = 0;

    public static void main(String args[]) {

        for(int a=0;a<1000000;a++);
        i++;
        j++;            
    }       
}

This generates the following disassembled code for i++ and j++ (the rest of the disassembly is omitted):

  0x0000000002961a6c: 49ba98e8d0d507000000 mov       r10,7d5d0e898h
                                                ;   {oop(a 'java/lang/Class' = 'com/salil/threads/IncrementClass')}
  0x0000000002961a76: 41ff4274            inc       dword ptr [r10+74h]
                                                ;*if_icmpge
                                                ; - com.salil.threads.IncrementClass::main@5 (line 10)
  0x0000000002961a7a: 458b5a70            mov       r11d,dword ptr [r10+70h]
  0x0000000002961a7e: 41ffc3              inc       r11d
  0x0000000002961a81: 45895a70            mov       dword ptr [r10+70h],r11d
  0x0000000002961a85: f083042400          lock add  dword ptr [rsp],0h
                                                ;*putstatic j
                                                ; - com.salil.threads.IncrementClass::main@27 (line 14)

This is what I understand about the assembly code above:

  • mov r10,7d5d0e898h : Moves the pointer to IncrementClass.class into register r10
  • inc dword ptr [r10+74h] : Increments the 4-byte value at address [r10 + 74h] (i.e. i)
  • mov r11d,dword ptr [r10+70h] : Moves the 4-byte value at address [r10 + 70h] into register r11d (i.e. loads the value of j into r11d)
  • inc r11d : Increments r11d
  • mov dword ptr [r10+70h],r11d : Writes the value of r11d to [r10 + 70h] so it is visible to other threads
  • lock add dword ptr [rsp],0h : Locks the memory address held in the stack pointer rsp and adds 0 to it

The JMM states that before each volatile read there must be a load memory barrier and after every volatile write there must be a store barrier. My questions are:

  • Why isn't there a load barrier before the read of j into r11d?
  • How does the lock add on rsp ensure that the value of j in r11d is propagated back to main memory? All I read in the Intel specs is that lock gives the CPU exclusive access to the specified memory address for the duration of the operation.

  • Intel x86 processors have a strong memory model.

    Therefore the StoreStore, LoadLoad and LoadStore barriers are all no-ops on x86. The exception is StoreLoad, which can be implemented with mfence, cpuid or a locked instruction, as your assembly code already confirms. The other barriers only restrict compiler optimizations and transformations so that they don't break the Java memory model spec.

    Since you ran this on an Intel processor, I am assuming it is x86.

    Please read

  • http://gee.cs.oswego.edu/dl/jmm/cookbook.html for reference.

  • http://psy-lob-saw.blogspot.com/2013/08/memory-barriers-are-not-free.html

  • http://jsr166-concurrency.10961.n7.nabble.com/x86-NOOP-memory-barriers-td9991.html
  • Lock is not an instruction but rather an instruction prefix (it behaves as a StoreLoad barrier).

  • What does the "lock" instruction mean in x86 assembly?
  • Why we need lock prefix before CMPXCHG
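
To make the StoreLoad point above concrete, here is a minimal sketch of the classic store-then-load litmus test. The class name StoreLoadDemo and the method countBothZero are made up for this illustration and are not from the question. Each thread writes one volatile field and then reads the other. Because volatile accesses are sequentially consistent under the JMM, both reads can never return 0 in the same round; on x86 that guarantee is what the StoreLoad barrier (the lock add after the store) pays for. With plain, non-volatile fields the both-zero outcome would be allowed.

```java
public class StoreLoadDemo {
    // Volatile fields: every write is followed by a StoreLoad barrier on x86.
    static volatile int x, y;
    // Plain result fields; they are read only after join(), which orders them.
    static int r1, r2;

    // Runs the litmus test `iterations` times and counts the rounds in which
    // both threads read 0. With volatile x and y this count must stay 0.
    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int n = 0; n < iterations; n++) {
            x = 0;
            y = 0;
            Thread a = new Thread(() -> { x = 1; r1 = y; }); // store x, then load y
            Thread b = new Thread(() -> { y = 1; r2 = x; }); // store y, then load x
            a.start();
            b.start();
            try {
                a.join();
                b.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("rounds with r1 == 0 && r2 == 0: " + countBothZero(1000));
    }
}
```

Removing volatile from x and y makes the both-zero outcome legal, although a short run like this one may not actually hit it.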

  • The volatile keyword in Java only guarantees that thread-local copies and caches are bypassed, so that the value is read directly from main memory or written directly to main memory. It does not provide any locking mechanism. Thus a single read of a volatile, or a single write to a volatile, is atomic, but a sequence of read and write operations, like your

    j++

    above, is NOT atomic, because another thread can modify the value of j between the read and the write to the variable in main memory. To get an atomic increment you need CAS operations, which are wrapped by the Atomic classes in Java, such as AtomicInteger. Alternatively, if you prefer low-level programming, you can use the atomic methods of the Unsafe class, e.g. Unsafe.compareAndSwapInt.
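
    As a rough illustration of the difference, the sketch below runs two threads that increment both a volatile int and an AtomicInteger. The class name CounterDemo and the method runOnce are invented for this example. The atomic counter always ends at exactly twice the per-thread count, while the volatile counter can end lower because increments get lost between the read and the write.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterDemo {
    static volatile int volatileCounter;
    static final AtomicInteger atomicCounter = new AtomicInteger();

    // Runs two threads that each increment both counters perThread times.
    // Returns {final volatile counter, final atomic counter}.
    static int[] runOnce(int perThread) {
        volatileCounter = 0;
        atomicCounter.set(0);
        Runnable work = () -> {
            for (int k = 0; k < perThread; k++) {
                volatileCounter++;               // read, add, write: three separate steps
                atomicCounter.incrementAndGet(); // one atomic read-modify-write
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { volatileCounter, atomicCounter.get() };
    }

    public static void main(String[] args) {
        int[] result = runOnce(1_000_000);
        System.out.println("volatile counter: " + result[0] + " (can be below 2000000)");
        System.out.println("atomic counter:   " + result[1]);
    }
}
```

    How many volatile increments are lost varies from run to run and from machine to machine, so only the atomic total is predictable.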


    The barrier may be optimized out by your JIT compiler, since your program is single-threaded (there is only one thread, the main thread), just as a lock can be optimized out in a single-threaded environment. This optimization is independent of the processor architecture.
