C++0x memory model and speculative loads/stores
So I was reading about the memory model that is part of the upcoming C++0x standard. However, I'm a bit confused about some of the restrictions for what the compiler is allowed to do, specifically about speculative loads and stores.
To start with, some of the relevant stuff:
Hans Boehm's pages about threads and the memory model in C++0x
Boehm, "Threads Cannot be Implemented as a Library"
Boehm and Adve, "Foundations of the C++ Concurrency Memory Model"
Sutter, "Prism: A Principle-Based Sequential Memory Model for Microsoft Native Code Platforms", N2197
Boehm, "Concurrency memory model compiler consequences", N2338
Now, the basic idea is essentially "Sequential Consistency for Data-Race-Free Programs", which seems to be a decent compromise between ease of programming and allowing the compiler and hardware opportunities to optimize. A data race is defined to occur if two accesses to the same memory location by different threads are not ordered, at least one of them stores to the memory location, and at least one of them is not a synchronization action. It implies that all read/write access to shared data must be via some synchronization mechanism, such as mutexes or operations on atomic variables (well, it is possible to operate on the atomic variables with relaxed memory ordering for experts only, but the default provides for sequential consistency).
In light of this, I'm confused about the restrictions about spurious or speculative loads/stores on ordinary shared variables. For instance, in N2338 we have the example
switch (y) {
case 0: x = 17; w = 1; break;
case 1: x = 17; w = 3; break;
case 2: w = 9; break;
case 3: x = 17; w = 1; break;
case 4: x = 17; w = 3; break;
case 5: x = 17; w = 9; break;
default: x = 17; w = 42; break;
}
which the compiler is not allowed to transform into
tmp = x; x = 17;
switch (y) {
case 0: w = 1; break;
case 1: w = 3; break;
case 2: x = tmp; w = 9; break;
case 3: w = 1; break;
case 4: w = 3; break;
case 5: w = 9; break;
default: w = 42; break;
}
since if y == 2 there is a spurious write to x which could be a problem if another thread were concurrently updating x. But, why is this a problem? This a data race, which is prohibited anyway; in this case, the compiler just makes it worse by writing to x twice, but even a single write would be enough for a data race, no? Ie a proper C++0x program would need to synchronize access to x, in which case there would no longer be data race, and the spurious store wouldn't be a problem either?
I'm similarly confused about Example 3.1.3 in N2197 and some of the other examples as well, but maybe an explanation for the above issue would explain that too.
EDIT: The Answer:
The reason why speculative stores are a problem is that in the switch statement example above, the programmer might have elected to conditionally acquire the lock protecting x only if y != 2. Hence the speculative store might introduce a data race that was not there in the original code, and the transformation is thus forbidden. The same argument applies to Example 3.1.3 in N2197 as well.
I'm not familiar with all the stuff you refer to, but notice that in the y==2 case, in the first bit of code, x is not written to at all (or read, for that matter). In the second bit of code, it is written twice. This is more of a difference than just writing once vs. writing twice (at least, it is in existing threading models such as pthreads). Also, storing a value which would not otherwise be stored at all is more of a difference than just storing once vs. storing twice. For both these reasons, you don't want compilers just replacing a no-op with tmp = x; x = 17; x = tmp;
tmp = x; x = 17; x = tmp;
.
Suppose thread A wants to assume that no other thread modifies x. It's reasonable to want it to be allowed to expect that if y is 2, and it writes a value to x, then reads it back, it will get back the value it has written. But if thread B is concurrently executing your second bit of code, then thread A could write to x and later read it, and get back the original value, because thread B saved "before" the write and restored "after" it. Or it could get back 17, because thread B stored 17 "after" the write, and stored tmp back again "after" thread A reads. Thread A can do whatever synchronisation it likes, and it won't help, because thread B isn't synchronised. The reason it's not synchronised (in the y==2 case) is that it's not using x. So the concept of whether a particular bit of code "uses x" is important to the threading model, which means compilers can't be allowed to change code to use x when it "shouldn't".
In short, if the transformation you propose were allowed, introducing a spurious write, then it would never be possible to analyse a bit of code and conclude that it does not modify x (or any other memory location). There are a number of convenient idioms which would therefore be impossible, such as sharing immutable data between threads without synchronisation.
So, although I'm not familiar with C++0x's definition of "data race", I assume that it includes some conditions where programmers are allowed to assume that an object is not written to, and that this transformation would violate those conditions. I speculate that if y==2, then your original code, together with concurrent code: x = 42; x = 1; z = x
x = 42; x = 1; z = x
x = 42; x = 1; z = x
in another thread, is not defined to be a data race. Or at least if it is a data race, it's not one which permits z to end up with value either 17, or 42.
Consider that in this program, the value 2 in y might be used to indicate, "there are other threads running: don't modify x, because we aren't synchronised here, so that would introduce a data race". Perhaps the reason there's no synchronisation at all, is that in all other cases of y, there are no other threads running with access to x. It seems reasonable to me that C++0x would want to support code like this:
if (single_threaded) {
x = 17;
} else {
sendMessageThatSafelySetsXTo(17);
}
Clearly then, you don't want that transformed to:
tmp = x;
x = 17;
if (!single_threaded) {
x = tmp;
sendMessageThatSafelySetsXTo(17);
}
Which is basically the same transformation as in your example, but with only 2 cases, instead of there being enough to make it look like a good code-size optimisation.
If y==2
, and another thread modifies or reads x
, how is there a race condition in the original sample? This thread never touches x
, so other threads can do so freely.
But with the reordered version, our thread modifies x
, if only temporarily, so if another thread also manipulates it, we now have a race condition where none was present before.
上一篇: 比较和交换C ++ 0x
下一篇: C ++ 0x内存模型和推测加载/存储