Why is the OpenMP reduction clause needed to make reductions concurrent?
The OpenMP 'parallel' construct and the 'simd' construct (new in revision 4.0) define the reduction clause, which tells the compiler which variable the reduction is performed on and what the reduction operator is. But why does the compiler need the programmer to tell it this information? GCC, for example, is capable of identifying reductions without any help from the programmer (see here and here). Are there any examples of loops that cannot be made concurrent without specifying the reduction clause?
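For reference, a typical use of the clause looks like the following sketch (a and N are just placeholder names here):

int sum = 0;
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < N; i++)
    sum += a[i];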
Reductions are a simple mechanism to improve the performance of parallel applications by removing synchronisation points, at the price of a relaxed consistency of the memory view. The need for an explicit reduction clause should become apparent from the following example:
Imagine that you have a code that does a search over an unordered collection of NUM_ITEMS items. The goal is to find all items that match the given criteria and collect them in an array match, as well as to compute the sum of some property of those items. It does not matter in what order the items are found. The serial code could be something like this:
int num_matches = 0;
int prop_sum = 0;

for (int i = 0; i < NUM_ITEMS; i++)
{
    if (criteria(item[i]))
    {
        match[num_matches] = item[i];
        num_matches++;
        prop_sum += item[i]->some_property;
    }
}
Both num_matches and prop_sum are variables whose values are accumulated as the loop goes, but the two have completely different semantics. While prop_sum can be computed as a sum of partial sums, num_matches cannot, because it is also used as an index into the output array. prop_sum is therefore a typical candidate for reduction, whereas num_matches not only cannot be reduced, but also requires an explicit synchronisation construct to prevent a data race in which different threads overwrite the same element of match:
int num_matches = 0;
int prop_sum = 0;

#pragma omp parallel for reduction(+:prop_sum)
for (int i = 0; i < NUM_ITEMS; i++)
{
    if (criteria(item[i]))
    {
        #pragma omp critical(update_matches)
        {
            match[num_matches] = item[i];
            num_matches++;
        }
        prop_sum += item[i]->some_property;
    }
}
Although you might argue that the compiler could be smart enough to notice the way num_matches is used and automatically generate atomic increments, the goal of OpenMP is to be portable across platforms and compiler vendors. That means that if you write an OpenMP program that conforms to the standard and compiles and works correctly with one compiler, it should compile and work correctly with other compilers too. The standard is written by many different vendors, and not every one of them has such a sophisticated data-dependency discovery mechanism. Besides, reliable detection becomes very hard when data external to the compilation unit is involved.
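As an aside, purely as a sketch of what such hand-written atomic handling could look like, the index update in the loop above could use OpenMP's atomic capture (available since OpenMP 3.1) instead of the named critical section; each thread atomically reserves a unique slot in match and then writes to it without further locking:

#pragma omp parallel for reduction(+:prop_sum)
for (int i = 0; i < NUM_ITEMS; i++)
{
    if (criteria(item[i]))
    {
        int idx;
        #pragma omp atomic capture
        idx = num_matches++;      /* atomically reserve a unique index */
        match[idx] = item[i];     /* safe: no other thread uses this idx */
        prop_sum += item[i]->some_property;
    }
}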
reduction is not a necessity; it is merely a convenience. You could implement your own reduction, e.g. using atomic increments, and that might be optimal on your platform, but it might be very far from optimal on other platforms that do not provide efficient atomic increments. On the other hand, each compiler is expected to generate code that implements the reduction clause in the best possible way for the given target platform.
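For illustration, here is a minimal sketch of a hand-rolled reduction for prop_sum alone (the match collection is omitted), using thread-private partial sums combined once per thread with an atomic update. This is roughly what many implementations do under the hood for reduction(+:prop_sum), but the clause leaves the exact strategy to the compiler and runtime:

int prop_sum = 0;
#pragma omp parallel
{
    int local_sum = 0;                     /* thread-private partial sum */

    #pragma omp for
    for (int i = 0; i < NUM_ITEMS; i++)
        if (criteria(item[i]))
            local_sum += item[i]->some_property;

    #pragma omp atomic
    prop_sum += local_sum;                 /* combine once per thread */
}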