Data Attribute in OpenMP's task
I have written a quicksort with OpenMP as follows:
#include <iostream>
#include <ctime>
#include <algorithm>
#include <functional>
#include <cmath>
#include <omp.h>

using namespace std;

void ParallelQuickSort(int *begin, int *end)
{
    if (begin+1 < end)
    {
        --end;
        int *middle = partition(begin, end, bind2nd(less<int>(), *end));
        swap(*end, *middle);
        #pragma omp task shared(begin) firstprivate(middle)
        ParallelQuickSort(begin, middle);
        #pragma omp task shared(end) firstprivate(middle)
        ParallelQuickSort(++middle, ++end);
    }
}

int main()
{
    int n = 200000000;
    int* a = new int[n];
    for (int i = 0; i < n; ++i)
    {
        a[i] = i;
    }
    random_shuffle(a, a+n);
    cout << "Sorting " << n << " integers." << endl;
    double startTime = omp_get_wtime();
    #pragma omp parallel
    {
        #pragma omp single
        ParallelQuickSort(a, a+n);
    }
    cout << omp_get_wtime() - startTime << " seconds." << endl;
    for (int i = 0; i < n; ++i)
    {
        if (a[i] != i)
        {
            cout << "Sort failed at location i=" << i << endl;
        }
    }
    delete[] a;
    return 0;
}
The problem I have is with the data-sharing attributes in the task constructs inside the ParallelQuickSort function. The variable middle should be firstprivate rather than shared, since it may be changed by the threads executing the two tasks. However, if I make begin and end shared, as shown in the code, the program fails. I wonder why they (begin and end) should be firstprivate instead of shared. In my view, since the threads executing the two tasks use begin and end respectively, they should not affect each other. On the other hand, ParallelQuickSort is recursive, so perhaps there is a race on begin or end (for example, between the parent invocation and a child invocation). I am not sure about this suspicion, since the variables live in different function invocations (parent and child).
First, variables that are determined to be private in a given region are automatically firstprivate in explicit tasks created in that region, so you don't need to declare them explicitly as firstprivate. Second, your code contains ++end; and --end;, which modify the value of end and therefore affect other tasks if end is shared. firstprivate is the correct data-sharing attribute here: each task simply retains the values of begin, end and middle that it saw at the moment the task was created.
Your ParallelQuickSort should be as simple as this:
void ParallelQuickSort(int *begin, int *end)
{
    if (begin+1 < end)
    {
        --end;
        int *middle = partition(begin, end, bind2nd(less<int>(), *end));
        swap(*end, *middle);
        #pragma omp task
        ParallelQuickSort(begin, middle);
        #pragma omp task
        ParallelQuickSort(++middle, ++end);
    }
}
Note that although this code works, it is way slower than the single-threaded version: 88.2 seconds with 2 threads on a large Xeon X7350 (Tigerton) box versus 50.1 seconds with a single thread. The reason is that task creation continues all the way down to the trivial task of swapping two array elements, and the overhead of creating and dispatching tasks is huge at that granularity.

You should put a sane threshold below which tasking is disabled, say when the subarray size drops to 1024 elements. The exact number depends on the OpenMP run-time implementation, your CPU type and memory speed, so the value of 1024 is more or less arbitrarily chosen. Still, the optimal value should not let two tasks process elements that fall in the same cache line, so the number of elements should be a multiple of 16 (64 bytes per cache line / 4 bytes per integer):
void ParallelQuickSort(int *begin, int *end)
{
    if (begin+1 < end)
    {
        --end;
        int *middle = partition(begin, end, bind2nd(less<int>(), *end));
        swap(*end, *middle);
        #pragma omp task if((end - begin) > 1024)
        ParallelQuickSort(begin, middle);
        #pragma omp task if((end - begin) > 1024)
        ParallelQuickSort(++middle, ++end);
    }
}
With this modification the code runs in 34.2 seconds with two threads on the same box.