Data Attribute in OpenMP's task

I have written a code related to quick sort with OpenMP as follows:

#include <iostream>
#include <ctime>
#include <algorithm>
#include <functional>
#include <cmath>
using namespace std;
#include <omp.h>

void ParallelQuickSort(int *begin, int *end)
{
    if (begin+1 < end) 
    {
        --end;
        int *middle = partition(begin, end, bind2nd(less<int>(), *end));
        swap(*end, *middle);
        #pragma omp task shared(begin) firstprivate(middle)
            ParallelQuickSort(begin, middle);
        #pragma omp task shared(end) firstprivate(middle)
            ParallelQuickSort(++middle, ++end); 
    }
}

int main()
{
    int n = 200000000;

    int* a = new int[n];

    for (int i=0; i<n; ++i)
    {
        a[i] = i;
    }

    random_shuffle(a, a+n);
    cout<<"Sorting "<<n<<" integers."<<endl;

    double startTime = omp_get_wtime();
    #pragma omp parallel
    {
        #pragma omp single
            ParallelQuickSort(a, a+n);
    }
    cout<<omp_get_wtime() - startTime<<" seconds."<<endl;

    for (int i=0; i<n; ++i)
    {
        if (a[i] != i) 
        {
            cout<<"Sort failed at location i="<<i<<endl;
        }
    }

    delete[] a;
    return 0;
}

The problem I have in the code is the data attribute in task construct within ParallelQuickSort function. The variable middle should be firstprivate instead of shared as it may be changed by threads executing the two tasks. However, if I set variable begin and end as shared as showed in the code, the program will fail. I wonder why they ( begin and end ) should be firstprivate instead of shared . In my view, as the threads executing the two tasks keep the variable begin and end respectively, they will not affect each other. On the other hand, the ParallelQuickSort function is recursive, and thus there is a race in the variable begin or end (for example, in the parent function and in the child function). I am not sure about this suspect as the variables are in different functios (parent and child function).


First, variables that are determined to be private in a given region are automatically firstprivate in explicit tasks, so you don't need to declare them explicitly as firstprivate . Second, your code contains ++end; and --end; which modify the value of end , affecting other tasks if end is shared . firstprivate is the correct data sharing class here - each task simply retains the values of begin , end and middle that they used to have at the time the task was created.

Your ParallelQuickSort should be as simple as this:

void ParallelQuickSort(int *begin, int *end)
{
    if (begin+1 < end) 
    {
        --end;
        int *middle = partition(begin, end, bind2nd(less<int>(), *end));
        swap(*end, *middle);
        #pragma omp task
            ParallelQuickSort(begin, middle);
        #pragma omp task
            ParallelQuickSort(++middle, ++end); 
    }
}

Note that although this code works, it is way slower than the single-threaded version: 88.2 seconds with 2 threads on a large Xeon X7350 (Tigerton) box versus 50.1 seconds with a single thread. The reason is that tasks creation continues up to the very simple task of swapping two array elements. The overhead of using tasks is huge and you should put a sane upper threshold below which tasking should be disabled, let's say when the subarray size has reached 1024 elements. The exact number depends on the OpenMP run-time implementation, your CPU type and memory speed, so the value of 1024 is more or less randomly chosen. Still the optimal value should not create two tasks that process elements that would fall in the same cache line, so the number of elements should be a multiple of 16 (64 bytes per cache line / 4 bytes per integer):

void ParallelQuickSort(int *begin, int *end)
{
    if (begin+1 < end) 
    {
        --end;
        int *middle = partition(begin, end, bind2nd(less<int>(), *end));
        swap(*end, *middle);
        #pragma omp task if((end - begin) > 1024)
            ParallelQuickSort(begin, middle);
        #pragma omp task if((end - begin) > 1024)
            ParallelQuickSort(++middle, ++end); 
    }
}

With this modification the code runs for 34.2 seconds with two threads on the same box.

链接地址: http://www.djcxy.com/p/79262.html

上一篇: C ++ OpenMP写入共享数组/矢量的特定元素

下一篇: OpenMP任务中的数据属性