OpenMP parallelize multiple sequential loops

I want to parallelize the following function with OpenMP:

void calculateAll() {
    int k;
    int nodeId1, minCost1, lowerLimit1, upperLimit8;
    for (k = mostUpperLevel; k > 0; k--) {
        int myStart = borderNodesArrayStartGlobal[k - 1];
        int size = myStart + borderNodesArraySizeGlobal[k - 1];
        /* this loop may be parallel */
        for (nodeId1 = myStart; nodeId1 < size; nodeId1++) {
            if (getNodeScanned(nodeId1)) {
                setNodeScannedFalse(nodeId1);
            } else {
                minCost1 = myMax;
                lowerLimit1 = getNode3LevelsDownAll(nodeId1);
                upperLimit8 = getUpperLimit3LevelsDownAll(nodeId1);
                changeNodeValue(nodeId1, lowerLimit1, upperLimit8, minCost1, minCost1);
            }
        }
    }

    int myStart = restNodesArrayStartGlobal;
    int size = myStart + restNodesArraySizeGlobal;
    /* this loop may also be parallel */
    for (nodeId1 = myStart; nodeId1 < size; nodeId1++) {
        if (getNodeScanned(nodeId1)) {
            setNodeScannedFalse(nodeId1);
        } else {
            minCost1 = myMax;
            lowerLimit1 = getNode3LevelsDownAll(nodeId1);
            upperLimit8 = getUpperLimit3LevelsDownAll(nodeId1);
            changeNodeValue(nodeId1, lowerLimit1, upperLimit8, minCost1, minCost1);
        }
    }
}

Although I can put "#pragma omp parallel for" on the two inner loops, the code is too slow because of the constant overhead of creating new threads. Is there a way to split off "#pragma omp parallel" so that the function acquires the threads it needs once at the beginning, and then use "#pragma omp for" on each loop to get the best possible results? I am using gcc 4.6.

Thanks in advance


Thread creation is normally not the bottleneck in OpenMP programs. The threads are created at the first parallel region and are typically reused by later ones (you can verify that with a profiler such as VTune). The more likely cost is the distribution of the work to the threads: at each loop the iterations have to be assigned to the threads, and this assignment is often the expensive part.

That said, you should experiment with the loop schedule, as it can have a big impact on performance. For example, compare schedule(dynamic, chunksize) with schedule(static, chunksize), and also try different chunk sizes.
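
To make the split the question asks about concrete, here is a minimal sketch of one "#pragma omp parallel" region wrapping the level loop, with "#pragma omp for" on the inner loop. The arrays levelStart, levelSize, nodeScanned and nodeValue, and the helper processNode, are hypothetical stand-ins for the question's globals and its getNode.../changeNodeValue calls; they are assumptions for illustration, not the original code. The schedule(static) clause is only a starting point and can be swapped for schedule(dynamic, chunksize) when experimenting.

```c
#define NUM_LEVELS 3
#define NUM_NODES 12

/* Hypothetical stand-ins for the question's global arrays. */
static const int levelStart[NUM_LEVELS] = {0, 4, 8};
static const int levelSize[NUM_LEVELS]  = {4, 4, 4};
static int nodeScanned[NUM_NODES];
static int nodeValue[NUM_NODES];

/* Hypothetical per-node work, replacing the original helper calls. */
static void processNode(int nodeId)
{
    if (nodeScanned[nodeId]) {
        nodeScanned[nodeId] = 0;
    } else {
        nodeValue[nodeId] = nodeId * 2;
    }
}

void calculateAllSketch(void)
{
    /* One parallel region: the thread team is set up once per call. */
    #pragma omp parallel
    {
        /* Declared inside the region, so each thread has its own copy. */
        int k;
        int nodeId;

        /* Every thread runs the level loop; only the inner loop is shared. */
        for (k = NUM_LEVELS; k > 0; k--) {
            int start = levelStart[k - 1];
            int end = start + levelSize[k - 1];

            /* Work-sharing only. The implicit barrier at the end of the
               loop makes all threads finish level k before level k - 1. */
            #pragma omp for schedule(static)
            for (nodeId = start; nodeId < end; nodeId++) {
                processNode(nodeId);
            }
        }
    }
}
```

Compile with gcc -fopenmp and call calculateAllSketch() from your own code. The second loop from the question (the "rest" nodes) could be placed inside the same parallel region with its own "#pragma omp for". Note that any temporaries such as minCost1 or lowerLimit1 would need to be declared inside the region or listed in a private clause, otherwise the threads would share them.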
