Parallelize function using OpenMP

2018-06-28 08:28:46

I'm trying to run some code in parallel, but I'm confused by the private/shared clauses in OpenMP. I'm using C++ (MSVC12 or GCC) with OpenMP.

The code runs a loop whose body consists of a block that should run in parallel, followed by a block that should run only once all the parallel work is done. The order in which the parallel work is processed doesn't matter. The code looks like this:

// some X, M, N, Y, Z are some constant values
const int processes = 4;
std::vector<double> vct(X);
std::vector<std::vector<double> > stackVct(processes, std::vector<double>(Y));
std::vector<std::vector<std::string> > files(processes, std::vector<std::string>(M));
for(int i=0; i < N; ++i)
{
  // parallel stuff
  for(int process = 0; process < processes; ++process)
  {
    std::vector<double> &otherVct = stackVct[process];
    const std::vector<std::string> &my_files = files[process];

    for(int file = 0; file < my_files.size(); ++file)
    { 
      // vct is read-only here, the value is not modified
      doSomeOtherStuff(otherVct, vct);

      // my_files[file] is read-only
      std::vector<double> thirdVct(Y);
      doSomeOtherStuff(my_files[file], thirdVct);

      // thirdVct and vct are read-only
      doSomeOtherStuff2(thirdVct, otherVct, vct);
    }
  }
  // when all the parallel stuff is done, do this job
  // single thread stuff
  // stackVct is read-only, vct is modified
  doSingleTheadStuff(vct, stackVct);
}

If it is better for performance, doSingleTheadStuff(...) can be moved into the parallel block, but it still has to be executed by a single thread. The order of the function calls in the innermost loop cannot be changed.

How should I declare the #pragma omp directives to make this work? Thanks!

To run a for loop in parallel, put #pragma omp parallel for directly above the for statement. By default, variables declared outside the loop are shared by all threads, while the loop variable and any variables declared inside the loop body are private to each thread.
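To make the default sharing rules concrete, here is a minimal sketch. The function squareAll is a made-up name for illustration and is not part of the question's code. It compiles with or without OpenMP enabled; without it the pragma is simply ignored and the loop runs serially.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the question): returns the squares of `in`.
std::vector<double> squareAll(const std::vector<double> &in)
{
    std::vector<double> out(in.size());          // declared outside the loop: shared
    const int n = static_cast<int>(in.size());   // signed loop bound, also shared

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)                  // loop variable: private to each thread
    {
        const double x = in[i];                  // declared inside the loop: private
        out[i] = x * x;                          // every iteration writes a different element
    }

    assert(out.size() == in.size());
    return out;
}
```

Writing to the shared vector out is safe here only because each iteration touches a different element; two threads writing to the same element would be a data race.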

Note that if you are doing file IO in parallel you may not see much speedup (next to none if all you are doing is file IO) unless at least some of the files reside on different physical hard drives.

Maybe something like this (mind you this is just a sketch, I did not verify it but you can get the idea):

// some X, M, N, Y, Z are some constant values
const int processes = 4;
std::vector<double> vct(X);
std::vector<std::vector<double> > stackVct(processes, std::vector<double>(Y));
std::vector<std::vector<std::string> > files(processes, std::vector<std::string>(M));
for(int i=0; i < N; ++i)
{
    // parallel stuff
    #pragma omp parallel shared(vct, files, stackVct)
    {
        #pragma omp for
        for(int process = 0; process < processes; ++process)
        {
            std::vector<double> &otherVct = stackVct[process];
            const std::vector<std::string> &my_files = files[process];

            for(int file = 0; file < my_files.size(); ++file)
            {
                // vct is read-only here, the value is not modified
                doSomeOtherStuff(otherVct, vct);

                // my_files[file] is read-only
                std::vector<double> thirdVct(Y);
                doSomeOtherStuff(my_files[file], thirdVct);

                // thirdVct and vct are read-only
                doSomeOtherStuff2(thirdVct, otherVct, vct);
            }
        }
        // the omp for above ends with an implicit barrier, so all the
        // parallel stuff is done before any thread gets here
        // single thread stuff
        // stackVct is read-only, vct is modified
        #pragma omp single nowait
        doSingleTheadStuff(vct, stackVct);
    }
}

I marked vct and files as shared because they are only read inside the parallel loop, so the threads do not need their own copies. vct in particular must not be firstprivate: doSingleTheadStuff modifies it, and with firstprivate that change would land in one thread's private copy and be lost before the next iteration of the outer loop.

stackVct is shared as well because the threads modify it. That is safe here because each thread writes only to its own stackVct[process].

Finally, only one thread executes doSingleTheadStuff. It starts only after the implicit barrier at the end of the omp for, so all the parallel work is finished by then. The nowait clause removes the barrier at the end of the single block, but the other threads still wait at the implicit barrier that closes the parallel region.
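For anyone who wants to try the parallel-then-single pattern in isolation, here is a small self-contained sketch. The functions accumulate, combine and runIterations are hypothetical stand-ins that I made up in place of doSomeOtherStuff and doSingleTheadStuff, and the file handling is left out. Here vct is shared so that the single-threaded update is visible in the next outer iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-process work: adds every element of the
// read-only vct to the first slot of this process's own accumulator.
static void accumulate(std::vector<double> &acc, const std::vector<double> &vct)
{
    for (std::size_t k = 0; k < vct.size(); ++k)
        acc[0] += vct[k];
}

// Hypothetical stand-in for the single-threaded step: reads all accumulators
// and overwrites vct[0] with their sum.
static void combine(std::vector<double> &vct,
                    const std::vector<std::vector<double> > &stack)
{
    double sum = 0.0;
    for (std::size_t p = 0; p < stack.size(); ++p)
        sum += stack[p][0];
    vct[0] = sum;
}

double runIterations(int processes, int iterations)
{
    assert(processes > 0 && iterations >= 0);
    std::vector<double> vct(1, 1.0);
    std::vector<std::vector<double> > stackVct(processes, std::vector<double>(1, 0.0));

    for (int i = 0; i < iterations; ++i)
    {
        #pragma omp parallel shared(vct, stackVct)
        {
            #pragma omp for
            for (int process = 0; process < processes; ++process)
            {
                stackVct[process][0] = 0.0;          // each thread touches only its own slot
                accumulate(stackVct[process], vct);  // vct is only read here
            }
            // implicit barrier at the end of the omp for: all slots are filled

            #pragma omp single
            combine(vct, stackVct);                  // exactly one thread updates vct
        }
    }
    return vct[0];
}
```

With 4 processes, vct[0] is multiplied by 4 in every outer iteration, so runIterations(4, 3) returns 64. If vct were firstprivate instead, combine would update a private copy and the function would return the initial 1.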
