Nested OpenMP loop

2018-06-28 08:41:10

I have a piece of code in the following style:

for (set=0; set < n; set++)  //For1
{
   #pragma omp parallel for num_threads(x)
    for (i=0; i < m; i++)   //For2: this loop can be executed in parallel
   {
      commands...
   }

   for (j=0; j < m; j++)   //For3: this loop depends on the output of the For2 and also should be executed in a sequential way
   {
     commands...
   }

}

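As you can see, I have n independent sets (the outer loop, For1). Each set consists of a parallel loop (For2) and a sequential part (For3) that must run after For2.

I have already used "#pragma omp parallel for num_threads(x)" to parallelize For2.

Now I want to parallelize the outer loop (For1) as well. In other words, I want the sets to run in parallel.

I would really appreciate it if you could tell me how this can be done in OpenMP.

One way might be to create n threads, one per set. Is that right? But I wonder whether there is another way that relies entirely on OpenMP features.

Thanks in advance.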

You can simply parallelize the outer loop:

#pragma omp parallel for num_threads(x) private(i,j)
for (set=0; set < n; set++)  //For1
{
    for (i=0; i < m; i++)   //For2: this loop can be executed in parallel
   {
      commands...
   }

   for (j=0; j < m; j++)   //For3: this loop depends on the output of the For2 and also should be executed in a sequential way
   {
     commands...
   }

}
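To make the idea concrete, here is a minimal compilable sketch of the outer-loop version. The loop bodies are hypothetical stand-ins for the elided "commands": For2 fills a per-set row, and For3 computes a running sum over that row, which is sequential by nature. The names `N_SETS`, `M`, `data`, `result` and `process_sets` are assumptions made for illustration and do not come from the question.

```c
#define N_SETS 8      /* hypothetical value of n */
#define M      1000   /* hypothetical value of m */

/* Hypothetical working storage: one row per set, so sets never share data. */
static double data[N_SETS][M];
static double result[N_SETS];

void process_sets(void)
{
    /* Variables declared inside the loop are private to each thread,
       so no private(...) clause is needed in this form. */
    #pragma omp parallel for schedule(static)
    for (int set = 0; set < N_SETS; set++) {     /* For1: sets are distributed over threads */
        for (int i = 0; i < M; i++) {            /* For2: runs serially inside one thread */
            data[set][i] = (double)(set + 1) * i;
        }

        double running = 0.0;
        for (int j = 0; j < M; j++) {            /* For3: sequential, needs For2's output */
            running += data[set][j];
            data[set][j] = running;
        }
        result[set] = running;
    }
}
```

With GCC or Clang this would typically be built with `-fopenmp`. Without that flag the pragma is ignored and the function still produces the same values serially, which is a handy way to check correctness.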

You could try fusing the first and second loops (see below). I don't know whether this will make it any faster, but it is worth a try.

    #pragma omp parallel num_threads(x) private(set, i)
    {
        #pragma omp for schedule(static)
        for (k = 0; k < n*m; k++) //fused For1 and For2
        {
            set = k/m;
            i = k%m;
            //commands...
        }
        #pragma omp for schedule(static)
        for (set = 0; set < n; set++)
        {
            for (i = 0; i < m; i++) //For3 - j is not necessary so reuse i 
            {
                //commands...
            }
        }
    }
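A compilable variant of the fused approach might look like the sketch below. As before, the loop bodies are hypothetical placeholders for the "commands", and the flat row-major layout of `data` (set `k / m`, element `k % m`) as well as the function name `process_sets_fused` are assumptions made for this example. It relies on the implicit barrier at the end of the first `omp for`, which guarantees that all For2 work is finished before any For3 work starts.

```c
/* data must hold n*m doubles (row-major, one row per set);
   result must hold n doubles. Both are hypothetical buffers. */
void process_sets_fused(double *data, double *result, int n, int m)
{
    long total = (long)n * m;

    #pragma omp parallel
    {
        /* Phase 1: For1 and For2 fused into a single loop of n*m iterations. */
        #pragma omp for schedule(static)
        for (long k = 0; k < total; k++) {
            int set = (int)(k / m);
            int i   = (int)(k % m);
            data[k] = (double)(set + 1) * i;     /* hypothetical For2 body */
        }
        /* Implicit barrier: every thread waits here before phase 2. */

        /* Phase 2: For3 stays sequential within a set; sets are spread over threads. */
        #pragma omp for schedule(static)
        for (int set = 0; set < n; set++) {
            double *row = data + (long)set * m;
            double running = 0.0;
            for (int i = 0; i < m; i++) {
                running += row[i];
                row[i] = running;
            }
            result[set] = running;
        }
    }
}
```

The fused first phase keeps all threads busy even when n is smaller than the thread count, while the second phase can use at most n threads.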

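Depending on how many sets you have, simply parallelizing the outer loop may well be your best option. If you have more sets than cores on your machine, it will probably be faster than parallelizing the inner loop, because the parallel region is entered once instead of once per set, so the threading overhead is much smaller.

Assuming your work is CPU-bound, parallelizing the outer loop will already keep every core on your machine busy. Once all resources are in use, additionally parallelizing the inner loop will not make it any faster.

If you have fewer sets than available cores, parallelize the inner loop instead; that alone will most likely use up all the available computing power.

If you really want to parallelize both loops, consider MPI and hybrid parallel processing across several computers: the outer loop is distributed across the machines, and the inner loop runs in parallel on all cores of each machine.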
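One way to express this choice in code is OpenMP's `if` clause, which turns a parallel region into a single-thread region when its condition is false. The sketch below parallelizes the outer loop when there are at least as many sets as threads, and the inner loop otherwise. The threshold `n >= max_threads()` is only a heuristic assumed here, and the loop bodies, the flat `data` layout and the names `max_threads` and `process_sets_adaptive` are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of threads OpenMP would use, or 1 when built without OpenMP. */
static int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* data must hold n*m doubles (row-major); result must hold n doubles. */
void process_sets_adaptive(double *data, double *result, int n, int m)
{
    /* Assumed heuristic: enough sets to occupy every thread -> go outer. */
    int outer_parallel = (n >= max_threads());

    #pragma omp parallel for if(outer_parallel) schedule(static)
    for (int set = 0; set < n; set++) {          /* For1 */
        double *row = data + (long)set * m;

        /* Only spawns a multi-thread team when the outer loop ran serially. */
        #pragma omp parallel for if(!outer_parallel) schedule(static)
        for (int i = 0; i < m; i++) {            /* For2 */
            row[i] = (double)(set + 1) * i;      /* hypothetical body */
        }

        double running = 0.0;
        for (int j = 0; j < m; j++) {            /* For3: always sequential */
            running += row[j];
            row[j] = running;
        }
        result[set] = running;
    }
}
```

Whichever branch is taken, only one level of the nest runs with multiple threads, so the cores are not oversubscribed.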
