How to assign a specific job to each thread for matrix addition in OpenMP

I am trying to create a matrix addition program to practice with OpenMP. I have N^2 processors/threads and need each thread to compute exactly one entry of the resultant matrix. For example, if I have two matrices A and B of size NxN, then each thread should compute one entry of the resultant matrix C. From reading some beginner tutorials on OpenMP, it seems that the #pragma omp parallel for directive divides the work equally among the specified number of threads. But in the code below only 3 threads are active, not the 9 I want.

The code I have is as follows:

#include <stdio.h>
#include <omp.h>

int main(void) {

  // omp_set_num_threads(NUM_THREADS);
  int i, k;
  int N = 3;

  int A[3][3] = { {1, 2, 3}, {5, 6, 7}, {8, 9, 10} };
  int B[3][3] = { {1, 2, 3}, {5, 6, 7}, {8, 9, 10} };
  int C[3][3];

  omp_set_dynamic(0);
  omp_set_num_threads(9);
  // printf("Num of threads %i\n", omp_get_max_threads());

#pragma omp parallel for private(i, k) shared(A, B, C, N)
  for (i = 0; i < N; i++) {
    for (k = 0; k < N; k++) {
      int j = omp_get_thread_num();

      C[i][k] = A[i][k] + B[i][k];

      printf("I m thread %d computing A[%d][%d] and B[%d][%d] = %d \n ", j, i, k, i, k, C[i][k]);
    }
  }

  int n, m;
  for (n = 0; n < 3; n++) {
    for (m = 0; m < 3; m++) {
      printf("C[%d][%d] = %d \n", n, m, C[n][m]);
    }
  }

  return 0;
}

And the output I am getting is:

I m thread 0 computing A[0][0] and B[0][0] = 2 
 I m thread 1 computing A[1][0] and B[1][0] = 10 
 I m thread 1 computing A[1][1] and B[1][1] = 12 
 I m thread 1 computing A[1][2] and B[1][2] = 14 
 I m thread 0 computing A[0][1] and B[0][1] = 4 
 I m thread 0 computing A[0][2] and B[0][2] = 6 
 I m thread 2 computing A[2][0] and B[2][0] = 16 
 I m thread 2 computing A[2][1] and B[2][1] = 18 
 I m thread 2 computing A[2][2] and B[2][2] = 20 
 C[0][0] = 2 
C[0][1] = 4 
C[0][2] = 6 
C[1][0] = 10 
C[1][1] = 12 
C[1][2] = 14 
C[2][0] = 16 
C[2][1] = 18 
C[2][2] = 20 

What I want, though, is for each of the nine threads to compute one entry of the matrix C. Could anyone please help with this? I am new to both C and OpenMP. I am also confused about what exactly the private clause does. For example, if I specify 'i' and 'k' as private, does that mean each thread gets its own copy of 'i' and 'k' and may therefore run its own iterations of the loop? But that doesn't seem to make sense, since in the above output thread 0 computes all the row 0 values and thread 1 all the row 1 values. How is this happening on its own, without any specific directive? Thank you for your help!


#pragma omp parallel for applies only to the loop that immediately follows it, which here is the outer loop. That loop has only 3 iterations (N = 3), so at most 3 threads get any work; the inner loop runs sequentially inside each outer iteration.
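To see this row-per-thread behaviour directly, here is a minimal sketch (not the asker's exact program) that records which thread computed each entry. The names add_rows, thread_id and owner are hypothetical helpers introduced only for this illustration, and the _OPENMP guard is there so the code still builds as a plain serial program when OpenMP is not enabled.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define N 3 /* matrix dimension, fixed at 3 as in the question */

/* Hypothetical helper: current thread number, or 0 in a serial build. */
static int thread_id(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Parallelizes the outer loop only and records in owner[i][k]
 * which thread computed C[i][k]. */
void add_rows(int A[N][N], int B[N][N], int C[N][N], int owner[N][N]) {
    /* Only the iterations of i are divided among the threads. */
    #pragma omp parallel for num_threads(N * N)
    for (int i = 0; i < N; i++) {
        /* The inner loop is part of a single iteration of i, so one
         * thread runs all of it. k is declared inside the parallel
         * region, which makes it private to each thread. */
        for (int k = 0; k < N; k++) {
            C[i][k] = A[i][k] + B[i][k];
            owner[i][k] = thread_id();
        }
    }
}
```

Even though 9 threads are requested, every entry of a given row ends up with the same owner, because there are only N pieces of work to hand out. With GCC, OpenMP is enabled by compiling with -fopenmp.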

If you want to use 9 threads, you can flatten the two loops into a single loop of N * N iterations, using a single index, let's call it p:

#pragma omp parallel for private(i, k, p) shared(A, B, C, N)
for (p = 0; p < N * N; p++) {
    i = p / N;
    k = p % N;
    C[i][k] = A[i][k] + B[i][k];
}
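The fragment above leaves out the declarations, so here is one way it could be wrapped into a complete function. This is a sketch under a few assumptions: add_flat is a hypothetical name, N is a compile-time constant of 3 as in the question, and the indices are declared inside the loop so that no private clause is needed.

```c
#include <stdio.h>

#define N 3 /* matrix dimension, fixed at 3 as in the question */

/* Adds two N x N matrices with one flattened loop of N * N iterations,
 * so that up to N * N threads can each receive work. */
void add_flat(int A[N][N], int B[N][N], int C[N][N]) {
    #pragma omp parallel for
    for (int p = 0; p < N * N; p++) {
        int i = p / N; /* row index recovered from the flat index */
        int k = p % N; /* column index recovered from the flat index */
        C[i][k] = A[i][k] + B[i][k];
    }
}
```

Each value of p maps to a distinct (i, k) pair, so no two iterations write to the same entry of C and no synchronization is required.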

As stated in George's answer and Timothy's comment, you can also use OpenMP's collapse(2) clause to achieve the same thing.


Another way, besides chrk's answer, if you want to retain the two loops, is to use:

#pragma omp parallel for private(i,k) shared(A, B, C, N) collapse(2)

This way, the iterations of both loops are distributed across the threads. Right now, only the outer loop is parallelized; that is why you see, for example, thread 1 calculating all the row 1 values.
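As a rough illustration of the collapse(2) approach, the sketch below applies it to the same 3x3 addition and records which thread handled each entry. The names add_collapsed, thread_id and owner are hypothetical, and the schedule(static, 1) and num_threads clauses are my additions rather than part of the answer: they ask for chunks of one iteration handed out round-robin to a team of N * N threads. The runtime may still provide fewer threads than requested, in which case some threads compute more than one entry.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define N 3 /* matrix dimension, fixed at 3 as in the question */

/* Hypothetical helper: current thread number, or 0 in a serial build. */
static int thread_id(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* collapse(2) merges both loops into one iteration space of N * N
 * iterations before dividing it among the threads. */
void add_collapsed(int A[N][N], int B[N][N], int C[N][N], int owner[N][N]) {
    #pragma omp parallel for collapse(2) schedule(static, 1) num_threads(N * N)
    for (int i = 0; i < N; i++) {
        for (int k = 0; k < N; k++) {
            C[i][k] = A[i][k] + B[i][k];
            owner[i][k] = thread_id(); /* which thread did this entry */
        }
    }
}
```

Note that collapse(2) requires the two loops to be perfectly nested, meaning there is no code between the two for statements.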
