OpenMP latency: for inside for

I have a piece of code that I want to parallelize, and the OpenMP program is much slower than the serial version, so what is wrong with my implementation? This is the code of the program:

    #include <iostream>
    #include <gsl/gsl_math.h>
    #include "Chain.h"
    using namespace std;

    int main(){
        int const N=1000;
        int timeSteps=100;
        double delta=0.0001;
        double qq[N];
        Chain ch(N);
        ch.initCond();
        for (int t=0; t<timeSteps; t++){
            ch.changeQ(delta*t);
            ch.ca

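The excerpt cuts off before any OpenMP directive, but a common cause of this kind of slowdown is opening a parallel region inside the time-step loop, so a new thread team is launched every iteration for only N=1000 elements of work. A minimal sketch of the usual restructuring, assuming the per-element update is independent (updateElement is a stand-in, not the real Chain code):

    // stand-in for the per-element chain update
    static void updateElement(double *qq, int i, double shift) {
        qq[i] = qq[i] * 0.5 + shift;
    }

    void evolve(double *qq, int N, int timeSteps, double delta) {
        #pragma omp parallel                  // one thread team for all time steps
        for (int t = 0; t < timeSteps; t++) {
            double shift = delta * t;         // declared inside the region, so private
            #pragma omp for                   // iterations shared across the team
            for (int i = 0; i < N; i++)
                updateElement(qq, i, shift);
            // implicit barrier here keeps successive time steps ordered
        }
    }

Even with this structure, 1000 cheap iterations per step may be too little work to amortise the synchronisation, so the serial version can still win.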

C++

How can I thread the nested for-loop below safely, so that the programme runs in parallel on a core with 8 threads and still outputs data in the correct order? I have tried using the #pragma omp for directive, but that gives me an error message: work-sharing region may not be closely nested inside of work-sharing, critical or explicit task region. Note: this code is for an introduction to parallel programming, so it is poorly written from an optimisation standpoint.

    #pragma omp parallel private(t, i, j) shared(nx, ny, nt)
    {
        // main loop
        for (int t = 0; t < nt; t+

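That error appears when an omp for is placed directly inside another work-sharing construct of the same parallel region. A minimal sketch of one way around it, assuming the aim is to share only the i loop and keep the printed output in time-step order (the grid update is a placeholder, not the original computation):

    #include <cstdio>
    #include <vector>

    int main() {
        const int nt = 10, nx = 100, ny = 100;
        std::vector<double> grid(nx * ny, 0.0);

        for (int t = 0; t < nt; t++) {          // time steps stay sequential
            #pragma omp parallel for            // share only one loop level
            for (int i = 0; i < nx; i++)
                for (int j = 0; j < ny; j++)
                    grid[i * ny + j] += 1.0;    // placeholder update

            // print after the parallel loop finishes, so output is always ordered
            std::printf("t=%d centre=%f\n", t, grid[(nx / 2) * ny + ny / 2]);
        }
        return 0;
    }

Alternatively, collapse(2) can merge the i and j loops into one work-sharing loop; the important part is that work-sharing loops are not nested directly inside one another.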

OpenMP: nested loops and allocation

I'd like to parallelize a for loop inside another for loop. I could simply use the directive "#pragma omp parallel for" directly on the inner loop, but I fear that creating a new set of threads each time is not the optimal thing to do. In the outer loop (before the inner one) there is an allocation and some other instructions to be done by a single thread (I allocate a matrix that is shared in the inner loop, so every thread should have access to it). I tried to do something like this:

    #pragma omp parallel
    {
        for (auto t=1;t<=time_step;++t){
            #pragma omp

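The usual shape for this is a single parallel region around the outer loop, with the one-thread work inside an omp single; a rough sketch under the assumption that only the (re)allocation must be done by one thread and the inner loop does independent per-element work (the names below are illustrative, not from the original post):

    #include <vector>

    void simulate(int time_step, int n) {
        std::vector<double> matrix;             // shared by all threads

        #pragma omp parallel                    // the team is created only once
        for (auto t = 1; t <= time_step; ++t) {
            #pragma omp single                  // one thread allocates; the implicit
            matrix.assign(n, 0.0);              // barrier makes it visible to the rest

            #pragma omp for                     // inner loop shared by the team
            for (int i = 0; i < n; ++i)
                matrix[i] = i * t;              // placeholder work
        }
    }

The implicit barriers after single and for are what keep the allocation, the parallel fill, and the next iteration from overlapping.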

Nested OpenMP loop

I have a piece of code in the following style:

    for (set=0; set < n; set++)   //For1
    {
        #pragma omp parallel for num_threads(x)
        for (i=0; i < m; i++)     //For2: this loop can be executed in parallel
        {
            commands...
        }
        for (j=0; j < m; j++)     //For3: this loop depends on the output of For2
        {                         //      and also should be executed in a sequential way
            commands...
        }
    }

As you can notice, I have n

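One way to avoid creating the team n times is to open the parallel region once around For1 and let a single thread run For3 while the others wait at its barrier; a sketch under the assumption that For2's iterations really are independent (the loop bodies are placeholders for the original commands):

    void process(int n, int m, int x, double *data) {
        #pragma omp parallel num_threads(x)     // team created once, not n times
        for (int set = 0; set < n; set++) {     // For1: executed by every thread
            #pragma omp for                     // For2: iterations shared, implicit
            for (int i = 0; i < m; i++)         //       barrier before For3 starts
                data[i] += set;                 // placeholder parallel work

            #pragma omp single                  // For3: one thread only; the others
            for (int j = 0; j < m; j++)         //       wait at the implicit barrier
                data[0] += data[j];             // placeholder sequential work
        }
    }

Whether this beats the original depends on how much For2 does per iteration; if For2 is cheap, the barriers can dominate either way.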

Why is my C code slower using OpenMP?

I'm trying to do multi-threaded programming on the CPU using OpenMP. I have lots of for loops which are good candidates to be parallelized. I attach here a part of my code. When I use the first #pragma omp parallel for reduction, my code is faster, but when I try to use the same directive to parallelize other loops it gets slower. Does anyone have any idea why it is like this?

    . . .
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    float *h1=new float[nvi];
    float *h2=new flo

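For reference, the reduction pattern that gave the speed-up looks roughly like this; whether the other loops benefit depends mainly on how much work each iteration does and whether the loop is memory-bound (the array name and size are taken from the excerpt, the body is guessed):

    double sumOfSquares(const float *h1, int nvi) {
        double sum = 0.0;
        // each thread accumulates into a private copy of sum,
        // and OpenMP combines the copies when the loop ends
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < nvi; i++)
            sum += (double)h1[i] * h1[i];
        return sum;
    }

Loops that merely stream through large arrays with one multiply-add per element are usually limited by memory bandwidth, so adding threads to them can easily make them slower.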

OpenMP parallelize multiple sequential loops

I want to parallelize the following function with OpenMP:

    void calculateAll() {
        int k;
        int nodeId1, minCost1, lowerLimit1, upperLimit8;
        for (k = mostUpperLevel; k > 0; k--) {
            int myStart = borderNodesArrayStartGlobal[k - 1];
            int size = myStart + borderNodesArraySizeGlobal[k - 1];
            /* this loop may be parallel */
            for (nodeId1 = myStart; nodeId1 < size; nodeId1++) {
                if (getNodeScanned(nodeId1)) {

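Since the levels presumably have to be processed in order, the natural candidate is the inner node loop, with the level loop left sequential; a sketch of how that could look, assuming the nodeId1 iterations are independent within a level (the arrays and the per-node work below are simplified stand-ins for the real data structures):

    enum { LEVELS = 4, NODES = 1024 };
    static int    borderNodesArrayStartGlobal[LEVELS];
    static int    borderNodesArraySizeGlobal[LEVELS];
    static int    scanned[NODES];
    static double cost[NODES];

    static int getNodeScanned(int id) { return scanned[id]; }

    void calculateAll(void) {
        for (int k = LEVELS; k > 0; k--) {            // levels stay sequential
            int myStart = borderNodesArrayStartGlobal[k - 1];
            int size = myStart + borderNodesArraySizeGlobal[k - 1];

            // nodeId1 is declared in the loop header, so it is private automatically;
            // other per-node temporaries (minCost1, the limits, ...) belong inside the loop
            #pragma omp parallel for
            for (int nodeId1 = myStart; nodeId1 < size; nodeId1++) {
                if (getNodeScanned(nodeId1)) continue;
                cost[nodeId1] *= 0.5;                 // placeholder per-node work
            }
        }
    }

If the per-node work writes to neighbouring nodes rather than only to nodeId1 itself, those writes need atomic protection or a different decomposition.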

threads(1) executes faster than no OpenMP

I've run my code in a variety of circumstances, which has resulted in what I believe to be odd behavior. My testing was on a dual-core Intel Xeon processor with HT.

No OpenMP '#pragma' statement, total runtime = 507 seconds
With OpenMP '#pragma' statement specifying 1 core, total runtime = 117 seconds
With OpenMP '#pragma' statement specifying 2 cores, total runtime = 150 seconds
With OpenMP '#pragma' statement specifying 3 cores, total runtime = 157 seconds
With OpenMP '#pragma' statement specifying 4 cores, total run

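The excerpt does not show how the thread count was specified; a common way to reproduce this kind of comparison is the num_threads clause (or omp_set_num_threads) together with omp_get_wtime, so the same wall-clock is used in every run. The loop below is filler, not the original workload:

    #include <cstdio>
    #include <omp.h>

    int main() {
        const long n = 200000000L;
        double sum = 0.0;

        double t0 = omp_get_wtime();
        #pragma omp parallel for num_threads(2) reduction(+:sum)
        for (long i = 0; i < n; i++)
            sum += 1.0 / (double)(i + 1);     // filler work
        double t1 = omp_get_wtime();

        std::printf("sum=%f  elapsed=%f s\n", sum, t1 - t0);
        return 0;
    }

A dual-core chip with Hyper-Threading has only two physical cores, so going from 2 to 3 or 4 threads mostly adds scheduling overhead, which would fit the flat 150 to 157 second times.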

omp use in Ubuntu C++

I'm trying to write code with omp; it has a section like this (Brandes's betweenness centrality algorithm):

    int main(int argc, char* argv[]){
        {//file reading section
            (...)
        }
        A = new vector<int>[V];
        int conteudo;
        for(i=0;i<E;i++){
            conteudo = grafo[i][0];
            A[conteudo].push_back(grafo[i][1]);
            A[grafo[i][1]].push_back(conteudo);
        }
        float *Cb = new float[V];
        // CÓDIGO DE

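For reference, the part of Brandes's algorithm that is usually parallelised is the outer loop over source vertices, with each thread accumulating its per-source contributions into the shared Cb array under atomic protection. This is only a rough sketch; singleSourceDelta is a hypothetical stand-in for the per-source BFS and dependency accumulation, not code from the original post:

    #include <vector>
    using std::vector;

    // stub for one source's BFS + dependency accumulation; the real version
    // should leave source s's contribution to every vertex in delta[0..V-1]
    static void singleSourceDelta(int s, const vector<int> *A, int V,
                                  vector<float> &delta) {
        (void)s; (void)A;
        delta.assign(V, 0.0f);                    // placeholder only
    }

    void betweenness(const vector<int> *A, int V, float *Cb) {
        #pragma omp parallel
        {
            vector<float> delta(V);               // private per-thread buffer

            #pragma omp for
            for (int s = 0; s < V; s++) {
                singleSourceDelta(s, A, V, delta);
                for (int v = 0; v < V; v++) {
                    if (delta[v] != 0.0f) {
                        #pragma omp atomic        // Cb is shared between threads
                        Cb[v] += delta[v];
                    }
                }
            }
        }
    }

The adjacency-list construction with push_back, on the other hand, is not thread-safe and is cheap anyway, so it is best left serial.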

OpenMP slow private function

I want to use OpenMP to parallelize a for-loop inside a function that is called from main in C++. My code runs much slower than in sequential mode: the for-loop takes about 6.1 s (wall-clock) without OpenMP (just commenting out the #pragma ... directive) and 11.8 s with OpenMP. My machine has 8 CPUs and 8183 MB of physical memory and runs a 64-bit Windows 7 operating system. I use the Visual Studio compiler in debug mode, targeting the 64-bit system. I have read that the slowdown may be caused by variables that should be declared private, but I am not sure how to do that correctly, or which

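Declaring the per-iteration temporaries private (or, simpler, defining them inside the loop body) looks like the sketch below; the function and variable names are invented for the example, and an optimised Release build is assumed when timing, since debug-mode numbers say little about parallel speed-up:

    #include <vector>

    void transform(std::vector<double> &data) {
        const int n = (int)data.size();
        double tmp;                                // would otherwise be shared

        #pragma omp parallel for private(tmp) shared(data)
        for (int i = 0; i < n; i++) {
            tmp = data[i] * data[i];               // each thread now has its own tmp
            data[i] = tmp + 1.0;
        }
    }

The loop counter i is made private automatically; only variables declared before the pragma and written inside the loop need the private clause.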

Parallelize function using OpenMP

I'm trying to run code in parallel, but I'm confused by the private/shared and related OpenMP details. I'm using C++ (msvc12 or gcc) and OpenMP. The code iterates over a loop that consists of a block that should run in parallel, followed by a block that should run once all the parallel work is done. It doesn't matter in which order the parallel work is processed. The code looks like this:

    // some X, M, N, Y, Z are some constant values
    const int processes = 4;
    std::vector<double> vct(X);
    std::vector<std::vector<double> &

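Below is a pattern that fits this description, with the data-sharing spelled out via default(none) so that nothing is shared by accident; the vector, its size, and the iteration count are placeholders rather than the original X, M, N values:

    #include <cstdio>
    #include <vector>

    int main() {
        int iterations = 100, n = 1000;            // placeholders for X, M, N, ...
        std::vector<double> vct(n, 0.0);
        double total = 0.0;

        #pragma omp parallel num_threads(4) default(none) \
                shared(vct, total) firstprivate(iterations, n)
        for (int it = 0; it < iterations; ++it) {
            #pragma omp for                        // parallel block; order of the
            for (int i = 0; i < n; ++i)            // iterations does not matter
                vct[i] += 1.0;

            #pragma omp single                     // sequential block; runs after the
            total += vct[0];                       // implicit barrier of the omp for
        }

        std::printf("total=%f\n", total);
        return 0;
    }

Everything declared inside the parallel region (it, i) is automatically private, which is usually the least error-prone way to handle the private/shared question.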