Parallel GPU computing using OpenCV
I have an application that requires processing multiple images in parallel in order to maintain real-time speed.
It is my understanding that I cannot call OpenCV's GPU functions in a multi-threaded fashion on a single CUDA device. I have tried an OpenMP code construct such as the following:
#pragma omp parallel for
for (int i = 0; i < numImages; i++) {
    for (int j = 0; j < numChannels; j++) {
        for (int k = 0; k < pyramidDepth; k++) {
            cv::gpu::multiply(pyramid[i][j][k], weightmap[i][k], pyramid[i][j][k]);
        }
    }
}
This compiles and executes correctly, but unfortunately it appears to execute the numImages threads serially on the same CUDA device.
I should be able to execute multiple threads in parallel if I have multiple CUDA devices, correct? In order to get multiple CUDA devices, do I need multiple video cards?
Does anyone know if the nVidia GTX 690 dual-chip card works as two independent CUDA devices with OpenCV 2.4 or later? I found confirmation it can work as such with OpenCL, but no confirmation with regard to OpenCV.
Just do the multiply, passing whole images to the cv::gpu::multiply() function.
OpenCV and CUDA will handle splitting the work and dividing the task in the best way. Generally, each compute unit (i.e., core) in a GPU can run multiple threads (typically >= 16 in CUDA). This is in addition to having cards that can appear as multiple GPUs, or putting multiple linked cards in one machine.
The whole point of cv::gpu is to save you from having to know anything about how the internals work.
The answer from Martin worked for me. The key is to make use of the gpu::Stream class if your CUDA device is listed as compute capability 2 or higher. I will restate it here because I could not post the code clip correctly in the comment mini editor.
cv::gpu::Stream stream[3];

for (int i = 0; i < numImages; i++) {
    for (int j = 0; j < numChannels; j++) {
        for (int k = 0; k < pyramidDepth; k++) {
            cv::gpu::multiply(pyramid[i][j][k], weightmap[i][k], pyramid[i][j][k], stream[i]);
        }
    }
}
The above code seems to execute the multiply in parallel (numImages = 3 for my app). There are also Stream methods to aid in uploading/downloading images to and from GPU memory as well as methods to check the state of a stream in order to aid in synchronization with other code.
So... it apparently does not require multiple CUDA devices (i.e., GPU cards) to execute OpenCV GPU code in parallel!
I don't know anything about OpenCV's GPU functions, but if each one is completely self-contained (i.e., it creates a GPU context, transfers data to the GPU, computes the result, and transfers the result back to the CPU), then it's not surprising that these functions appear serialized when using a single GPU.
If you have multiple GPUs, then there should be some way to tell the OpenCV function to target a specific GPU. If you have multiple GPUs and can target them effectively, then I see no reason why the GPU function calls wouldn't be parallelized. According to the OpenCV wiki, the GPU functions target only a single GPU, but you can manually split up the work yourself: http://opencv.willowgarage.com/wiki/OpenCV%20GPU%20FAQ#Can_I_use_two_or_more_GPUs.3F
Dual GPUs like the GTX 690 will appear as two distinct devices with their own memory as far as your GPU program is concerned. See here: http://forums.nvidia.com/index.php?showtopic=231726
Also, if you are going a dual GPU route for compute applications, I would recommend against the GTX 690 because its compute performance is somewhat crippled compared to the GTX 590.