Parallelism in OpenCL on 1 cpu device

Is it possible to achieve the same level of parallelism with a multiple core CPU device as that of multiple heterogenous devices ( like GPU and CPU ) in OpenCL?

I have an intel i5 and am looking to optimise my code. When I query the platform for devices I get only one device returned: the CPU. I was wondering how I could optimise my code by using this.

Also, if I used a single command queue for this device, would the application automatically assign the kernels to different compute devices or does it have to be done manually by the programmer?


Short answer: yes, it will run in parallel and no, no need to do it manually.

Long answer:

Also, if I used a single command queue for this device, would the application automatically assign the kernels to different compute devices [...]

Either you need to revise your OpenCL vocabulary or I didn't understand your question. You only have one device and core != device!

One CPU, regardless of how many cores it has, is one device. The same goes for a GPU: one GPU, which has hundreds of cores, is only one device. You send jobs to the device through the queue and the device's driver. Your jobs can (and will) be split up into work-items. Then, some (how many depends on the device/driver) work-items are executed in parallel. On the GPU aswell as on the CPU, one work-item is executed by one kernel. (This might not be completely true but it is a very helpful abstraction.)

If you enqueue several kernels in one queue (without connecting them through a wait event!), the driver may or may not run them in parallel.

It is the very goal of OpenCL to allow you to compute work-items in parallel regardless of whether it is using several devices' cores in parallel or only a single devices cores.

If this confuses you, watch these really good (and long) videos: http://macresearch.org/opencl


How are you determining the OPENCL device count? I have an Intel I3 laptop that gives me 2 OpenCL compute units? It has 2 cores.

According to Intels spec an I5-2300 has 4 cores and supports 4 threads. It isn't hyper-threaded. I would expect a OpenCL call to the query the # devices to give you a count of 4.


Can a cpu device achieve the same level of parallelism as a gpu? Pretty much always no.

The number of compute units in a gpu is almost always more than in a cpu. For example, $50 can get you a video card with 10 compute units (Radeon 6450). The cheapest 8-core cpus on newegg are going for $189 (desktop cpu) and $269 (server).

The compute units of a cpu will run faster due to clock speed, and execute branching code much better than a gpu. You want a cpu if your workload has a lot of conditional statements. A gpu will execute the same instructions on many pieces of data. The 6450 gpu has 16 'stream processors' per compute unit to make this happen. Gpus are great when you have to do the same (small/medium) tasks many times. Matrix multiplication, n-boy computations, reduction operations, and some sorting algorithms run much better on gpu/accelerator hardware than on a cpu.

I answered a similar question with more detail a few weeks ago. (This one)

Getting back to your question about the "same level of parallelism" -- cpus don't have the same level of parallelism as gpus, except in cases where the gpu under performs on the execution of the actual kernel.

On your i5 system, there would be only one cpu device. This represents the entire cpu. When you query for the number of compute units, opencl will return the number of cores you have. If you want to use all cores, you just run the kernel on your device, and opencl will use all of the compute units (cores) for you.

链接地址: http://www.djcxy.com/p/46422.html

上一篇: 需要为CPU和GPU平台安装opencl吗?

下一篇: 1 cpu设备上的OpenCL的并行性