Concurrency, 4 CUDA Applications competing to get GPU resources

What would happen if four concurrent CUDA applications were competing for resources on a single GPU so that they could offload work to the graphics card? The CUDA Programming Guide 3.1 mentions that certain methods are asynchronous:

  • Kernel launches
  • Device-to-device memory copies
  • Host-to-device memory copies of a memory block of 64 KB or less
  • Memory copies performed by functions that are suffixed with Async
  • Memory set function calls
  • It also mentions that devices with compute capability 2.0 can execute multiple kernels concurrently, as long as the kernels belong to the same context (a small sketch of this follows the list).
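
To make the list concrete, here is a minimal sketch, assuming a compute capability 2.0+ device and a made-up dummyKernel, of one application issuing an asynchronous copy and launching kernels in two streams of the same context so they may overlap:

```cpp
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration.
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_a, *d_a, *d_b;
    cudaMallocHost((void**)&h_a, bytes);   // pinned host memory: needed for truly async copies
    cudaMalloc((void**)&d_a, bytes);
    cudaMalloc((void**)&d_b, bytes);
    cudaMemset(d_b, 0, bytes);

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // These calls return to the host immediately; the copy and the kernels
    // are only queued. On a compute capability 2.0+ device the two kernels
    // may execute concurrently because they belong to the same context.
    cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, s0);
    dummyKernel<<<(n + 255) / 256, 256, 0, s0>>>(d_a, n);
    dummyKernel<<<(n + 255) / 256, 256, 0, s1>>>(d_b, n);

    cudaDeviceSynchronize();   // wait for everything queued above to finish

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFreeHost(h_a);
    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```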

    Does this type of concurrency apply only to streams within a single CUDA application, and is it not possible when completely different applications are requesting GPU resources?

    Does that mean concurrency support is only available within one application (context?), and that the four applications will run concurrently only in the sense that their calls may be overlapped by context switching on the CPU, while each application has to wait for the GPU to be freed by the others? (i.e. a kernel launch from app4 waits until a kernel launch from app1 finishes.)

    If that is the case, how can these four applications access GPU resources without suffering long waiting times?


    As you said, only one "context" can occupy each of the engines at any given time. This means that one of the copy engines can be serving a memcpy for application A, the other a memcpy for application B, and the compute engine can be executing a kernel for application C (for example).

    An application can actually have multiple contexts, but no two applications can share the same context (although threads within an application can share a context).
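
    To make the context relationship concrete, here is a rough driver-API sketch (error checking omitted; built with nvcc and linked against the CUDA driver library with -lcuda) of a single application holding two separate contexts on the same device:

```cpp
#include <cuda.h>

int main()
{
    cuInit(0);

    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // One application may hold several contexts on the same device,
    // but contexts are never shared between applications.
    CUcontext ctxA, ctxB;
    cuCtxCreate(&ctxA, 0, dev);
    cuCtxCreate(&ctxB, 0, dev);

    // Make ctxA current on this CPU thread; allocations, copies and kernel
    // launches issued now belong to ctxA.
    cuCtxSetCurrent(ctxA);
    CUdeviceptr ptrA;
    cuMemAlloc(&ptrA, 1024);

    // Switch the thread to ctxB; ptrA is not visible from here.
    cuCtxSetCurrent(ctxB);
    CUdeviceptr ptrB;
    cuMemAlloc(&ptrB, 1024);

    cuCtxSetCurrent(ctxA);
    cuMemFree(ptrA);
    cuCtxSetCurrent(ctxB);
    cuMemFree(ptrB);

    cuCtxDestroy(ctxA);
    cuCtxDestroy(ctxB);
    return 0;
}
```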

    Any application that schedules work to run on the GPU (i.e. a memcpy or a kernel launch) can schedule the work asynchronously, so the application is free to go ahead and do other work on the CPU, and it can schedule any number of tasks to run on the GPU.
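
    As a rough illustration of that pattern (processKernel stands in for whatever the application actually runs), the host only enqueues work and decides later when to wait for it:

```cpp
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only.
__global__ void processKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_buf, *d_buf;
    cudaMallocHost((void**)&h_buf, bytes);   // pinned memory for async copies
    cudaMalloc((void**)&d_buf, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // All three calls only enqueue work on the stream and return immediately.
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream);
    processKernel<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n);
    cudaMemcpyAsync(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost, stream);

    // The CPU is free to do unrelated work here while the GPU drains its queue.
    // ... other host-side work ...

    // Block only when the GPU results are actually needed.
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}
```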

    Note that it is also possible to put a GPU in exclusive mode, whereby only one context can operate on the GPU at any time (i.e. all the resources are reserved for that context until the context is destroyed). The default is shared mode.
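
    If it helps, a small sketch that queries which compute mode the device is currently in via the runtime API (the mode itself is normally changed by an administrator, typically through nvidia-smi's compute-mode option, not from application code):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // computeMode reports how the GPU was configured by the administrator;
    // an ordinary application can only read it, not change it.
    switch (prop.computeMode) {
    case cudaComputeModeDefault:
        printf("Shared (default): multiple contexts may use this GPU.\n");
        break;
    case cudaComputeModeExclusive:
        printf("Exclusive: only one context may use this GPU at a time.\n");
        break;
    case cudaComputeModeProhibited:
        printf("Prohibited: no context may use this GPU.\n");
        break;
    default:
        printf("Other compute mode (%d).\n", prop.computeMode);
        break;
    }
    return 0;
}
```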
