How to perform deep copying of struct with CUDA?

This question already has an answer here:

  • Copying a struct containing pointers to CUDA device (3 answers)

The short answer is "just don't". There are four reasons why I say that:

  • There is no deep copy functionality in the API
  • The resulting code you will have to write to set up and copy the structure you have described to the GPU will be ridiculously complex (about 4000 API calls at a minimum, and probably an intermediate kernel, for your example of 20 Matrices of 100 Cells each)
  • The GPU code using three levels of pointer indirection will have massively increased memory access latency and will break what little cache coherency is available on the GPU
  • If you want to copy the data back to the host afterwards, you have the same problem in reverse
Consider using linear memory and indexing instead. It is portable between host and GPU, and the allocation and copy overhead is about 1% of the pointer-based alternative.
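A minimal sketch of the linear memory idea, not a definitive implementation: every value lives in one contiguous buffer, and an index function replaces the nested pointers. The dimensions follow the 20 x 100 example above; the per-cell payload size `kValuesPerCell` and the names `flat_index`, `scale_values` and `scale_on_gpu` are assumptions made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Dimensions from the example in the question; the payload size is an assumption.
constexpr int kMatrices = 20;
constexpr int kCellsPerMatrix = 100;
constexpr int kValuesPerCell = 16;  // hypothetical fixed number of values per cell

// Map (matrix, cell, value) to an offset in one contiguous buffer.
// The same function works on the host and on the device.
__host__ __device__ inline int flat_index(int m, int c, int v) {
    return (m * kCellsPerMatrix + c) * kValuesPerCell + v;
}

// Trivial kernel: multiply every value in the flat buffer by `factor`.
__global__ void scale_values(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Copies the whole structure to the GPU, runs the kernel, and copies it back.
// One allocation and one copy in each direction, regardless of nesting depth.
bool scale_on_gpu(std::vector<float> &host, float factor) {
    if (host.empty()) return true;
    const int n = static_cast<int>(host.size());
    const size_t bytes = host.size() * sizeof(float);

    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        scale_values<<<blocks, threads>>>(dev, n, factor);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

On the host, an element is addressed as `host[flat_index(m, c, v)]`; a kernel can use the same expression on the device pointer, so no pointer fix-up is needed in either direction.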

If you really want to do this, leave a comment and I will try and dig up some old code examples which show what a complete folly nested pointers are on the GPU.
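For contrast, here is a rough sketch of what a pointer-based deep copy involves for just one level of indirection. It is not the old code mentioned above; the `Cell` layout, the `DeviceCells` bookkeeping struct and the function names are hypothetical, and the call counter exists only to make the cost visible.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical one-level structure: each Cell owns a separately allocated array.
struct Cell {
    float *values;
    int count;
};

struct DeviceCells {
    Cell *cells = nullptr;        // device array of Cell with device pointers patched in
    std::vector<float *> arrays;  // per-cell device allocations, kept for cleanup
    int api_calls = 0;            // cudaMalloc/cudaMemcpy calls issued so far
};

// Deep-copies host_cells to the device. Returns false on the first CUDA error;
// the caller should still call free_cells() to release whatever was allocated.
bool deep_copy_cells(const std::vector<Cell> &host_cells, DeviceCells &out) {
    std::vector<Cell> staging(host_cells);  // host copy whose pointers get patched
    for (size_t i = 0; i < host_cells.size(); ++i) {
        const size_t bytes = host_cells[i].count * sizeof(float);
        float *d = nullptr;
        ++out.api_calls;  // one allocation per cell
        if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
        out.arrays.push_back(d);
        ++out.api_calls;  // one copy per cell
        if (cudaMemcpy(d, host_cells[i].values, bytes,
                       cudaMemcpyHostToDevice) != cudaSuccess) return false;
        staging[i].values = d;  // replace the host pointer with the device pointer
    }
    const size_t bytes = staging.size() * sizeof(Cell);
    ++out.api_calls;
    if (cudaMalloc(&out.cells, bytes) != cudaSuccess) return false;
    ++out.api_calls;
    return cudaMemcpy(out.cells, staging.data(), bytes,
                      cudaMemcpyHostToDevice) == cudaSuccess;
}

// Releases every device allocation made by deep_copy_cells.
void free_cells(DeviceCells &dc) {
    for (float *p : dc.arrays) cudaFree(p);
    cudaFree(dc.cells);
    dc = DeviceCells{};
}
```

In this sketch, 100 cells cost 202 API calls for a single matrix. Repeating that for 20 matrices gives roughly 20 x 100 x 2 = 4000 calls, which is consistent with the figure quoted above, and copying the results back means walking the same structure in reverse.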
