CUDA Thrust: copy from device to device

2018-06-16 17:48:47

I have an array in device memory, allocated with the standard CUDA malloc, which is passed to a function as follows:

void MyClass::run(uchar4 * input_data)

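I also have a class member, a thrust::device_ptr declared as:

thrust::device_ptr<uchar4> data = thrust::device_malloc<uchar4>(num_pts);

Here num_pts is the number of values in the array, and the input_data pointer is guaranteed to be num_pts long.

Now I want to copy the input array into the thrust::device_ptr. I have looked through the Thrust documentation, and much of it covers copying from device to host memory and vice versa. What is the best-performing way to do this device-to-device copy with Thrust, or should I just use cudaMemcpy?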

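The canonical way to do this is to use thrust::copy. thrust::device_ptr has standard pointer semantics, and the API works out seamlessly whether the source and destination pointers are on the host or on the device, i.e.: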

#include <thrust/device_malloc.h>
#include <thrust/device_ptr.h>
#include <thrust/copy.h>
#include <iostream>

int main()
{
    // Initial host data
    int ivals[4] = { 1, 3, 6, 10 };

    // Allocate and copy to first device allocation
    thrust::device_ptr<int> dp1 = thrust::device_malloc<int>(4);
    thrust::copy(&ivals[0], &ivals[0]+4, dp1);

    // Allocate and copy to second device allocation
    thrust::device_ptr<int> dp2 = thrust::device_malloc<int>(4);
    thrust::copy(dp1, dp1+4, dp2);

    // Copy back to host
    int ovals[4] = {-1, -1, -1, -1};
    thrust::copy(dp2, dp2+4, &ovals[0]);

    for(int i=0; i<4; i++)
        std::cout << ovals[i] << std::endl;

    return 0;
}

which, when compiled and run, does this:

talonmies@box:~$ nvcc -arch=sm_30 thrust_dtod.cu 
talonmies@box:~$ ./a.out 
1
3
6
10
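
The example above copies between two thrust::device_ptr allocations, while the question starts from a raw uchar4 pointer obtained from cudaMalloc. Here is a minimal sketch of that case, under some assumptions: the helper names copy_with_thrust and copy_with_memcpy and the tiny test data are hypothetical and not from the question. The raw pointer is wrapped with thrust::device_pointer_cast so that thrust::copy treats it as device memory; an unwrapped raw pointer would be treated as a host pointer. The second helper shows the plain cudaMemcpy route using thrust::raw_pointer_cast.

```cuda
#include <thrust/device_malloc.h>
#include <thrust/device_free.h>
#include <thrust/device_ptr.h>
#include <thrust/copy.h>
#include <cuda_runtime.h>
#include <iostream>

// Hypothetical helper (not from the question): device-to-device copy via Thrust.
// input_data is assumed to be a raw device pointer holding num_pts elements.
void copy_with_thrust(uchar4* input_data, thrust::device_ptr<uchar4> data, size_t num_pts)
{
    // Wrap the raw pointer so Thrust knows it refers to device memory
    thrust::device_ptr<uchar4> src = thrust::device_pointer_cast(input_data);
    thrust::copy(src, src + num_pts, data);
}

// Hypothetical helper: the same copy with the CUDA runtime API.
cudaError_t copy_with_memcpy(uchar4* input_data, thrust::device_ptr<uchar4> data, size_t num_pts)
{
    // Unwrap the device_ptr to get the raw device address for cudaMemcpy
    return cudaMemcpy(thrust::raw_pointer_cast(data), input_data,
                      num_pts * sizeof(uchar4), cudaMemcpyDeviceToDevice);
}

int main()
{
    // Small made-up input, standing in for the question's input_data
    const size_t num_pts = 2;
    uchar4 h_in[num_pts] = { make_uchar4(1, 2, 3, 4), make_uchar4(5, 6, 7, 8) };

    uchar4* input_data = nullptr;
    if (cudaMalloc(&input_data, num_pts * sizeof(uchar4)) != cudaSuccess)
        return 1;
    cudaMemcpy(input_data, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    // Destination allocated as in the question
    thrust::device_ptr<uchar4> data = thrust::device_malloc<uchar4>(num_pts);

    // Either call performs the device-to-device copy
    copy_with_thrust(input_data, data, num_pts);
    // copy_with_memcpy(input_data, data, num_pts);

    // Copy back to the host to inspect the result
    uchar4 h_out[num_pts];
    thrust::copy(data, data + num_pts, h_out);
    for (size_t i = 0; i < num_pts; i++)
        std::cout << int(h_out[i].x) << " " << int(h_out[i].y) << " "
                  << int(h_out[i].z) << " " << int(h_out[i].w) << std::endl;

    thrust::device_free(data);
    cudaFree(input_data);
    return 0;
}
```

Both helpers move the same num_pts * sizeof(uchar4) bytes within device memory. Whether one is measurably faster than the other on a given GPU and toolkit version is not something this sketch establishes, so it is worth timing both if the copy is performance critical.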
