CUDA Copying multiple arrays of structs with cudaMemcpy
Suppose a struct X with some primitives and an array of Y structs:
typedef struct
{
    int a;
    Y** y;
} X;
An instance X1 of X is initialized at the host, and then copied to an instance X2 of X, on the device memory, through cudaMemcpy.
This works fine for all the primitives in X (such as int a), but cudaMemcpy seems to flatten any double pointer into a single pointer, causing out-of-bounds errors wherever the struct arrays in X (such as y) are accessed.
In this case am I supposed to use another memcpy function, such as cudaMemcpy2D or cudaMemcpyArrayToArray?
Suggestions are much appreciated. Thanks!
Edit:
The natural approach (as in "that's what I'd do if it were just C") to copying an array of structures would be to cudaMalloc the array and then cudaMalloc and initialize each element separately, e.g.:
X** h_x;
X** d_x;
int num_x;
cudaMalloc((void**)&d_x, sizeof(X)*num_x);
int i=0;
for(;i<num_x;i++)
{
    cudaMalloc((void**)d_x[i], sizeof(X));
    cudaMemcpy(&d_x[i], &h_x[i], sizeof(X), cudaMemcpyHostToDevice);
}
However, the cudaMalloc inside the for loop crashes. I confess I'm not yet comfortable with pointers in CUDA functions, so perhaps I got the cudaMalloc and cudaMemcpy parameters wrong?
cudaMemcpy, cudaMemcpy2D and cudaMemcpyArrayToArray all copy from a contiguous memory region on the host to a contiguous memory region on the device.
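A consequence of this is that the copy is shallow: a pointer member is transferred as a raw address, not as the data it points to. The host-only sketch below uses plain memcpy as a stand-in for cudaMemcpy to show this. The Y definition and the shallow_copy helper are hypothetical, since the question does not define Y.

```c
#include <string.h>

/* Hypothetical element type; the question does not define Y. */
typedef struct { float v; } Y;

/* Same layout as the struct in the question. */
typedef struct
{
    int a;
    Y** y;
} X;

/* Byte-wise copy of one X. memcpy stands in for cudaMemcpy here:
   both transfer sizeof(X) raw bytes and nothing else. */
void shallow_copy(X *dst, const X *src)
{
    memcpy(dst, src, sizeof(X));
}
```

After shallow_copy, dst->a holds the same value as src->a, and dst->y holds the same address as src->y. With cudaMemcpy that address is still a host address, so dereferencing it in a kernel is invalid.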
You have to pack all your data into an intermediate contiguous buffer on the host and send that buffer to the device.
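One possible way to do the packing is sketched below, assuming each struct knows how many Y elements it owns. The names XHost, XFlat, num_y, pack_all and upload are all hypothetical, as is the Y definition. The idea is to replace each pointer with an offset into a single packed Y array, so that both the struct array and the Y array are contiguous. The upload function is only compiled under nvcc.

```c
#include <stdlib.h>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

typedef struct { float v; } Y;                    /* hypothetical element type */
typedef struct { int a; int num_y; Y **y; } XHost; /* X plus an assumed element count */

/* Device-friendly form of X: an offset into one packed Y array
   replaces the host pointer. */
typedef struct { int a; int y_count; int y_offset; } XFlat;

/* Fills flat[0..n-1] and returns a malloc'ed contiguous array holding
   every Y element (caller frees). Returns NULL if allocation fails. */
Y *pack_all(const XHost *xs, int n, XFlat *flat, int *total_out)
{
    int total = 0;
    for (int i = 0; i < n; i++) total += xs[i].num_y;

    Y *packed = (Y *)malloc((size_t)(total > 0 ? total : 1) * sizeof(Y));
    if (packed == NULL) return NULL;

    int next = 0;
    for (int i = 0; i < n; i++) {
        flat[i].a = xs[i].a;
        flat[i].y_count = xs[i].num_y;
        flat[i].y_offset = next;
        for (int j = 0; j < xs[i].num_y; j++)
            packed[next++] = *xs[i].y[j];   /* copy the pointee, not the pointer */
    }
    *total_out = total;
    return packed;
}

#ifdef __CUDACC__
/* One cudaMalloc and one cudaMemcpy per contiguous buffer. */
cudaError_t upload(const XFlat *flat, int n, const Y *packed, int total,
                   XFlat **d_flat, Y **d_packed)
{
    cudaError_t err = cudaMalloc((void **)d_flat, n * sizeof(XFlat));
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void **)d_packed, total * sizeof(Y));
    if (err != cudaSuccess) { cudaFree(*d_flat); return err; }
    err = cudaMemcpy(*d_flat, flat, n * sizeof(XFlat), cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(*d_packed, packed, total * sizeof(Y), cudaMemcpyHostToDevice);
    if (err != cudaSuccess) { cudaFree(*d_flat); cudaFree(*d_packed); }
    return err;
}
#endif
```

A kernel would then read element j of struct i as d_packed[d_flat[i].y_offset + j]. Note that cudaMalloc is always given the address of a host-side pointer variable, never an element of an array that already lives in device memory.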