cublasSetVector() vs cudaMemcpy()

I am wondering if there is a difference between:

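// cumalloc.cu - Create a vector on the device and copy h_vector into it
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

#ifndef HOST
#define HOST  /* project-specific qualifier; assumed to expand to nothing here */
#endif

HOST float * cudamath_vector(const float * h_vector, const int m)
{
  float *d_vector = NULL;
  cudaError_t cudaStatus;
  cublasStatus_t cublasStatus;

  cudaStatus = cudaMalloc((void **)&d_vector, sizeof(float) * m);

  if (cudaStatus != cudaSuccess) {
    printf("ERROR: cumalloc.cu, cudamath_vector() : %s\n", cudaGetErrorString(cudaStatus));
    return NULL;
  }

  /*    THIS: */ cublasStatus = cublasSetVector(m, sizeof(*d_vector), h_vector, 1, d_vector, 1);

  /* OR THAT: */ cudaStatus = cudaMemcpy(d_vector, h_vector, sizeof(float) * m, cudaMemcpyHostToDevice);

  if (cublasStatus != CUBLAS_STATUS_SUCCESS || cudaStatus != cudaSuccess) {
    printf("ERROR: cumalloc.cu, cudamath_vector() : host-to-device copy failed\n");
    cudaFree(d_vector);
    return NULL;
  }

  return d_vector;
}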

cublasSetVector() has two arguments, incx and incy, and the documentation says:

The storage spacing between consecutive elements is given by incx for the source vector x and for the destination vector y.

On the NVIDIA forum someone said:

iona_me: "incx and incy are strides measured in floats."

So does this mean that for incx = incy = 1 the elements of a float[] are stored contiguously (sizeof(float) bytes apart), and for incx = incy = 2 there would be sizeof(float) bytes of padding between consecutive elements?
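To make the stride question concrete, here is a host-only C sketch that mimics a strided copy with the same argument layout as cublasSetVector(n, elemSize, x, incx, y, incy). It assumes, following the forum quote, that incx and incy count elements rather than bytes. The helper strided_copy is hypothetical, runs entirely on the CPU, and is not part of CUBLAS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only emulation of a strided vector copy.
 * Element i is read from x + i*incx*elemSize and written to
 * y + i*incy*elemSize, so incx and incy count elements, not bytes.
 * This sketch only handles positive strides. */
static void strided_copy(int n, size_t elemSize,
                         const void *x, int incx, void *y, int incy)
{
    assert(n >= 0 && incx > 0 && incy > 0);
    const char *src = (const char *)x;
    char *dst = (char *)y;
    for (int i = 0; i < n; ++i) {
        /* Only the n selected elements are touched; the bytes between
         * them in y are neither read nor written. */
        memcpy(dst + (size_t)i * (size_t)incy * elemSize,
               src + (size_t)i * (size_t)incx * elemSize,
               elemSize);
    }
}
```

For example, with float h[6] = {0, 1, 2, 3, 4, 5}, strided_copy(3, sizeof(float), h, 1, d, 1) copies h[0], h[1] and h[2] contiguously, while strided_copy(3, sizeof(float), h, 2, d, 2) copies h[0], h[2] and h[4] into d[0], d[2] and d[4]. In this sketch a stride of 2 places consecutive elements 2 * sizeof(float) bytes apart, and the float in between is skipped rather than filled with padding.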

  • Apart from those two parameters and the cublasHandle, does cublasSetVector() do anything that cudaMemcpy() doesn't?
  • Would it be safe to pass a vector/matrix that was not created with the corresponding cublas*() function to other CUBLAS functions that manipulate it?

  • There is a comment by Massimiliano Fatica in a thread on the NVIDIA forum that confirms my statement in the comment above (more precisely, my comment came from recalling that post, which I linked to). In particular:

    cublasSetVector, cublasGetVector, cublasSetMatrix and cublasGetMatrix are thin wrappers around cudaMemcpy and cudaMemcpy2D. Therefore, no significant performance differences are expected between the two sets of copy functions.

    Accordingly, you can safely pass any array allocated with cudaMalloc to cublasSetVector (as the device-side destination) and to other CUBLAS routines, which operate on plain device pointers.

    Concerning the strides, perhaps there is a misprint in the guide (as of CUDA 6.0), which says that

    The storage spacing between consecutive elements is given by incx for the source vector x and for the destination vector y.

    but should perhaps be read as

    The storage spacing between consecutive elements is given by incx for the source vector x and incy for the destination vector y.
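The "thin wrapper" remark and the corrected reading of the strides fit together: copying n elements with source stride incx and destination stride incy can be expressed as a pitched 2D copy of n rows, each elemSize bytes wide, with source pitch incx * elemSize and destination pitch incy * elemSize. The host-only sketch below illustrates that mapping. memcpy2d_host mirrors the parameter order of cudaMemcpy2D (dst, dpitch, src, spitch, width, height, without the cudaMemcpyKind argument), but both helpers are hypothetical and this is not the actual CUBLAS source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host emulation of a pitched 2D copy: copies `height` rows of
 * `width` bytes; row r is read from src + r*spitch and written to
 * dst + r*dpitch. Parameter order mirrors cudaMemcpy2D (minus the kind). */
static void memcpy2d_host(void *dst, size_t dpitch,
                          const void *src, size_t spitch,
                          size_t width, size_t height)
{
    assert(width <= dpitch && width <= spitch);
    for (size_t r = 0; r < height; ++r) {
        memcpy((char *)dst + r * dpitch,
               (const char *)src + r * spitch,
               width);
    }
}

/* Hypothetical strided vector copy built on the 2D copy: each element is
 * one "row" of elemSize bytes, and the pitches are the strides converted
 * from elements to bytes. Assumes incx > 0 and incy > 0. */
static void set_vector_host(int n, int elemSize,
                            const void *x, int incx, void *y, int incy)
{
    assert(n >= 0 && elemSize > 0 && incx > 0 && incy > 0);
    memcpy2d_host(y, (size_t)incy * (size_t)elemSize,
                  x, (size_t)incx * (size_t)elemSize,
                  (size_t)elemSize, (size_t)n);
}
```

For example, with float x[3] = {1, 2, 3} and a zero-initialized float y[6], set_vector_host(3, sizeof(float), x, 1, y, 2) reads x contiguously (incx = 1) and writes every other slot of y (incy = 2), leaving y = {1, 0, 2, 0, 3, 0}. Because incx and incy enter as separate pitches, the source and destination spacings are independent, which matches the "incy for the destination vector y" reading.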
