gettime() CUDA

I wanted to write a CUDA program where I could see firsthand the speed-up that CUDA offers for applications.

Here is a CUDA program I have written using Thrust ( http://code.google.com/p/thrust/ ).

Briefly, all the code does is create two integer vectors of length 2^23, one on the host and one on the device, identical to each other, and sort them. It also (attempts to) measure the time taken for each.

On the host vector I use std::sort. On the device vector I use thrust::sort.

For compilation I used

nvcc sortcompare.cu -lrt

The output of the program at the terminal is

Desktop: ./a.out

Host Time taken is: 19 . 224622882 seconds

Device Time taken is: 19 . 321644143 seconds

Desktop:

The first std::cout statement appears after 19.224 seconds, as stated. Yet the second std::cout statement (even though it reports 19.32 seconds) appears immediately after the first one. Note that I have used separate timespec structs for the clock_gettime() measurements, namely ts_host and ts_device.

I am using CUDA 4.0 and an NVIDIA GTX 570 (compute capability 2.0).

    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <stdlib.h>

    // For timings
    #include <time.h>
    // Necessary Thrust headers
    #include <thrust/sort.h>
    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/copy.h>

    int main(int argc, char *argv[])
    {
        int N = 23;
        thrust::host_vector<int>   H(1 << N);     // a vector of 2^N elements on the host
        thrust::device_vector<int> D(1 << N);     // the same on the device
        thrust::host_vector<int>   dummy(1 << N); // D is copied back into this after sorting

        // Fill the host_vector with pseudo-random numbers.
        for (int i = 0; i < H.size(); ++i) {
            H[i] = rand();
        }

        // Sort the host_vector and measure the time.
        // Reset the clock.
        timespec ts_host;
        ts_host.tv_sec  = 0;
        ts_host.tv_nsec = 0;
        clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts_host); // start clock

        thrust::sort(H.begin(), H.end());

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_host); // stop clock
        std::cout << "\nHost Time taken is: " << ts_host.tv_sec << " . "
                  << ts_host.tv_nsec << " seconds" << std::endl;

        D = H; // set the device vector elements equal to the host_vector

        // Sort the device vector and measure the time.
        timespec ts_device;
        ts_device.tv_sec  = 0;
        ts_device.tv_nsec = 0;
        clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts_device); // start clock

        thrust::sort(D.begin(), D.end());
        thrust::copy(D.begin(), D.end(), dummy.begin());

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_device); // stop clock
        std::cout << "\nDevice Time taken is: " << ts_device.tv_sec << " . "
                  << ts_device.tv_nsec << " seconds" << std::endl;

        return 0;
    }

You are not checking the return value of clock_settime. I would guess it is failing, probably with errno set to EPERM or EINVAL. Read the documentation and always check your return values!
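For illustration, a minimal check might look like this (reporting via perror is just one option, and the exact errno you see may differ from the guess above):

    // Sketch: verify whether clock_settime() on CLOCK_PROCESS_CPUTIME_ID
    // actually succeeds. If it fails, that would confirm the diagnosis above.
    #include <stdio.h>
    #include <time.h>

    int main()
    {
        timespec ts;
        ts.tv_sec  = 0;
        ts.tv_nsec = 0;

        if (clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0) {
            perror("clock_settime"); // e.g. "clock_settime: Invalid argument"
            return 1;
        }
        return 0;
    }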

If I'm right, you are not resetting the clock as you think you are, hence the second timing is cumulative with the first, plus some extra stuff you don't intend to count at all.

The right way to do this is to call clock_gettime only: store the result before the computation, read the clock again afterwards, then subtract the start time from the end time.
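A minimal sketch of that pattern, assuming Linux and the same Thrust setup as in the question (the device vector is left zero-initialised purely for brevity, and CLOCK_MONOTONIC is used here as one reasonable choice; any readable clock follows the same pattern):

    // Sketch: time a device sort by reading the clock before and after
    // with clock_gettime() only, then subtracting; nothing is "reset".
    #include <time.h>
    #include <iostream>
    #include <thrust/sort.h>
    #include <thrust/device_vector.h>

    static double elapsed_seconds(const timespec &start, const timespec &end)
    {
        return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
    }

    int main()
    {
        thrust::device_vector<int> D(1 << 23);

        timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);  // read the start time

        thrust::sort(D.begin(), D.end());

        clock_gettime(CLOCK_MONOTONIC, &end);    // read the end time
        std::cout << "Device sort took " << elapsed_seconds(start, end)
                  << " seconds" << std::endl;
        return 0;
    }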
