gettime() CUDA
I wanted to write a CUDA program where I could see firsthand the speedup that CUDA offers.
Here is a CUDA program I have written using Thrust (http://code.google.com/p/thrust/).
Briefly, all the code does is create two identical integer vectors of length 2^23, one on the host and one on the device, and sort them. It also (attempts to) measure the time for each sort.
Both are sorted with thrust::sort: on the host_vector the sort runs on the CPU, and on the device_vector it runs on the GPU.
For compilation I used
nvcc sortcompare.cu -lrt
The output of the program at the terminal is
Desktop: ./a.out
Host Time taken is: 19 . 224622882 seconds
Device Time taken is: 19 . 321644143 seconds
Desktop:
The first std::cout line appears after 19.224 seconds, as stated. Yet the second std::cout line (even though it reports 19.32 seconds) appears immediately after the first one. Note that I have used separate timespec structs for the clock_gettime() measurements, viz. ts_host and ts_device.
I am using CUDA 4.0 and an NVIDIA GTX 570 (compute capability 2.0).
#include <iostream>
#include <vector>
#include <algorithm>
#include <stdlib.h>
// For timings
#include <time.h>
// Necessary Thrust headers
#include <thrust/sort.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>

int main(int argc, char *argv[])
{
    int N = 23;
    thrust::host_vector<int>   H(1 << N);     // create a vector of 2^N elements on the host
    thrust::device_vector<int> D(1 << N);     // the same on the device
    thrust::host_vector<int>   dummy(1 << N); // D is copied back from the GPU into dummy after sorting

    // Set the host_vector elements to pseudo-random numbers.
    for (int i = 0; i < H.size(); ++i) {
        H[i] = rand();
    }

    // Sort the host_vector. Measure time.
    // Reset the clock
    timespec ts_host;
    ts_host.tv_sec = 0;
    ts_host.tv_nsec = 0;
    clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts_host); // start clock
    thrust::sort(H.begin(), H.end());
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_host); // stop clock
    std::cout << "\nHost Time taken is: " << ts_host.tv_sec << " . " << ts_host.tv_nsec << " seconds" << std::endl;

    D = H; // set the device vector elements equal to the host_vector

    // Sort the device vector. Measure time.
    timespec ts_device;
    ts_device.tv_sec = 0;
    ts_device.tv_nsec = 0;
    clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts_device); // start clock
    thrust::sort(D.begin(), D.end());
    thrust::copy(D.begin(), D.end(), dummy.begin());
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_device); // stop clock
    std::cout << "\nDevice Time taken is: " << ts_device.tv_sec << " . " << ts_device.tv_nsec << " seconds" << std::endl;

    return 0;
}
You are not checking the return value of clock_settime. I would guess it is failing, probably with errno set to EPERM or EINVAL. Read the documentation and always check your return values!
If I'm right, you are not resetting the clock as you think you are, hence the second timing is cumulative with the first, plus some extra stuff you don't intend to count at all.
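For illustration, a minimal sketch (not from the original answer) of what that check could look like; the exact failure mode depends on the platform, but as noted above the per-process CPU-time clock is most likely not settable, so the call is expected to fail with errno set:

#include <cerrno>
#include <cstdio>
#include <ctime>

int main()
{
    timespec ts;
    ts.tv_sec  = 0;
    ts.tv_nsec = 0;

    // Check the return value instead of assuming the clock was reset.
    // If the clock cannot be set, this returns -1 and errno says why
    // (probably EPERM or EINVAL, as suggested above).
    if (clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0) {
        perror("clock_settime"); // e.g. "clock_settime: Invalid argument"
    }
    return 0;
}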
The right way to do this is to call clock_gettime only: store the result first, do the computation, then subtract the original time from the end time.
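A minimal sketch of that pattern applied to the code above. It assumes a wall-clock source such as CLOCK_MONOTONIC is acceptable; elapsed_seconds is a hypothetical helper, and the cudaDeviceSynchronize() call is a defensive addition to make sure the GPU work has finished before the stop timestamp is read:

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <cuda_runtime.h>
#include <thrust/sort.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

// Hypothetical helper: elapsed seconds between two timespec snapshots.
static double elapsed_seconds(const timespec &start, const timespec &stop)
{
    return (stop.tv_sec - start.tv_sec) + (stop.tv_nsec - start.tv_nsec) / 1e9;
}

int main()
{
    const int N = 23;
    thrust::host_vector<int> H(1 << N);
    for (size_t i = 0; i < H.size(); ++i)
        H[i] = rand();

    thrust::device_vector<int> D = H; // identical copy on the device (before sorting)

    timespec start, stop;

    // Host sort: read the clock before and after, then subtract.
    clock_gettime(CLOCK_MONOTONIC, &start);
    thrust::sort(H.begin(), H.end());
    clock_gettime(CLOCK_MONOTONIC, &stop);
    std::cout << "Host sort took " << elapsed_seconds(start, stop) << " s\n";

    // Device sort, timed the same way.
    clock_gettime(CLOCK_MONOTONIC, &start);
    thrust::sort(D.begin(), D.end());
    cudaDeviceSynchronize(); // ensure GPU work is done before stopping the clock
    clock_gettime(CLOCK_MONOTONIC, &stop);
    std::cout << "Device sort took " << elapsed_seconds(start, stop) << " s\n";

    return 0;
}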