Easiest way to test for existence of cuda

We have some nightly build machines that have the cuda libraries installed, but which do not have a cuda-capable GPU installed. These machines are capable of building cuda-enabled programs, but they are not capable of running these programs.

In our automated nightly build process, our cmake scripts use the cmake command

find_package(CUDA)

to determine whether the cuda software is installed. This sets the cmake variable CUDA_FOUND on platforms that have cuda software installed. This is great and it works perfectly. When CUDA_FOUND is set, it is OK to build cuda-enabled programs, even when the machine has no cuda-capable GPU.

But cuda-using test programs naturally fail on the non-GPU cuda machines, causing our nightly dashboards to look "dirty". So I want cmake to avoid running those tests on such machines. But I still want to build the cuda software on those machines.

After getting a positive CUDA_FOUND result, I would like to test for the presence of an actual GPU, and then set a variable, say CUDA_GPU_FOUND, to reflect this.

What is the simplest way to get cmake to test for the presence of a cuda-capable gpu?

This needs to work on three platforms: Windows with MSVC, Mac, and Linux. (That's why we use cmake in the first place)

EDIT: There are a couple of good-looking suggestions in the answers for how to write a program to test for the presence of a GPU. What is still missing is the means of getting CMake to compile and run this program at configuration time. I suspect that the TRY_RUN command in CMake will be critical here, but unfortunately that command is nearly undocumented, and I cannot figure out how to make it work. This CMake part of the problem might be a much more difficult question. Perhaps I should have asked this as two separate questions...


The answer to this question consists of two parts:

  • A program to detect the presence of a cuda-capable GPU.
  • CMake code to compile, run, and interpret the result of that program at configuration time.

For part 1, the GPU-sniffing program, I started with the answer provided by fabrizioM because it is so compact. I quickly discovered that I needed many of the details found in unknown's answer to get it to work well. What I ended up with is the following C source file, which I named has_cuda_gpu.c:

    #include <stdio.h>
    #include <cuda_runtime.h>
    
    int main() {
        int deviceCount, device;
        int gpuDeviceCount = 0;
        struct cudaDeviceProp properties;
        cudaError_t cudaResultCode = cudaGetDeviceCount(&deviceCount);
        if (cudaResultCode != cudaSuccess) 
            deviceCount = 0;
        /* machines with no GPUs can still report one emulation device */
        for (device = 0; device < deviceCount; ++device) {
            cudaGetDeviceProperties(&properties, device);
            if (properties.major != 9999) /* 9999 means emulation only */
                ++gpuDeviceCount;
        }
        printf("%d GPU CUDA device(s) found\n", gpuDeviceCount);
    
        /* don't just return the number of gpus, because other runtime cuda
           errors can also yield non-zero return values */
        if (gpuDeviceCount > 0)
            return 0; /* success */
        else
            return 1; /* failure */
    }
    

    Notice that the return code is zero in the case where a cuda-enabled GPU is found. This is because on one of my has-cuda-but-no-GPU machines, this program generates a runtime error with non-zero exit code. So any non-zero exit code is interpreted as "cuda does not work on this machine".

    You might ask why I don't use cuda emulation mode on non-GPU machines. It is because emulation mode is buggy. I only want to debug my code, and work around bugs in cuda GPU code. I don't have time to debug the emulator.

    The second part of the problem is the cmake code to use this test program. After some struggle, I have figured it out. The following block is part of a larger CMakeLists.txt file:

    find_package(CUDA)
    if(CUDA_FOUND)
        try_run(RUN_RESULT_VAR COMPILE_RESULT_VAR
            ${CMAKE_BINARY_DIR} 
            ${CMAKE_CURRENT_SOURCE_DIR}/has_cuda_gpu.c
            CMAKE_FLAGS 
                -DINCLUDE_DIRECTORIES:STRING=${CUDA_TOOLKIT_INCLUDE}
                -DLINK_LIBRARIES:STRING=${CUDA_CUDART_LIBRARY}
            COMPILE_OUTPUT_VARIABLE COMPILE_OUTPUT_VAR
            RUN_OUTPUT_VARIABLE RUN_OUTPUT_VAR)
        message("${RUN_OUTPUT_VAR}") # Display number of GPUs found
        # COMPILE_RESULT_VAR is TRUE when compile succeeds
        # RUN_RESULT_VAR is zero when a GPU is found
        if(COMPILE_RESULT_VAR AND NOT RUN_RESULT_VAR)
            set(CUDA_HAVE_GPU TRUE CACHE BOOL "Whether CUDA-capable GPU is present")
        else()
            set(CUDA_HAVE_GPU FALSE CACHE BOOL "Whether CUDA-capable GPU is present")
        endif()
    endif(CUDA_FOUND)
    

    This sets a CUDA_HAVE_GPU boolean variable in cmake that can subsequently be used to trigger conditional operations.
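
    For the original goal of keeping the dashboard clean, one way to use this variable is to build the CUDA test program everywhere but register it with CTest only when a GPU was detected. The following is a minimal sketch, not taken from the original project: the target name cuda_smoke_test and the source file cuda_smoke_test.cu are hypothetical, and it assumes that enable_testing() (or include(CTest)) has already been called and that the cuda_add_executable macro from the FindCUDA module is available.

    ```cmake
    if(CUDA_FOUND)
        # Always build, so compile errors still show up on GPU-less machines.
        # cuda_smoke_test and cuda_smoke_test.cu are hypothetical names.
        cuda_add_executable(cuda_smoke_test cuda_smoke_test.cu)

        if(CUDA_HAVE_GPU)
            # Only register the test where a real GPU was detected.
            add_test(NAME cuda_smoke_test COMMAND cuda_smoke_test)
        else()
            message(STATUS "No CUDA-capable GPU detected; skipping cuda_smoke_test")
        endif()
    endif()
    ```

    Note that set(... CACHE ...) without FORCE does not overwrite an existing cache entry, so CUDA_HAVE_GPU keeps the value from the first configure run. If the hardware changes, removing the cache entry (or starting from a fresh build directory) should let the detection take effect again.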

    It took me a long time to figure out that the include and link parameters need to go in the CMAKE_FLAGS stanza, and what the syntax should be. The try_run documentation is very light, but there is more information in the try_compile documentation, which is a closely related command. I still needed to scour the web for examples of try_compile and try_run before getting this to work.

    Another tricky but important detail is the third argument to try_run, the "bindir". You should probably always set this to ${CMAKE_BINARY_DIR}. In particular, do not set it to ${CMAKE_CURRENT_BINARY_DIR} if you are in a subdirectory of your project. CMake expects to find the subdirectory CMakeFiles/CMakeTmp within bindir, and spews errors if that directory does not exist. Just use ${CMAKE_BINARY_DIR}, which is one location where those subdirectories seem to naturally reside.


    Write a simple program like

    #include <cuda_runtime.h>
    
    int main (){
        int deviceCount;
        cudaError_t e = cudaGetDeviceCount(&deviceCount);
        return e == cudaSuccess ? deviceCount : -1;
    }
    

    and check the return value.
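
    In the CMake setting of the question, "checking the return value" could be done with try_run. The sketch below is only an illustration under assumptions: count_cuda_devices.c is a hypothetical file holding the program above, CUDA_DEVICE_COUNT is a made-up variable name, and the CMAKE_FLAGS lines are borrowed from the has_cuda_gpu.c answer in this thread.

    ```cmake
    find_package(CUDA)
    if(CUDA_FOUND)
        # count_cuda_devices.c is a hypothetical file holding the program above.
        try_run(COUNT_RUN_RESULT COUNT_COMPILE_RESULT
            ${CMAKE_BINARY_DIR}
            ${CMAKE_CURRENT_SOURCE_DIR}/count_cuda_devices.c
            CMAKE_FLAGS
                -DINCLUDE_DIRECTORIES:STRING=${CUDA_TOOLKIT_INCLUDE}
                -DLINK_LIBRARIES:STRING=${CUDA_CUDART_LIBRARY})

        # The exit code is the device count. A return of -1 shows up as 255 on
        # POSIX systems, and a crash yields the string FAILED_TO_RUN, so accept
        # only small positive integers.
        set(CUDA_DEVICE_COUNT 0)
        if(COUNT_COMPILE_RESULT
                AND COUNT_RUN_RESULT MATCHES "^[0-9]+$"
                AND COUNT_RUN_RESULT GREATER 0
                AND COUNT_RUN_RESULT LESS 255)
            set(CUDA_DEVICE_COUNT ${COUNT_RUN_RESULT})
        endif()
        message(STATUS "CUDA devices reported: ${CUDA_DEVICE_COUNT}")
    endif()
    ```

    Unlike has_cuda_gpu.c, this count would include an emulation-only device, so a result of 1 does not by itself prove that a physical GPU is present.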


    I just wrote a pure Python script that does some of the things you seem to need (I took much of this from the pystream project). It's basically just a wrapper for some functions in the CUDA run time library (it uses ctypes). Look at the main() function to see example usage. Also, be aware that I just wrote it, so it's likely to contain bugs. Use with caution.

    #!/usr/bin/env python
    
    import sys
    import platform
    import ctypes
    
    """
    cudart.py: used to access parts of the CUDA runtime library.
    Most of this code was lifted from the pystream project (it's BSD licensed):
    http://code.google.com/p/pystream
    
    Note that this is likely to only work with CUDA 2.3
    To extend to other versions, you may need to edit the DeviceProp Class
    """
    
    cudaSuccess = 0
    errorDict = {
        1: 'MissingConfigurationError',
        2: 'MemoryAllocationError',
        3: 'InitializationError',
        4: 'LaunchFailureError',
        5: 'PriorLaunchFailureError',
        6: 'LaunchTimeoutError',
        7: 'LaunchOutOfResourcesError',
        8: 'InvalidDeviceFunctionError',
        9: 'InvalidConfigurationError',
        10: 'InvalidDeviceError',
        11: 'InvalidValueError',
        12: 'InvalidPitchValueError',
        13: 'InvalidSymbolError',
        14: 'MapBufferObjectFailedError',
        15: 'UnmapBufferObjectFailedError',
        16: 'InvalidHostPointerError',
        17: 'InvalidDevicePointerError',
        18: 'InvalidTextureError',
        19: 'InvalidTextureBindingError',
        20: 'InvalidChannelDescriptorError',
        21: 'InvalidMemcpyDirectionError',
        22: 'AddressOfConstantError',
        23: 'TextureFetchFailedError',
        24: 'TextureNotBoundError',
        25: 'SynchronizationError',
        26: 'InvalidFilterSettingError',
        27: 'InvalidNormSettingError',
        28: 'MixedDeviceExecutionError',
        29: 'CudartUnloadingError',
        30: 'UnknownError',
        31: 'NotYetImplementedError',
        32: 'MemoryValueTooLargeError',
        33: 'InvalidResourceHandleError',
        34: 'NotReadyError',
        0x7f: 'StartupFailureError',
        10000: 'ApiFailureBaseError'}
    
    
    try:
        # "Windows" is the usual value; "Microsoft" is kept from the original check
        if platform.system() in ("Windows", "Microsoft"):
            _libcudart = ctypes.windll.LoadLibrary('cudart.dll')
        elif platform.system()=="Darwin":
            _libcudart = ctypes.cdll.LoadLibrary('libcudart.dylib')
        else:
            _libcudart = ctypes.cdll.LoadLibrary('libcudart.so')
        _libcudart_error = None
    except OSError as e:
        _libcudart_error = e
        _libcudart = None
    
    def _checkCudaStatus(status):
        if status != cudaSuccess:
            # The exception classes named in errorDict are not defined in this
            # module, so raise a generic error carrying the CUDA error name.
            raise RuntimeError(errorDict.get(status, 'UnknownError'))
    
    def _checkDeviceNumber(device):
        assert isinstance(device, int), "device number must be an int"
        assert device >= 0, "device number must be >= 0"
        assert device < 2**8-1, "device number must be < 255"
    
    
    # cudaDeviceProp
    class DeviceProp(ctypes.Structure):
        _fields_ = [
             ("name", 256*ctypes.c_char), #  < ASCII string identifying device
             ("totalGlobalMem", ctypes.c_size_t), #  < Global memory available on device in bytes
             ("sharedMemPerBlock", ctypes.c_size_t), #  < Shared memory available per block in bytes
             ("regsPerBlock", ctypes.c_int), #  < 32-bit registers available per block
             ("warpSize", ctypes.c_int), #  < Warp size in threads
             ("memPitch", ctypes.c_size_t), #  < Maximum pitch in bytes allowed by memory copies
             ("maxThreadsPerBlock", ctypes.c_int), #  < Maximum number of threads per block
             ("maxThreadsDim", 3*ctypes.c_int), #  < Maximum size of each dimension of a block
             ("maxGridSize", 3*ctypes.c_int), #  < Maximum size of each dimension of a grid
             ("clockRate", ctypes.c_int), #  < Clock frequency in kilohertz
             ("totalConstMem", ctypes.c_size_t), #  < Constant memory available on device in bytes
             ("major", ctypes.c_int), #  < Major compute capability
             ("minor", ctypes.c_int), #  < Minor compute capability
             ("textureAlignment", ctypes.c_size_t), #  < Alignment requirement for textures
             ("deviceOverlap", ctypes.c_int), #  < Device can concurrently copy memory and execute a kernel
             ("multiProcessorCount", ctypes.c_int), #  < Number of multiprocessors on device
             ("kernelExecTimeoutEnabled", ctypes.c_int), #  < Specified whether there is a run time limit on kernels
             ("integrated", ctypes.c_int), #  < Device is integrated as opposed to discrete
             ("canMapHostMemory", ctypes.c_int), #  < Device can map host memory with cudaHostAlloc/cudaHostGetDevicePointer
             ("computeMode", ctypes.c_int), #  < Compute mode (See ::cudaComputeMode)
             ("__cudaReserved", 36*ctypes.c_int),
    ]
    
        def __str__(self):
            return """NVidia GPU Specifications:
        Name: %s
        Total global mem: %i
        Shared mem per block: %i
        Registers per block: %i
        Warp size: %i
        Mem pitch: %i
        Max threads per block: %i
        Max threads dim: (%i, %i, %i)
        Max grid size: (%i, %i, %i)
        Total const mem: %i
        Compute capability: %i.%i
        Clock Rate (GHz): %f
        Texture alignment: %i
    """ % (self.name, self.totalGlobalMem, self.sharedMemPerBlock,
           self.regsPerBlock, self.warpSize, self.memPitch,
           self.maxThreadsPerBlock,
           self.maxThreadsDim[0], self.maxThreadsDim[1], self.maxThreadsDim[2],
           self.maxGridSize[0], self.maxGridSize[1], self.maxGridSize[2],
           self.totalConstMem, self.major, self.minor,
           float(self.clockRate)/1.0e6, self.textureAlignment)
    
    def cudaGetDeviceCount():
        if _libcudart is None: return  0
        deviceCount = ctypes.c_int()
        status = _libcudart.cudaGetDeviceCount(ctypes.byref(deviceCount))
        _checkCudaStatus(status)
        return deviceCount.value
    
    def getDeviceProperties(device):
        if _libcudart is None: return  None
        _checkDeviceNumber(device)
        props = DeviceProp()
        status = _libcudart.cudaGetDeviceProperties(ctypes.byref(props), device)
        _checkCudaStatus(status)
        return props
    
    def getDriverVersion():
        if _libcudart is None: return  None
        version = ctypes.c_int()
        _libcudart.cudaDriverGetVersion(ctypes.byref(version))
        # CUDA encodes versions as 1000*major + 10*minor (e.g. 2030 is 2.3)
        v = "%d.%d" % (version.value//1000,
                       (version.value%1000)//10)
        return v
    
    def getRuntimeVersion():
        if _libcudart is None: return  None
        version = ctypes.c_int()
        _libcudart.cudaRuntimeGetVersion(ctypes.byref(version))
        # CUDA encodes versions as 1000*major + 10*minor (e.g. 2030 is 2.3)
        v = "%d.%d" % (version.value//1000,
                       (version.value%1000)//10)
        return v
    
    def getGpuCount():
        count=0
        for ii in range(cudaGetDeviceCount()):
            props = getDeviceProperties(ii)
            if props.major!=9999: count+=1
        return count
    
    def getLoadError():
        return _libcudart_error
    
    
    version = getDriverVersion()
    if version is not None and not version.startswith('2.3'):
        sys.stdout.write("WARNING: Driver version %s may not work with %s\n" %
                         (version, sys.argv[0]))
    
    version = getRuntimeVersion()
    if version is not None and not version.startswith('2.3'):
        sys.stdout.write("WARNING: Runtime version %s may not work with %s\n" %
                         (version, sys.argv[0]))
    
    
    def main():
    
        sys.stdout.write("Driver version: %s\n" % getDriverVersion())
        sys.stdout.write("Runtime version: %s\n" % getRuntimeVersion())
    
        nn = cudaGetDeviceCount()
        sys.stdout.write("Device count: %s\n" % nn)
    
        for ii in range(nn):
            props = getDeviceProperties(ii)
            sys.stdout.write("\nDevice %d:\n" % ii)
            #sys.stdout.write("%s" % props)
            for f_name, f_type in props._fields_:
                attr = props.__getattribute__(f_name)
                sys.stdout.write( "  %s: %s\n" % (f_name, attr))
    
        gpuCount = getGpuCount()
        if gpuCount > 0:
            sys.stdout.write("\n")
        sys.stdout.write("GPU count: %d\n" % getGpuCount())
        e = getLoadError()
        if e is not None:
            sys.stdout.write("There was an error loading a library:\n%s\n\n" % e)
    
    if __name__=="__main__":
        main()
    