Easiest way to test for existence of cuda

2018-06-08 09:10:06

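We have some nightly build machines that have the cuda libraries installed, but which do not have a cuda-capable GPU installed. These machines can build cuda-enabled programs, but they cannot run them.

In our automated nightly build process, our cmake scripts use the cmake command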

find_package(CUDA)

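to determine whether the cuda software is installed. This sets the cmake variable CUDA_FOUND on platforms that have the cuda software installed. This is great, and it works perfectly. When CUDA_FOUND is set, it is OK to build cuda-enabled programs, even when the machine has no cuda-capable GPU.

But test programs that use cuda naturally fail on the non-GPU cuda machines, which makes our nightly dashboards look "dirty". So I want cmake to avoid running those tests on such machines. But I still want to build the cuda software on those machines.

After getting a positive CUDA_FOUND result, I would like to test for the presence of an actual GPU, and then set a variable, say CUDA_GPU_FOUND, to reflect this.

What is the simplest way to get cmake to test for the presence of a cuda-capable gpu?

This needs to work on three platforms: Windows with MSVC, Mac, and Linux. (That's why we use cmake in the first place.)

EDIT: There are a couple of good-looking suggestions in the answers for how to write a program to test for the presence of a GPU. What is still missing is a way to get CMake to compile and run this program at configuration time. I suspect that the TRY_RUN command in CMake will be critical here, but unfortunately that command is nearly undocumented, and I cannot figure out how to make it work. This CMake part of the problem might be the harder question. Perhaps I should have asked this as two separate questions...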

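The answer to this question consists of two parts:

1. A program to detect the presence of a cuda-capable GPU.

2. CMake code to compile, run, and interpret the result of that program at configuration time.

For part 1, the gpu sniffing program, I started with the answer provided by fabrizioM because it is so compact. I quickly discovered that I needed many of the details found in unknown's answer to get it to work well. What I ended up with is the following C source file, which I named has_cuda_gpu.c: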

#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    int deviceCount, device;
    int gpuDeviceCount = 0;
    struct cudaDeviceProp properties;
    cudaError_t cudaResultCode = cudaGetDeviceCount(&deviceCount);
    if (cudaResultCode != cudaSuccess) 
        deviceCount = 0;
    /* machines with no GPUs can still report one emulation device */
    for (device = 0; device < deviceCount; ++device) {
        cudaGetDeviceProperties(&properties, device);
        if (properties.major != 9999) /* 9999 means emulation only */
            ++gpuDeviceCount;
    }
    printf("%d GPU CUDA device(s) found\n", gpuDeviceCount);

    /* don't just return the number of gpus, because other runtime cuda
       errors can also yield non-zero return values */
    if (gpuDeviceCount > 0)
        return 0; /* success */
    else
        return 1; /* failure */
}

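Notice that the return code is zero in the case where a cuda-enabled GPU is found. This is because on one of my machines that has cuda but no GPU, this program generates a runtime error with a non-zero exit code. So any non-zero exit code is interpreted as "cuda does not work on this machine".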
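The same exit-code convention can be checked from a script as well as from cmake. Here is a minimal sketch in Python, assuming has_cuda_gpu.c has already been compiled to an executable. The function name gpu_probe_succeeded is a placeholder of mine, not part of the program above.

```python
import subprocess


def gpu_probe_succeeded(command):
    """Return True only if the probe command runs and exits with status 0.

    Any non-zero status, and any failure to launch the command at all,
    is treated as "cuda does not work on this machine".
    """
    try:
        completed = subprocess.run(command, stdout=subprocess.PIPE,
                                   stderr=subprocess.STDOUT)
    except OSError:
        # The executable is missing or cannot be started.
        return False
    return completed.returncode == 0
```

Calling gpu_probe_succeeded(["./has_cuda_gpu"]) (the path is hypothetical) then gives a plain True/False answer, and a crash inside the cuda runtime counts as False just like a clean "no GPU" exit.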

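You might ask why I don't use cuda emulation mode on non-GPU machines. It is because emulation mode is buggy. I only want to debug my own code, and work around bugs in cuda GPU code. I don't have time to debug the emulator.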

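The second part of the problem is the cmake code to use this test program. After some struggle, I have figured it out. The following block is part of a larger CMakeLists.txt file: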

find_package(CUDA)
if(CUDA_FOUND)
    try_run(RUN_RESULT_VAR COMPILE_RESULT_VAR
        ${CMAKE_BINARY_DIR} 
        ${CMAKE_CURRENT_SOURCE_DIR}/has_cuda_gpu.c
        CMAKE_FLAGS 
            -DINCLUDE_DIRECTORIES:STRING=${CUDA_TOOLKIT_INCLUDE}
            -DLINK_LIBRARIES:STRING=${CUDA_CUDART_LIBRARY}
        COMPILE_OUTPUT_VARIABLE COMPILE_OUTPUT_VAR
        RUN_OUTPUT_VARIABLE RUN_OUTPUT_VAR)
    message("${RUN_OUTPUT_VAR}") # Display number of GPUs found
    # COMPILE_RESULT_VAR is TRUE when compile succeeds
    # RUN_RESULT_VAR is zero when a GPU is found
    if(COMPILE_RESULT_VAR AND NOT RUN_RESULT_VAR)
        set(CUDA_HAVE_GPU TRUE CACHE BOOL "Whether CUDA-capable GPU is present")
    else()
        set(CUDA_HAVE_GPU FALSE CACHE BOOL "Whether CUDA-capable GPU is present")
    endif()
endif(CUDA_FOUND)

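This sets a CUDA_HAVE_GPU boolean variable in cmake that can subsequently be used to trigger conditional operations.

It took me a long time to figure out that the include and link parameters need to go in the CMAKE_FLAGS stanza, and what the syntax should be. The try_run documentation is very light, but there is more information in the documentation for try_compile, which is a closely related command. I still needed to scour the web for examples of try_compile and try_run before getting this to work.

Another tricky but important detail is the third argument to try_run, the "bindir". You should always set this to ${CMAKE_BINARY_DIR}. In particular, do not set it to ${CMAKE_CURRENT_BINARY_DIR} if you are in a subdirectory of your project. CMake expects to find the subdirectory CMakeFiles/CMakeTmp within bindir, and spews errors if that directory does not exist. Just use ${CMAKE_BINARY_DIR}, which is where those subdirectories naturally reside.

Write a simple program like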

#include <cuda_runtime.h>

int main (){
    int deviceCount;
    cudaError_t e = cudaGetDeviceCount(&deviceCount);
    return e == cudaSuccess ? deviceCount : -1;
}

and check the return value.
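One caveat when checking that return value from a script: the exit status carries both the device count and the -1 error marker. Here is a rough sketch of the interpretation step in Python. The name device_count_from_exit_status is hypothetical, and the sketch assumes a POSIX system, where the exit status is truncated to 8 bits so that -1 is seen as 255.

```python
def device_count_from_exit_status(status):
    """Map the probe's exit status to a device count, or None on failure.

    Assumes POSIX semantics: "return -1" from main() is seen by the parent
    as 255, so 255 (and anything outside 0..254) is treated as an error.
    """
    if 0 <= status < 255:
        return status
    return None


for status in (0, 2, 255, -1):
    print(status, "->", device_count_from_exit_status(status))
```

Note that a status of 0 here means "no devices", which is the opposite of the usual convention that 0 means success.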

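I just wrote a pure Python script that does some of the things you seem to need (I took much of this from the pystream project). It's basically just a wrapper for some functions in the CUDA runtime library (it uses ctypes). Look at the main() function to see example usage. Also, be aware that I just wrote it, so it's likely to contain bugs. Use with caution.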

#!/usr/bin/env python

import sys
import platform
import ctypes

"""
cudart.py: used to access parts of the CUDA runtime library.
Most of this code was lifted from the pystream project (it's BSD licensed):
http://code.google.com/p/pystream

Note that this is likely to only work with CUDA 2.3
To extend to other versions, you may need to edit the DeviceProp Class
"""

cudaSuccess = 0
errorDict = {
    1: 'MissingConfigurationError',
    2: 'MemoryAllocationError',
    3: 'InitializationError',
    4: 'LaunchFailureError',
    5: 'PriorLaunchFailureError',
    6: 'LaunchTimeoutError',
    7: 'LaunchOutOfResourcesError',
    8: 'InvalidDeviceFunctionError',
    9: 'InvalidConfigurationError',
    10: 'InvalidDeviceError',
    11: 'InvalidValueError',
    12: 'InvalidPitchValueError',
    13: 'InvalidSymbolError',
    14: 'MapBufferObjectFailedError',
    15: 'UnmapBufferObjectFailedError',
    16: 'InvalidHostPointerError',
    17: 'InvalidDevicePointerError',
    18: 'InvalidTextureError',
    19: 'InvalidTextureBindingError',
    20: 'InvalidChannelDescriptorError',
    21: 'InvalidMemcpyDirectionError',
    22: 'AddressOfConstantError',
    23: 'TextureFetchFailedError',
    24: 'TextureNotBoundError',
    25: 'SynchronizationError',
    26: 'InvalidFilterSettingError',
    27: 'InvalidNormSettingError',
    28: 'MixedDeviceExecutionError',
    29: 'CudartUnloadingError',
    30: 'UnknownError',
    31: 'NotYetImplementedError',
    32: 'MemoryValueTooLargeError',
    33: 'InvalidResourceHandleError',
    34: 'NotReadyError',
    0x7f: 'StartupFailureError',
    10000: 'ApiFailureBaseError'}


try:
    # Current Pythons report "Windows"; some old ones reported "Microsoft"
    if platform.system() in ("Windows", "Microsoft"):
        _libcudart = ctypes.windll.LoadLibrary('cudart.dll')
    elif platform.system()=="Darwin":
        _libcudart = ctypes.cdll.LoadLibrary('libcudart.dylib')
    else:
        _libcudart = ctypes.cdll.LoadLibrary('libcudart.so')
    _libcudart_error = None
except OSError as e:
    _libcudart_error = e
    _libcudart = None

def _checkCudaStatus(status):
    if status != cudaSuccess:
        # The error classes named in errorDict are not defined in this
        # module, so report the symbolic name in a RuntimeError instead
        name = errorDict.get(status, 'UnknownError')
        raise RuntimeError("CUDA runtime error %d: %s" % (status, name))

def _checkDeviceNumber(device):
    assert isinstance(device, int), "device number must be an int"
    assert device >= 0, "device number must be non-negative"
    assert device < 2**8-1, "device number must be < 255"


# cudaDeviceProp
class DeviceProp(ctypes.Structure):
    _fields_ = [
         ("name", 256*ctypes.c_char), #  < ASCII string identifying device
         ("totalGlobalMem", ctypes.c_size_t), #  < Global memory available on device in bytes
         ("sharedMemPerBlock", ctypes.c_size_t), #  < Shared memory available per block in bytes
         ("regsPerBlock", ctypes.c_int), #  < 32-bit registers available per block
         ("warpSize", ctypes.c_int), #  < Warp size in threads
         ("memPitch", ctypes.c_size_t), #  < Maximum pitch in bytes allowed by memory copies
         ("maxThreadsPerBlock", ctypes.c_int), #  < Maximum number of threads per block
         ("maxThreadsDim", 3*ctypes.c_int), #  < Maximum size of each dimension of a block
         ("maxGridSize", 3*ctypes.c_int), #  < Maximum size of each dimension of a grid
         ("clockRate", ctypes.c_int), #  < Clock frequency in kilohertz
         ("totalConstMem", ctypes.c_size_t), #  < Constant memory available on device in bytes
         ("major", ctypes.c_int), #  < Major compute capability
         ("minor", ctypes.c_int), #  < Minor compute capability
         ("textureAlignment", ctypes.c_size_t), #  < Alignment requirement for textures
         ("deviceOverlap", ctypes.c_int), #  < Device can concurrently copy memory and execute a kernel
         ("multiProcessorCount", ctypes.c_int), #  < Number of multiprocessors on device
         ("kernelExecTimeoutEnabled", ctypes.c_int), #  < Specified whether there is a run time limit on kernels
         ("integrated", ctypes.c_int), #  < Device is integrated as opposed to discrete
         ("canMapHostMemory", ctypes.c_int), #  < Device can map host memory with cudaHostAlloc/cudaHostGetDevicePointer
         ("computeMode", ctypes.c_int), #  < Compute mode (See ::cudaComputeMode)
         ("__cudaReserved", 36*ctypes.c_int),
]

    def __str__(self):
        return """NVidia GPU Specifications:
    Name: %s
    Total global mem: %i
    Shared mem per block: %i
    Registers per block: %i
    Warp size: %i
    Mem pitch: %i
    Max threads per block: %i
    Max threads dim: (%i, %i, %i)
    Max grid size: (%i, %i, %i)
    Total const mem: %i
    Compute capability: %i.%i
    Clock Rate (GHz): %f
    Texture alignment: %i
""" % (self.name, self.totalGlobalMem, self.sharedMemPerBlock,
       self.regsPerBlock, self.warpSize, self.memPitch,
       self.maxThreadsPerBlock,
       self.maxThreadsDim[0], self.maxThreadsDim[1], self.maxThreadsDim[2],
       self.maxGridSize[0], self.maxGridSize[1], self.maxGridSize[2],
       self.totalConstMem, self.major, self.minor,
       float(self.clockRate)/1.0e6, self.textureAlignment)

def cudaGetDeviceCount():
    if _libcudart is None: return  0
    deviceCount = ctypes.c_int()
    status = _libcudart.cudaGetDeviceCount(ctypes.byref(deviceCount))
    _checkCudaStatus(status)
    return deviceCount.value

def getDeviceProperties(device):
    if _libcudart is None: return  None
    _checkDeviceNumber(device)
    props = DeviceProp()
    status = _libcudart.cudaGetDeviceProperties(ctypes.byref(props), device)
    _checkCudaStatus(status)
    return props

def getDriverVersion():
    if _libcudart is None: return  None
    version = ctypes.c_int()
    _libcudart.cudaDriverGetVersion(ctypes.byref(version))
    # The version is encoded as 1000*major + 10*minor (e.g. 2030 -> 2.3)
    v = "%d.%d" % (version.value//1000,
                   (version.value%1000)//10)
    return v

def getRuntimeVersion():
    if _libcudart is None: return  None
    version = ctypes.c_int()
    _libcudart.cudaRuntimeGetVersion(ctypes.byref(version))
    # The version is encoded as 1000*major + 10*minor (e.g. 2030 -> 2.3)
    v = "%d.%d" % (version.value//1000,
                   (version.value%1000)//10)
    return v

def getGpuCount():
    count=0
    for ii in range(cudaGetDeviceCount()):
        props = getDeviceProperties(ii)
        if props.major!=9999: count+=1
    return count

def getLoadError():
    return _libcudart_error


version = getDriverVersion()
if version is not None and not version.startswith('2.3'):
    sys.stdout.write("WARNING: Driver version %s may not work with %s\n" %
                     (version, sys.argv[0]))

version = getRuntimeVersion()
if version is not None and not version.startswith('2.3'):
    sys.stdout.write("WARNING: Runtime version %s may not work with %s\n" %
                     (version, sys.argv[0]))


def main():

    sys.stdout.write("Driver version: %s\n" % getDriverVersion())
    sys.stdout.write("Runtime version: %s\n" % getRuntimeVersion())

    nn = cudaGetDeviceCount()
    sys.stdout.write("Device count: %s\n" % nn)

    for ii in range(nn):
        props = getDeviceProperties(ii)
        sys.stdout.write("\nDevice %d:\n" % ii)
        #sys.stdout.write("%s" % props)
        for f_name, f_type in props._fields_:
            attr = getattr(props, f_name)
            sys.stdout.write("  %s: %s\n" % (f_name, attr))

    gpuCount = getGpuCount()
    if gpuCount > 0:
        sys.stdout.write("\n")
    sys.stdout.write("GPU count: %d\n" % gpuCount)
    e = getLoadError()
    if e is not None:
        sys.stdout.write("There was an error loading a library:\n%s\n\n" % e)

if __name__=="__main__":
    main()
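The version strings in the script come from a single integer: the CUDA runtime reports its version as 1000 * major + 10 * minor, so 2030 stands for 2.3. Here is a small sketch of that decoding on its own, with a hypothetical helper name format_cuda_version that is not part of the script above:

```python
def format_cuda_version(encoded):
    """Turn a CUDA version integer (1000 * major + 10 * minor) into "major.minor"."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return "%d.%d" % (major, minor)


print(format_cuda_version(2030))   # 2.3
print(format_cuda_version(11020))  # 11.2
```

Keeping the decoding in one helper means the driver and runtime version checks cannot drift apart.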
