I've built a program that uses Hybridizer to write CUDA code in C# and call the resulting functions. The program works, but I noticed that the overhead of setting up the GPU and calling a function on it is extremely high. For example, a job that took about 3,000 ticks on the CPU took about 50 million ticks to set up the GPU wrapper and then another 50 million ticks to run on the GPU. I'm trying to figure out whether this lag is due to Hybridizer itself or is simply unavoidable when calling GPU code from a C# program.

So I'm looking for alternative methods. My searches turned up some mentions of P/Invoke, but I can't find a good guide on how to use it, and all of those threads are 9+ years old, so I don't know whether their information is still relevant. I also found ManagedCuda, but it seems to be no longer in development.

1 Answer

You can try CppSharp to generate C# bindings for CUDA. With this approach we were able to initialize CUDA and call its simple hardware-info functions (CudaGetDeviceProperties, CudaSetDevice, CudaGetDeviceCount, CudaDriverGetVersion, CudaRuntimeGetVersion).

Using the other parts of the CUDA API should also be possible, since CppSharp generated bindings for the whole CUDA runtime API, but we did not try it. We use CUDA indirectly, through NVIDIA's Flex library, and all of the Flex functions are usable via CppSharp without considerable penalties.

Example usage of the classes generated by CppSharp looks like this:

// Query the installed driver version; the result is written to the ref argument.
int driverVersion = 0;
CudaRuntimeApi.CudaDriverGetVersion(ref driverVersion);

// Query the CUDA runtime version the same way.
int runtimeVersion = 0;
CudaRuntimeApi.CudaRuntimeGetVersion(ref runtimeVersion);

// Count the CUDA-capable devices and bail out if the call fails.
int deviceCount = 0;
var errorCode = CudaRuntimeApi.CudaGetDeviceCount(ref deviceCount);

if (errorCode != CudaError.CudaSuccess)
{
    Console.Error.WriteLine("'cudaGetDeviceCount' returned " + errorCode + ": " + CudaRuntimeApi.CudaGetErrorString(errorCode));
    return;
}

// Fetch the properties of each device; the wrapper is disposed when the using block ends.
for (var device = 0; device < deviceCount; ++device)
{
    using (var deviceProperties = new CudaDeviceProp())
    {
        CudaRuntimeApi.CudaGetDeviceProperties(deviceProperties, device);
    }
}

CudaRuntimeApi and CudaDeviceProp are the classes generated by CppSharp.
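
Since the question asks about P/Invoke: generated bindings like these are essentially wrappers around P/Invoke declarations, so for a handful of functions you could also declare them by hand. Below is a minimal sketch, not the code CppSharp produces. The class name CudaNative, the helper FormatVersion, and the library name "cudart64_12" are assumptions made for illustration; the actual runtime library name depends on the installed CUDA toolkit version and on the platform.

```csharp
using System;
using System.Runtime.InteropServices;

static class CudaNative
{
    // Assumption: name of the CUDA runtime library. It differs between
    // toolkit versions and platforms, so adjust it for your installation.
    private const string CudaRt = "cudart64_12";

    // Native signature: cudaError_t cudaRuntimeGetVersion(int* runtimeVersion)
    [DllImport(CudaRt)]
    public static extern int cudaRuntimeGetVersion(out int runtimeVersion);

    // Native signature: cudaError_t cudaDriverGetVersion(int* driverVersion)
    [DllImport(CudaRt)]
    public static extern int cudaDriverGetVersion(out int driverVersion);

    // Native signature: cudaError_t cudaGetDeviceCount(int* count)
    [DllImport(CudaRt)]
    public static extern int cudaGetDeviceCount(out int count);

    // Hypothetical helper: CUDA encodes versions as 1000 * major + 10 * minor.
    public static string FormatVersion(int version)
    {
        return (version / 1000) + "." + ((version % 1000) / 10);
    }
}

static class Program
{
    static void Main()
    {
        try
        {
            // cudaSuccess is 0; any other value is an error code.
            int error = CudaNative.cudaGetDeviceCount(out int deviceCount);
            if (error != 0)
            {
                Console.Error.WriteLine("cudaGetDeviceCount failed with error code " + error);
                return;
            }

            CudaNative.cudaRuntimeGetVersion(out int runtimeVersion);
            CudaNative.cudaDriverGetVersion(out int driverVersion);

            Console.WriteLine("Devices: " + deviceCount);
            Console.WriteLine("Runtime: " + CudaNative.FormatVersion(runtimeVersion));
            Console.WriteLine("Driver:  " + CudaNative.FormatVersion(driverVersion));
        }
        catch (DllNotFoundException)
        {
            // Thrown at the first call if the assumed library name cannot be resolved.
            Console.Error.WriteLine("CUDA runtime library not found; check the name in CudaNative.");
        }
    }
}
```

Each native int* output parameter maps to an out int on the C# side, and the cudaError_t return value is read as a plain int here; declaring a matching C# enum would be the tidier choice for anything beyond a quick check.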

6 Comments

Why not provide an example?
@Robert Crovella, CppSharp is a tool for generating bindings. Are you asking for the scripts that invoke it, or for examples of the generated code? The code that initializes CUDA is textually the same as in C++.
The C# part. You could simply demonstrate how to run sample code like deviceQuery from C#. The CUDA code used as an example isn't that important, but it would be nice to see something complete that works. I provide lots of fully worked examples in my answers, even ones that include things like OpenMP and calling CUDA code from Python. Nobody charges you by the word or character to post here, so extreme brevity isn't really an attractive feature in an SO answer, in my opinion. Here is an example of calling CUDA from Python using ctypes.
@Robert Crovella, oh, I see. I'll try to post the code.
On SO, I think it's pretty universally agreed that we like code.