How to catch or handle CUDA kernel launch errors

Question

I use the checkCudaErrors helper function from the CUDA Toolkit Samples (see "helper_cuda.h"). I am perplexed as to why the launch error in this example is not caught by checkCudaErrors. The error is that too many threads per block are launched (2048).

In a Debug session (Linux, gdb), the console prints on stderr: "warning: Cuda API error detected: cudaLaunch returned (0x9)".

However, when I execute either the Release or Debug build from a Bash shell, checkCudaErrors prints no error.

Why is this?

My expectation is that the error would be caught and printed at the D2H memcpy call immediately following the launch. Is this incorrect?

Minimal reproducible example:

#include <cuda.h>
#include "helper_cuda.h"

__global__ void BusyIncrementKernel( const size_t increments, float * result){
    float tmp = 0;
    for ( size_t i = 0; i < increments; ++i ){ tmp += 1; }
    const int j = threadIdx.x + blockIdx.x*blockDim.x;
    if ( j == 0 ){ *result = tmp; }
}

int main( int argc, char * argv[] ){
    unsigned int blockDim = 2048;
    dim3 block{ blockDim, 1, 1};
    dim3 grid{ 1, 1, 1};
    float * dResult;
    checkCudaErrors( cudaMalloc( &dResult, sizeof(float) ));
    BusyIncrementKernel<<< grid, block >>>( 10000000, dResult );
    float result;
    checkCudaErrors( cudaMemcpy( &result, dResult, sizeof(float), cudaMemcpyDeviceToHost ));
    checkCudaErrors( cudaFree( dResult ));
    checkCudaErrors( cudaDeviceSynchronize() );
    fprintf( stderr,"result: %f\n", result );
    return 0;
}

Where exactly does it fail? I am almost sure you can't get an error from checkCudaErrors() for kernel errors, only for client API calls.
– Michael IV, Commented Mar 26, 2018 at 10:05
@MichaelIV The program always returns after the fprintf( result: ....) in main(). I expect it to exit from within checkCudaErrors( ... exit(EXIT_FAILURE) ), but it does not. I also expect kernel errors to be returned eventually by later CUDA API calls, e.g. the subsequent cudaMemcpy, cudaFree, or definitely the cudaDeviceSynchronize.
– Tyson Hilmer, Commented Mar 26, 2018 at 10:10
So you're saying checkCudaErrors doesn't return any errors at all?
– Michael IV, Commented Mar 26, 2018 at 10:12
@MichaelIV Yes, which makes no sense. It has worked well for me in the past. The clean rebuild and the build call all look normal.
– Tyson Hilmer, Commented Mar 26, 2018 at 10:13
Maybe you're running in release build mode? Do regular asserts work?
– Michael IV, Commented Mar 26, 2018 at 10:14

Tyson Hilmer · Accepted Answer · 2018-03-26 14:47:54Z

3

This answer by talonmies specifically states that kernel launches require a slightly different error-handling pattern. Section 3.2.9 (Error Checking) of the CUDA C Programming Guide explains this.

This answer by Robert Crovella indicates that there are two error types, which differ in how the API reports (returns) them.

My results: the only way to catch kernel launch errors is to call cudaPeekAtLastError() or cudaGetLastError() after the launch. These were the only API functions that returned the launch error code. Subsequent API calls neither returned the launch error code nor cleared it; it could still be obtained later with cudaPeekAtLastError() or cudaGetLastError().
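The behaviour described above can be reproduced with a small sketch. It assumes a GPU with the usual 1024-threads-per-block limit; the kernel writeOne and the helper peekAfterOversizedLaunch are hypothetical names introduced only for illustration. The launch is rejected synchronously, cudaPeekAtLastError() reports the error without resetting it, and cudaGetLastError() reports it once and then resets it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel; it never runs when the launch configuration is rejected.
__global__ void writeOne(float *result) {
    if (threadIdx.x == 0 && blockIdx.x == 0) { *result = 1.0f; }
}

// Hypothetical helper: launch with too many threads per block and return
// what cudaPeekAtLastError() reports right after the launch.
cudaError_t peekAfterOversizedLaunch(float *dResult) {
    writeOne<<<1, 2048>>>(dResult);  // 2048 exceeds the assumed 1024 limit
    return cudaPeekAtLastError();    // reads the error without resetting it
}

int main() {
    float *dResult = nullptr;
    if (cudaMalloc(&dResult, sizeof(float)) != cudaSuccess) { return 1; }

    cudaError_t peeked = peekAfterOversizedLaunch(dResult);
    printf("peek after launch:   %s\n", cudaGetErrorString(peeked));

    // A later, unrelated API call returns its own status, not the launch error.
    float result = 0.0f;
    cudaError_t copied = cudaMemcpy(&result, dResult, sizeof(float),
                                    cudaMemcpyDeviceToHost);
    printf("cudaMemcpy returned: %s\n", cudaGetErrorString(copied));

    // cudaGetLastError() returns the stored error and resets it to cudaSuccess.
    cudaError_t got = cudaGetLastError();
    cudaError_t again = cudaGetLastError();
    printf("first get:           %s\n", cudaGetErrorString(got));
    printf("second get:          %s\n", cudaGetErrorString(again));

    cudaFree(dResult);
    return 0;
}
```

Prefer cudaGetLastError() when the program should continue after handling the error, since it clears the stored value; cudaPeekAtLastError() leaves it in place for a later check.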

edited Mar 26, 2018 at 14:47

answered Mar 26, 2018 at 11:58

Tyson Hilmer


5 Comments

Robert Crovella Over a year ago

This particular case has nothing to do with asynchrony, or waiting long enough. The first paragraph of your answer is off-base. And this question is basically a duplicate of the one talonmies provided the link to.

Tyson Hilmer Over a year ago

@RobertCrovella I thought so, but don't understand why. Can you please elaborate? Or simply edit for correctness.

Tyson Hilmer Over a year ago

@RobertCrovella talonmies' post makes sense. But I cannot agree with the doc stating "...; if an asynchronous error occurs, it will be reported by some subsequent unrelated runtime function call." That implies any API call, which is not what I am seeing.

Robert Crovella Over a year ago

The error you have here is not an asynchronous error. An asynchronous error occurs during execution of kernel code, such as an out-of-bounds access, etc. The error you have here is an invalid launch configuration. This is caught at launch time, and does not occur asynchronously some time later as the kernel is actually executing device code. There are basically two types of errors.

Tyson Hilmer Over a year ago

@RobertCrovella. Thank you kindly :) The emphasis that there are two types of errors is exactly what I was having trouble understanding. Hopefully I have edited the answer to be correct.

einpoklum · Accepted Answer · 2018-03-26 14:57:02Z

0

CUDA kernel launches do not return an error code for the launch. To catch the error, you need to perform some explicit error checking after the launch, and before any additional API calls:

checkCudaErrors( cudaPeekAtLastError() );
checkCudaErrors( cudaDeviceSynchronize() );

The first call should catch at least any launch errors, and by the second call errors during kernel execution will have been caught as well (see also this answer). Since you haven't done this, you don't see the error until the next API call at the earliest.
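The two-step check can be wrapped in a small helper so that every launch is followed by both tests. This is only a sketch: CHECK_KERNEL_LAUNCH, checkKernel and the fill kernel are hypothetical names, not part of helper_cuda.h, and the 1024-threads-per-block limit is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper, not part of helper_cuda.h: checks the launch itself
// first, then waits for the kernel and checks errors raised while it ran.
inline cudaError_t checkKernel(const char *file, int line) {
    cudaError_t err = cudaGetLastError();  // synchronous launch errors
    if (err != cudaSuccess) {
        fprintf(stderr, "%s:%d: launch error: %s\n", file, line,
                cudaGetErrorString(err));
        return err;
    }
    err = cudaDeviceSynchronize();         // asynchronous execution errors
    if (err != cudaSuccess) {
        fprintf(stderr, "%s:%d: execution error: %s\n", file, line,
                cudaGetErrorString(err));
    }
    return err;
}
#define CHECK_KERNEL_LAUNCH() checkKernel(__FILE__, __LINE__)

// Illustrative kernel: sets every element of data to value.
__global__ void fill(float *data, int n, float value) {
    const int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) { data[i] = value; }
}

int main() {
    const int n = 1024;
    float *dData = nullptr;
    if (cudaMalloc(&dData, n * sizeof(float)) != cudaSuccess) { return 1; }

    fill<<<(n + 255) / 256, 256>>>(dData, n, 1.0f);  // valid configuration
    const cudaError_t valid = CHECK_KERNEL_LAUNCH();

    fill<<<1, 2048>>>(dData, n, 2.0f);               // too many threads per block
    const cudaError_t invalid = CHECK_KERNEL_LAUNCH();

    cudaFree(dData);
    // Exit status 0 means the helper behaved as intended for both launches.
    return (valid == cudaSuccess && invalid != cudaSuccess) ? 0 : 1;
}
```

Synchronizing after every launch serializes host and device work, so a check like this is probably best kept for debug builds or for launches where the cost does not matter.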

edited Mar 26, 2018 at 14:57

answered Mar 26, 2018 at 14:26

einpoklum


6 Comments

Tyson Hilmer Over a year ago

I tried calling cudaPeekAtLastError() immediately after the kernel launch, and again after the three subsequent API calls (memcpy, free, device sync). It returned an error code in both cases, whereas the other API calls did not. This implies that cudaPeekAtLastError() does not need to be called immediately, i.e. subsequent API calls do not overwrite the last error. I would appreciate confirmation or rejection of this.

Robert Crovella Over a year ago

Actually, this type of error manifests immediately. Unlike many types of kernel errors, this one is synchronous. It is available to be inspected as soon as the underlying API call (cudaLaunch) is complete. And no, this error is not "lost". There is no "overwriting".

einpoklum Over a year ago

@RobertCrovella: 1. About "immediately" - I meant to say in the instruction used for the kernel launch; edited. 2. I thought OP said his program exits after the last fprintf() - which means that no API error was caught until program exit.

einpoklum Over a year ago

@TysonHilmer: Didn't you write in a comment on your question that your program executes until after the last fprintf()? If that's the case, doesn't that mean that no error code is issued?

Tyson Hilmer Over a year ago

@einpoklum Yes. Running "Release/TestLaunchError; echo $?" prints "result: 0.000000" followed by "0". It returns 0 and no error code is issued. That is for the source as currently given in the question, i.e. lacking cudaPeekAtLastError.
