2

I use the checkCudaErrors helper function from the CUDA Toolkit Samples. See "helper_cuda.h". I am perplexed as to why the launch error from this example is not caught by checkCudaErrors. The error is too many threads are launched (2048).

From Debug (linux gdb), the console prints (stderr in red) "warning: Cuda API error detected: cudaLaunch returned (0x9)".

Whereas when I execute either the Release or Debug builds from a Bash shell, no error is printed by checkCudaErrors.

Why is this?

My expectation is the error would be caught and printed at the D2H memcpy call immediately proceeding the launch. Is this incorrect?

Minimal reproducible example:

#include <cuda.h>
#include "helper_cuda.h"

__global__ void BusyIncrementKernel( const size_t increments, float * result){
    float tmp = 0;
    for ( size_t i = 0; i < increments; ++i ){ tmp += 1; }
    const int j = threadIdx.x + blockIdx.x*blockDim.x;
    if ( j == 0 ){ *result = tmp; }
}

int main( int argc, char * argv[] ){
    unsigned int blockDim = 2048;
    dim3 block{ blockDim, 1, 1};
    dim3 grid{ 1, 1, 1};
    float * dResult;
    checkCudaErrors( cudaMalloc( &dResult, sizeof(float) ));
    BusyIncrementKernel<<< grid, block >>>( 10000000, dResult );
    float result;
    checkCudaErrors( cudaMemcpy( &result, dResult, sizeof(float), cudaMemcpyDeviceToHost ));
    checkCudaErrors( cudaFree( dResult ));
    checkCudaErrors( cudaDeviceSynchronize() );
    fprintf( stderr,"result: %f\n", result );
    return 0;
}
8
  • Where exactly does it fail? I am almost sure you can't get an error with checkCudaError() for Kernel errors,but only client API calls. Commented Mar 26, 2018 at 10:05
  • @MichaelIV the program always returns after the fprintf( result: ....) in main(). I expect it to return from within checkCudaErrors( ... exit(EXIT_FAILURE) ), but it does not. I also expect Kernel errors are eventually returned by later CUDA API calls, e.g. the proceeding cudaMemcpy, cudaFree, or definitely the cudaDeviceSynchronize. Commented Mar 26, 2018 at 10:10
  • So you're saying checkCudaErrors doesn't return any errors at all? Commented Mar 26, 2018 at 10:12
  • @MichaelIV. Yes. Which makes no sense. It has worked well for me in the past. Clean rebuild and build call all look normal. Commented Mar 26, 2018 at 10:13
  • Maybe you're running in release build mode? Do regular asserts work? Commented Mar 26, 2018 at 10:14

2 Answers 2

3

This answer by talonmies specifically states kernel launches require a slightly different pattern to handle. The CUDA API documentation 3.2.9. on Error Checking explains this.

This answer by Robert Crovella indicates there are two error types, that differ in how the API reports (returns) them*.

My results are; the only way to catch kernel launch errors is with cudaPeekAtLastError() or cudaGetLastError() after the launch call. These are the only API functions that returned the launch error code. Other subsequent API calls did not return the launch error code, nor did they clear it; it could be obtained later by cudaPeekAtLastError or cudaGetLastError.

Sign up to request clarification or add additional context in comments.

5 Comments

This particular case has nothing to do with asynchrony, or waiting long enough. The first paragraph of your answer is off-base. And this question is basically a duplicate of the one talonmies provided the link to.
@RobertCrovella I thought so, but don't understand why. Can you please elaborate? Or simply edit for correctness.
@RobertCrovella Talonmies post makes sense. But I cannot agree with the doc stating "...; if an asynchronous error occurs, it will be reported by some subsequent unrelated runtime function call." That implies any API call, whereas that is not what I am getting.
The error you have here is not an asynchronous error. An asynchronous error occurs during execution of kernel code, such as an out-of-bounds access, etc. The error you have here is an invalid launch configuration. This is caught at launch time, and does not occur asynchronously some time later as the kernel is actually executing device code. There are basically two types of errors.
@RobertCrovella. Thank you kindly :) The emphasis that there are two types of errors is exactly what I was having trouble understanding. Hopefully I have edited the answer to be correct.
0

CUDA kernel launches do not return an error code for the launch. To catch the error, you need to perform some explicit error checking after the launch, and before any additional API calls:

checkCudaErrors( cudaPeekAtLastError() );
checkCudaErrors( cudaDeviceSynchronize() );

The first call should catch at least any launch errors, and by the second call errors during kernel execution will have been caught as well (see also this answer). Since you haven't done this, you don't see the error until the next API call at the earliest.

6 Comments

I tried the cudaPeekAtLastError() call immediately after the kernel launch, and after the three subsequent API calls (memcpy,free,device sync). It returned an error code in both cases, whereas the other API calls did not. Which implies cudaPeekAtLastError() does not need to be called immediately, i.e. subsequent API calls do not overwrite the last error. I would sure appreciate confirmation or rejection of this.
actually, this type of error manifests immediately. Unlike many type of kernel errors, this one is synchronous. It is available to be inspected as soon as the underlying API call (cudaLaunch) is complete. And no, this error is not "lost". There is no "overwriting".
@RobertCrovella: 1. About "immediately" - I meant to say in the instruction used for the kernel launch; edited. 2. I thought OP said his program exits after the last fprintf() - which means that no API error was caught until program exit.
@TysonHilmer: Didn't you write in a comment on your question that your program executes until after the last fprintf()? If that's the case, doesn't that mean that no error code is issued?
@einpoklum Yes. $ Release/TestLaunchError; echo $? result: 0.000000 0 It returns 0 and no error code is issued. That is for the source as currently given in the question, i.e. lacking cudaPeekAtLastError.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.