Triggering a runtime error within a CUDA kernel

Question

In CUDA, we can't throw exceptions; but - we can and do occasionally reach exceptional situations in which we can't proceed, and on the host we would have thrown an exception.

So, as a second best, we can at least trigger a runtime error to stop doing unreasonable work and indicate that something went wrong.

What's a good way to do so in a CUDA kernel, which:

Doesn't cause undefined behavior
Will stop kernel execution once reached
Will not trigger a compiler warning/error

?

2 revs, 2 users 55% · Accepted Answer · 2018-06-10 13:55:46Z

10

Option 1 Assertions:

All presently supported GPUs include an in kernel assertion mechanism, described here.

Directly from the documentation:

#include <assert.h>

__global__ void testAssert(void)
{
    int is_one = 1;
    int should_be_one = 0;

    // This will have no effect
    assert(is_one);

    // This will halt kernel execution
    assert(should_be_one);
}

int main(int argc, char* argv[])
{
    testAssert<<<1,1>>>();
    cudaDeviceSynchronize();

    return 0;
}

There is a dedicated CUDA runtime error cudaErrorAssert which will be reported by any kernel which fires an assertion call during execution. As per all other device side runtime errors, the context will be destroyed on the error and a new context will need to be created (by calling cudaDeviceReset()).

Note that is (unfortunately) not supported on MacOS because of driver limitations.

Option 2 Illegal Instruction

You can use inline ptx and asm("trap;") to trigger an illegal instruction.

Here is some code demonstrating that:

#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include <cstdio>
#include <cstdlib>

__global__ void kernel(int i) {
    if(i > 0) {
        asm("trap;");
    }

    ::printf("%d\n", i);
}

inline void error_check(cudaError_t err, const char* file, int line) {
    if(err != cudaSuccess) {
        ::fprintf(stderr, "CUDA ERROR at %s[%d] : %s\n", file, line, cudaGetErrorString(err));
        abort();
    }
}
#define CUDA_CHECK(err) do { error_check(err, __FILE__, __LINE__); } while(0)


int main() {
    kernel<<<1, 1>>>(0);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());


    kernel<<<1, 1>>>(1);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

}

which outputs:

0

CUDA ERROR at ...kernel.cu[31] : an illegal instruction was encountered

edited Jun 10, 2018 at 13:55

community wiki

2 revs, 2 users 55%
talonmies

Sign up to request clarification or add additional context in comments.

3 Comments

einpoklum Over a year ago

I find it confusing that triggering a trap tells you "illegal instruction". I should probably file a bug report asking for a separate cudaError_t value for that...

morteza Over a year ago

Using abort() inside a global function raises this error: '''Error calling a host function("abort") from a global function is not allowed'''

tommsch Over a year ago

There is also an inline function now available __trap which seems to do the same as asm("trap;").

Collectives™ on Stack Overflow

Triggering a runtime error within a CUDA kernel

1 Answer 1

Option 1 Assertions:

Option 2 Illegal Instruction

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Option 1 Assertions:

Option 2 Illegal Instruction

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related