I'm running CUDA on Arch Linux.
I have read in multiple places that a kernel launch is asynchronous with respect to the host (the call returns immediately and the CPU continues executing). However, I'm not seeing that behavior.
For example:
kernel<<<blocks,threads>>>();
printf("print immediately\n");
check_cuda_error();
The CPU seems to block: nothing is printed to the console, and nothing else executes, until the kernel has completed. I tested kernels with a range of execution times (1 s, 2 s, 3 s, etc.) and different calculations to make sure it wasn't specific to my kernel.
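One way to narrow this down is to time the launch and the kernel's completion separately on the host. The sketch below is only an illustration under a few assumptions: `busy_kernel` is a hypothetical stand-in for my real kernel, the cycle count is arbitrary, and output goes to `stderr` (which is unbuffered by default) so that stdio buffering cannot be mistaken for a blocking launch.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel: spins for roughly `cycles` GPU clock ticks.
__global__ void busy_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main() {
    using clock_type = std::chrono::steady_clock;
    auto ms = [](clock_type::time_point a, clock_type::time_point b) {
        return std::chrono::duration<double, std::milli>(b - a).count();
    };

    // Force context creation up front so it is not counted as launch time.
    cudaFree(0);

    auto t0 = clock_type::now();
    busy_kernel<<<1, 1>>>(1000000000LL);  // arbitrary count, assumed to take on the order of a second
    auto t1 = clock_type::now();

    // stderr is unbuffered, so this line appears as soon as the launch call returns.
    fprintf(stderr, "launch returned after %.3f ms\n", ms(t0, t1));

    // Catch launch-time errors without waiting for the kernel.
    cudaError_t launch_err = cudaGetLastError();
    if (launch_err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(launch_err));
        return 1;
    }

    // Explicitly wait for the kernel; this is where the host is expected to block.
    cudaError_t sync_err = cudaDeviceSynchronize();
    auto t2 = clock_type::now();
    fprintf(stderr, "kernel finished after %.3f ms (%s)\n",
            ms(t0, t2), cudaGetErrorString(sync_err));

    return sync_err == cudaSuccess ? 0 : 1;
}
```

If the first timing is already close to the kernel's full run time, the launch itself is blocking. If it is small, the wait is happening somewhere after the launch, for example in a synchronizing call inside an error-checking helper or in buffered output.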
Is this a driver issue, or am I misinterpreting something?