Linked Questions
507 questions linked to/from What is the canonical way to check for errors using the CUDA runtime API?
0 votes, 1 answer, 2k views
CUDA cudaDeviceSynchronize complete Error codes reference? [duplicate]
I'm getting "CudaDeviceSynchronize returned error code 75 after running some kernel"
(sounds like something simple is missing in my code)
but I failed to find any reference/Q&A/topic about that ...
-2 votes, 1 answer, 368 views
CUDA program returning incorrect results [duplicate]
I have copied a vector addition example from the book "CUDA By Example" and I am getting unexpected incorrect results. Here is my code
#define N (33*1024)
__global__
void add(int *a, int *b,int *c){...
52 votes, 1 answer, 68k views
What does #pragma unroll do exactly? Does it affect the number of threads?
I'm new to CUDA, and I can't understand loop unrolling. I've written a piece of code to understand the technique
__global__ void kernel(float *b, int size)
{
int tid = blockDim.x * blockIdx.x + ...
18 votes, 5 answers, 41k views
Timing CUDA operations
I need to time a CUDA kernel execution. The Best Practices Guide says that we can use either events or standard timing functions like clock() in Windows. My problem is that using these two functions ...
23 votes, 7 answers, 19k views
Computing the null space of a matrix as fast as possible
I need to compute the nullspace of several thousand small matrices (8x9, not 4x3 as I wrote previously) in parallel (CUDA). All references point to SVD but the algorithm in numerical recipes seems ...
25 votes, 2 answers, 10k views
Way to verify kernel was executed in CUDA
When I call a kernel with ill-set parameters (e.g. more than 512 threads per block) or when the operations inside it require more than what my device has to offer (e.g. too many registers) the kernel ...
16 votes, 6 answers, 68k views
Fatal error: cuda.h: No such file or directory
I successfully installed CUDA 8.0 on my PC and I can see its files by running the following commands on Ubuntu 16.10:
$ sudo find / -name nvcc
/usr/local/cuda-8.0/bin/nvcc
$ sudo find / -name ...
19 votes, 3 answers, 50k views
What does "RuntimeError: CUDA error: device-side assert triggered" in PyTorch mean?
I have seen a lot of specific posts to particular case-specific problems, but no fundamental motivating explanation. What does this error:
RuntimeError: CUDA error: device-side assert triggered
mean? ...
11 votes, 6 answers, 17k views
How to compile PTX code
I need to modify the PTX code and compile it directly. The reason is that I want to have some specific instructions right after each other and it is difficult to write CUDA code that results my ...
13 votes, 5 answers, 10k views
Equivalent of cudaGetErrorString for cuBLAS?
CUDA runtime has a convenience function cudaGetErrorString(cudaError_t error) that translates an error enum into a readable string. cudaGetErrorString is used in the CUDA_SAFE_CALL(someCudaFunction()) ...
7 votes, 2 answers, 17k views
cudaMallocPitch and cudaMemcpy2D
I get an error when transferring a C++ 2D array into a CUDA 1D array.
Let me show my source code.
int main(void)
{
float h_arr[1024][256];
float *d_arr;
// --- Some codes to populate ...
7 votes, 2 answers, 6k views
Custom Kernel GpuMat with float
I'm trying to write a custom kernel using GpuMat data to find the arc cosine of an image's pixels. I can upload, download, and change values when I upload data when the GPU has CV_8UC1 data but chars ...
9 votes, 4 answers, 8k views
Does CUDA really not have a calloc()-like API call?
From looking at the CUDA 5.5 API Reference and the CUDA C Programming Guide it seems that there is no cudaCalloc(), an on-GPU equivalent of the standard C library's calloc().
Is there really no API ...
7 votes, 3 answers, 8k views
Parallel implementation for multiple SVDs using CUDA
I'm new to parallel programming using GPU so I apologize if the question is broad or vague. I'm aware there is some parallel SVD function in the CULA library, but what should be the strategy if I have ...
11 votes, 1 answer, 14k views
CUDA - Parallel Reduction Sum
I am trying to implement a parallel reduction sum in CUDA 7.5. I have been trying to follow the NVIDIA PDF that walks you through the initial algorithm and then steadily more optimised versions. I am ...