Linked Questions

0 votes · 1 answer · 2k views

I'm getting "cudaDeviceSynchronize returned error code 75 after running some kernel" (it sounds like something simple is missing in my code), but I failed to find any reference, Q&A, or topic about that ...
Ardeshir Izadi
-2 votes · 1 answer · 368 views

I copied a vector addition example from the book "CUDA by Example" and I am getting unexpected, incorrect results. Here is my code: #define N (33*1024) __global__ void add(int *a, int *b, int *c){...
user1893354 (reputation 5,948)
52 votes · 1 answer · 68k views

I'm new to CUDA, and I can't understand loop unrolling. I've written a piece of code to understand the technique: __global__ void kernel(float *b, int size) { int tid = blockDim.x * blockIdx.x + ...
Magzhan Abdibayev
18 votes · 5 answers · 41k views

I need to time a CUDA kernel execution. The Best Practices Guide says that we can use either events or standard timing functions like clock() on Windows. My problem is that using these two functions ...
Tudor (reputation 62.6k)
23 votes · 7 answers · 19k views

I need to compute the nullspace of several thousand small matrices (8x9, not 4x3 as I wrote previously) in parallel (CUDA). All references point to SVD, but the algorithm in Numerical Recipes seems ...
zenna (reputation 9,216)
25 votes · 2 answers · 10k views

When I call a kernel with invalid launch parameters (e.g. more than 512 threads per block), or when the operations inside it require more than my device has to offer (e.g. too many registers), the kernel ...
Renan (reputation 1,990)
16 votes · 6 answers · 68k views

I successfully installed CUDA 8.0 on my PC, and I can see its files by running the following commands on Ubuntu 16.10: $ sudo find / -name nvcc /usr/local/cuda-8.0/bin/nvcc $ sudo find / -name ...
mad (reputation 2,799)
19 votes · 3 answers · 50k views

I have seen many posts about particular, case-specific problems, but no fundamental, motivating explanation. What does this error mean: RuntimeError: CUDA error: device-side assert triggered? ...
Joseph Konan
11 votes · 6 answers · 17k views

I need to modify the PTX code and compile it directly. The reason is that I want certain instructions to appear right after each other, and it is difficult to write CUDA code that results in my ...
user2998135
13 votes · 5 answers · 10k views

The CUDA runtime has a convenience function, cudaGetErrorString(cudaError_t error), that translates an error enum into a readable string. cudaGetErrorString is used in the CUDA_SAFE_CALL(someCudaFunction()) ...
solvingPuzzles
7 votes · 2 answers · 17k views

I get an error when transferring a C++ 2D array into a CUDA 1D array. Let me show my source code. int main(void) { float h_arr[1024][256]; float *d_arr; // --- Some codes to populate ...
신우석 (reputation 155)
7 votes · 2 answers · 6k views

I'm trying to write a custom kernel using GpuMat data to find the arc cosine of an image's pixels. I can upload, download, and change values when the GPU holds CV_8UC1 data, but chars ...
jon (reputation 110)
9 votes · 4 answers · 8k views

From looking at the CUDA 5.5 API Reference and the CUDA C Programming Guide, it seems that there is no cudaCalloc(), an on-GPU equivalent of the standard C library's calloc(). Is there really no API ...
einpoklum (reputation 137k)
7 votes · 3 answers · 8k views

I'm new to parallel programming on GPUs, so I apologize if the question is broad or vague. I'm aware there is a parallel SVD function in the CULA library, but what should the strategy be if I have ...
Logan Yang (reputation 2,604)
11 votes · 1 answer · 14k views

I am trying to implement a parallel reduction sum in CUDA 7.5. I have been trying to follow the NVIDIA PDF that walks you through the initial algorithm and then steadily more optimised versions. I am ...
Terramet (reputation 155)