Hot Linked Questions

0 votes

1 answer

2k views

CUDA cudaDeviceSynchronize complete Error codes reference? [duplicate]

I'm getting "CudaDeviceSynchronize returned error code 75 after running some kernel" (sounds like something simple missing in my code) BUT I failed to find any reference/Q&A/Topic about that ...

Ardeshir Izadi

1,115

asked Jan 7, 2017 at 3:59

-2 votes

1 answer

368 views

CUDA program returning incorrect results [duplicate]

I have copied a vector addition example from the book "CUDA By Example" and I am getting unexpected incorrect results. Here is my code: #define N (33*1024) __global__ void add(int *a, int *b,int *c){...

user1893354

5,948

asked Feb 7, 2019 at 3:00

52 votes

1 answer

68k views

What does #pragma unroll do exactly? Does it affect the number of threads?

I'm new to CUDA, and I can't understand loop unrolling. I've written a piece of code to understand the technique __global__ void kernel(float *b, int size) { int tid = blockDim.x * blockIdx.x + ...

Magzhan Abdibayev

707

asked Mar 9, 2014 at 5:05

18 votes

5 answers

41k views

Timing CUDA operations

I need to time a CUDA kernel execution. The Best Practices Guide says that we can use either events or standard timing functions like clock() in Windows. My problem is that using these two functions ...

Tudor

62.6k

asked Oct 24, 2011 at 13:48

23 votes

7 answers

19k views

Computing the null space of a matrix as fast as possible

I need to compute the nullspace of several thousand small matrices (8x9, not 4x3 as I wrote previously) in parallel (CUDA). All references point to SVD but the algorithm in numerical recipes seems ...

zenna

9,216

asked Feb 2, 2010 at 1:49

25 votes

2 answers

10k views

Way to verify kernel was executed in CUDA

When I call a kernel with ill-set parameters (e.g. more than 512 threads per block) or when the operations inside it require more than what my device has to offer (e.g. too many registers) the kernel ...

Renan

1,990

asked Jun 21, 2011 at 2:23

16 votes

6 answers

68k views

Fatal error: cuda.h: No such file or directory

I successfully installed CUDA 8.0 on my PC and I can see its files by running the following commands on Ubuntu 16.10: $ sudo find / -name nvcc /usr/local/cuda-8.0/bin/nvcc $ sudo find / -name ...

mad

2,799

asked Dec 23, 2016 at 20:56

19 votes

3 answers

50k views

What does "RuntimeError: CUDA error: device-side assert triggered" in PyTorch mean?

I have seen a lot of specific posts to particular case-specific problems, but no fundamental motivating explanation. What does this error: RuntimeError: CUDA error: device-side assert triggered mean? ...

Joseph Konan

716

asked Apr 21, 2019 at 7:48

11 votes

6 answers

17k views

How to compile PTX code

I need to modify the PTX code and compile it directly. The reason is that I want to have some specific instructions right after each other and it is difficult to write CUDA code that results my ...

user2998135

121

asked Nov 15, 2013 at 23:17

13 votes

5 answers

10k views

Equivalent of cudaGetErrorString for cuBLAS?

CUDA runtime has a convenience function cudaGetErrorString(cudaError_t error) that translates an error enum into a readable string. cudaGetErrorString is used in the CUDA_SAFE_CALL(someCudaFunction()) ...

solvingPuzzles

8,929

asked Oct 24, 2012 at 0:38

7 votes

2 answers

17k views

cudaMallocPitch and cudaMemcpy2D

I have an error when transferring a C++ 2D array into a CUDA 1D array. Let me show my source code. int main(void) { float h_arr[1024][256]; float *d_arr; // --- Some codes to populate ...

신우석

155

asked Mar 3, 2016 at 11:50

7 votes

2 answers

6k views

Custom Kernel GpuMat with float

I'm trying to write a custom kernel using GpuMat data to find the arc cosine of an image's pixels. I can upload, download, and change values when I upload data when the GPU has CV_8UC1 data but chars ...

jon

110

asked Jul 7, 2014 at 14:57

9 votes

4 answers

8k views

Does CUDA really not have a calloc()-like API call?

From looking at the CUDA 5.5 API Reference and the CUDA C Programming Guide it seems that there is no cudaCalloc(), an on-GPU equivalent of the standard C library's calloc(). Is there really no API ...

einpoklum

137k

asked Jan 20, 2014 at 12:10

7 votes

3 answers

8k views

Parallel implementation for multiple SVDs using CUDA

I'm new to parallel programming using GPU so I apologize if the question is broad or vague. I'm aware there is some parallel SVD function in the CULA library, but what should be the strategy if I have ...

Logan Yang

2,604

asked Jul 1, 2013 at 10:01

11 votes

1 answer

14k views

CUDA - Parallel Reduction Sum

I am trying to implement a parallel reduction sum in CUDA 7.5. I have been trying to follow the NVIDIA PDF that walks you through the initial algorithm and then steadily more optimised versions. I am ...

Terramet

155

asked Nov 2, 2017 at 19:28
