I'm new to CUDA and C, and I could use some help with the following: I want to pass a C array of GpuMats (converted to cuda::PtrStepSzi) to a CUDA kernel.

Here is the code of my kernel:

#include <opencv2/core/cuda_types.hpp> // cv::cuda::PtrStepSzi
using namespace cv;

__global__ void disparityFromDiffMapsKernel(cuda::PtrStepSzi* differenceMapsArray,
                                            int arraySize,
                                            cuda::PtrStepSzi disparityMap){
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    //check if thread is inside the image (valid indices are 0..cols-1 and 0..rows-1)
    if(x >= differenceMapsArray[0].cols || y >= differenceMapsArray[0].rows){
        return;
    }

    //do stuff
}
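Thread indices are 0-based, so a valid column satisfies 0 <= x < cols and a valid row 0 <= y < rows. An early-return test written with `>` lets the threads with x == cols or y == rows through, and they then read one element past the end of the image. A minimal sketch of the check as a `__host__ __device__` helper, so the same logic can also be exercised on the CPU; the name `insideImage` is hypothetical:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true when (x, y) addresses a valid pixel of a
// cols x rows image. Valid indices are 0..cols-1 and 0..rows-1, so the
// upper bounds use strict "<".
__host__ __device__ inline bool insideImage(int x, int y, int cols, int rows) {
    return x >= 0 && y >= 0 && x < cols && y < rows;
}

// Inside the kernel the early return would then read:
//   if (!insideImage(x, y, differenceMapsArray[0].cols, differenceMapsArray[0].rows)) return;
```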

And here is the code where I initialize the array and call the kernel:

cuda::PtrStepSzi diffMaps[diffMapsSize];
for(int i = 0; i <= offset; i++){
    cuda::GpuMat diffMap(leftImageGPU.size(),CV_32SC1);
    cuda::PtrStepSzi diffMapPtr = diffMap;
    diffMaps[i] = diffMapPtr;
}

disparityFromDiffMapsKernel<<<numBlocks,threadsPerBlock>>>(diffMaps,diffMapsSize,disparityImageGPU); //gpu mat is initialized before

When I run this code, I get the following OpenCV error:

OpenCV(3.4.1) Error: Gpu API call (an illegal memory access was encountered)

I would really appreciate any help!
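The text inside the OpenCV message, "an illegal memory access was encountered", is the CUDA runtime's description of `cudaErrorIllegalAddress`. Because kernel launches are asynchronous, that error is usually reported by whichever CUDA call runs next (here one made inside OpenCV), not at the launch that actually faulted. A common debugging pattern, sketched below with a hypothetical `CUDA_CHECK` macro (not part of CUDA or OpenCV), is to check `cudaGetLastError()` and `cudaDeviceSynchronize()` right after the launch so the failure is attributed to the kernel:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: evaluates a CUDA runtime call once and, if it
// did not return cudaSuccess, prints file, line, the call text and the
// runtime's error string, then terminates the program.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,      \
                         __LINE__, #call, cudaGetErrorString(err_));      \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Usage right after a kernel launch:
//   myKernel<<<numBlocks, threadsPerBlock>>>(args);
//   CUDA_CHECK(cudaGetLastError());       // invalid launch configuration etc.
//   CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
```

The synchronize call is only meant for debugging, since it stalls the host until the kernel finishes.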

  • diffMaps resides on the host, while you are passing it as a kernel argument and trying to access it on the device. Commented Dec 18, 2018 at 5:54
  • That was the problem! Thx @sgarizvi Commented Dec 18, 2018 at 9:55
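Why the array fails while `disparityMap` works: a C array passed as a kernel argument decays to a pointer, so only the host address of `diffMaps` is copied to the kernel, whereas a `PtrStepSzi` passed by value has its contents copied. Besides copying the array into device memory, one alternative (a sketch, not what this thread uses) is to wrap a fixed-size array of headers in a struct and pass the struct by value. `DeviceImage`, `DiffMapSet`, `MAX_DIFF_MAPS`, `sumFirstPixels` and `runSumFirstPixels` are hypothetical names, `DeviceImage` stands in for `cv::cuda::PtrStepSzi`, and the approach assumes a small compile-time bound on the number of maps, because kernel parameter space is limited (traditionally 4 KB):

```cuda
#include <cuda_runtime.h>

// Assumed upper bound on the number of maps; keeps DiffMapSet small enough
// to be passed as a kernel parameter.
constexpr int MAX_DIFF_MAPS = 16;

struct DeviceImage {          // hypothetical stand-in for cv::cuda::PtrStepSzi
    int* data;                // device pointer to the pixels
    int cols;
    int rows;
};

struct DiffMapSet {           // copied by value into the kernel's parameters
    DeviceImage maps[MAX_DIFF_MAPS];
    int count;
};

// One thread sums pixel 0 of every map; the headers arrive by value, so no
// host pointer is dereferenced on the device.
__global__ void sumFirstPixels(DiffMapSet set, int* out) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < set.count; ++i) sum += set.maps[i].data[0];
        *out = sum;
    }
}

// Creates `count` one-pixel maps holding 0, 1, ..., count-1, launches the
// kernel and returns the sum computed on the GPU (-1 if count is out of range).
int runSumFirstPixels(int count) {
    if (count < 0 || count > MAX_DIFF_MAPS) return -1;
    DiffMapSet set{};
    set.count = count;
    for (int i = 0; i < count; ++i) {
        int* buf = nullptr;
        cudaMalloc(&buf, sizeof(int));
        cudaMemcpy(buf, &i, sizeof(int), cudaMemcpyHostToDevice);
        set.maps[i] = DeviceImage{buf, 1, 1};
    }
    int* dOut = nullptr;
    cudaMalloc(&dOut, sizeof(int));
    sumFirstPixels<<<1, 1>>>(set, dOut);
    int result = 0;
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dOut);
    for (int i = 0; i < count; ++i) cudaFree(set.maps[i].data);
    return result;
}
```

Error checking is omitted for brevity. The trade-off is the fixed upper bound; the cudaMalloc/cudaMemcpy approach works for any number of maps.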

1 Answer

I found a solution to my problem by moving the array to GPU memory via cudaMalloc and cudaMemcpy (thanks to @sgarizvi for the hint).

Here is the final code, in case someone has a similar problem:

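#include <cstdlib>               // malloc, free
#include <vector>
#include <cuda_runtime.h>
#include <opencv2/core/cuda.hpp> // cv::cuda::GpuMat, cv::cuda::PtrStepSzi

// keep the GpuMats alive until the kernel has finished: a GpuMat declared
// inside the loop body releases its device memory at the end of each
// iteration, which would leave dangling pointers in the array
std::vector<cuda::GpuMat> diffMapMats(diffMapsSize);

// reserve memory for the diffmap ptrs arrays (one on the host, one on the device)
cuda::PtrStepSzi* cpuDiffMapPtrs;
cpuDiffMapPtrs = (cuda::PtrStepSzi*) malloc(diffMapsSize * sizeof(cuda::PtrStepSzi));

cuda::PtrStepSzi* gpuDiffMapPtrs;
cudaMalloc(&gpuDiffMapPtrs, diffMapsSize * sizeof(cuda::PtrStepSzi));

//fill cpu array with ptrs to gpu mats
for(int i = 0; i < diffMapsSize; i++){
    diffMapMats[i].create(leftImageGPU.size(), CV_32SC1);
    //do stuff with the gpu mats
    cpuDiffMapPtrs[i] = diffMapMats[i]; // copies only the header (data pointer, step, size), not the pixels
}

//copy cpu array to gpu
cudaMemcpy(gpuDiffMapPtrs, cpuDiffMapPtrs, diffMapsSize * sizeof(cuda::PtrStepSzi), cudaMemcpyHostToDevice);

// note: this call passes an extra halfKernelSize argument, so the final kernel
// apparently declares one more parameter than the version in the question
disparityFromDiffMapsKernel<<<numBlocks,threadsPerBlock>>>(gpuDiffMapPtrs,diffMapsSize,halfKernelSize,disparityImageGPU);
cudaDeviceSynchronize(); // wait for the kernel before releasing the memory it reads

// free the allocated memory
cudaFree(gpuDiffMapPtrs);
free(cpuDiffMapPtrs);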
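For reference, a self-contained sketch of the same pattern without OpenCV, assuming only the CUDA runtime: the device buffers are owned outside the loop so they outlive the launch, the array of headers is built on the host and copied to device memory, and the kernel reads through the device copy. `DeviceImage` (a stand-in for `cv::cuda::PtrStepSzi`), `firstPixels` and `runFirstPixels` are hypothetical names, and error checking is omitted for brevity:

```cuda
#include <vector>
#include <cuda_runtime.h>

struct DeviceImage {   // hypothetical stand-in for cv::cuda::PtrStepSzi
    int* data;         // device pointer to the pixels
    int cols;
    int rows;
};

// `images` must point to DEVICE memory; passing a host array here is what
// produces "an illegal memory access was encountered".
__global__ void firstPixels(const DeviceImage* images, int count, int* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) out[i] = images[i].data[0];
}

// Creates `count` one-pixel images holding 0, 1, ..., count-1 and returns
// the values the kernel read through the device-side header array.
std::vector<int> runFirstPixels(int count) {
    if (count <= 0) return {};

    std::vector<int*> buffers(count);             // owns the pixel buffers
    std::vector<DeviceImage> hostHeaders(count);  // headers built on the host...
    for (int i = 0; i < count; ++i) {
        cudaMalloc(&buffers[i], sizeof(int));
        cudaMemcpy(buffers[i], &i, sizeof(int), cudaMemcpyHostToDevice);
        hostHeaders[i] = DeviceImage{buffers[i], 1, 1};
    }

    DeviceImage* deviceHeaders = nullptr;         // ...then copied to the device
    cudaMalloc(&deviceHeaders, count * sizeof(DeviceImage));
    cudaMemcpy(deviceHeaders, hostHeaders.data(), count * sizeof(DeviceImage),
               cudaMemcpyHostToDevice);

    int* deviceOut = nullptr;
    cudaMalloc(&deviceOut, count * sizeof(int));
    const int threads = 128;
    const int blocks = (count + threads - 1) / threads;
    firstPixels<<<blocks, threads>>>(deviceHeaders, count, deviceOut);

    std::vector<int> result(count);
    cudaMemcpy(result.data(), deviceOut, count * sizeof(int),
               cudaMemcpyDeviceToHost);           // also waits for the kernel

    cudaFree(deviceOut);
    cudaFree(deviceHeaders);
    for (int* b : buffers) cudaFree(b);           // freed only after the kernel ran
    return result;
}
```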