I have a 3D array in which each x-y plane is an image and the z axis indexes the image layers.
The issue is that when I try to extract the first layer (or any other layer) using idz, I do not get the expected result. It looks as if the array, once on the GPU, is indexed along x, y and z differently from what I expect in pycuda, as the result array below shows.
The following is a step-by-step walk-through of a minimal example (I used plain integer values in place of my images to avoid uploading the images and the entire code).
Here I import libraries and define image size and layers...
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
from pycuda.gpuarray import to_gpu
row = 10
column = 10
depth = 5
Then I define my input 3D array and my output 2D array...
#--==== Input 3D Array ====---
arrayA = numpy.full((row, column, depth), 0)
#populate each layer with fixed values
for i in range(depth):
    arrayA[:,:,i] = i + 1
arrayA = arrayA.astype(numpy.uint16)
arrayA_gpu = cuda.mem_alloc(arrayA.nbytes)
cuda.memcpy_htod(arrayA_gpu, arrayA)
arrayA_Answer = numpy.empty_like(arrayA)
#--==== Output 2D array container ====---
arrayB = numpy.zeros([row, column], dtype = numpy.uint16)
arrayB_gpu = cuda.mem_alloc(arrayB.nbytes)
cuda.memcpy_htod(arrayB_gpu, arrayB)
arrayB_Answer = numpy.empty_like(arrayB)
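For reference, here is a minimal host-side sketch of how I understand the input to be stored, assuming numpy's default C (row-major) order, in which the last axis (depth) varies fastest. It uses plain Python lists instead of numpy so that it runs without a GPU or any dependencies; `flat_offset` and `flat` are hypothetical names for illustration and are not part of my actual code.

```python
row, column, depth = 10, 10, 5

def flat_offset(r, c, d):
    """Offset of element (r, c, d) in a C-order array of shape (row, column, depth)."""
    return (r * column + c) * depth + d

# Flat buffer equivalent to arrayA: every element of layer d holds d + 1.
flat = [0] * (row * column * depth)
for r in range(row):
    for c in range(column):
        for d in range(depth):
            flat[flat_offset(r, c, d)] = d + 1

print(flat[:10])                    # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
print(flat[flat_offset(3, 7, 0)])   # 1 (an element of layer 0)
```

Under that assumption, consecutive memory locations step through the layers first, not through x or y.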
Next I define the CUDA kernel and get the function handle in pycuda:
mod = SourceModule("""
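// Element type matches the host arrays (numpy.uint16 is a 2-byte unsigned short)
__global__ void getLayer(unsigned short *arrayA, unsigned short *arrayB)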
{
    int idx = threadIdx.x + (blockIdx.x * blockDim.x); // x coordinate (numpy axis 2)
    int idy = threadIdx.y + (blockIdx.y * blockDim.y); // y coordinate (numpy axis 1)
    int idz = 0; // The first layer; this can be set anywhere in the range 0-4
    int x_width = (blockDim.x * gridDim.x);
    int y_width = (blockDim.y * gridDim.y);
    arrayB[idx + (x_width * idy)] = arrayA[idx + (x_width * idy) + (x_width * y_width) * idz];
}
""")
func = mod.get_function("getLayer")
func(arrayA_gpu, arrayB_gpu, block=(row, column, 1), grid=(1,1))
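As a sanity check that needs no GPU, the kernel's index arithmetic can be emulated on the host. This is only a sketch under two assumptions: that the input buffer is a flat C-order copy of arrayA (depth varying fastest), and that one loop iteration stands in for one CUDA thread. `emulate_get_layer`, `array_a` and `array_b` are hypothetical names, not part of my pycuda code.

```python
row, column, depth = 10, 10, 5

# Flat C-order copy of arrayA: the depth index varies fastest, values are d + 1.
array_a = [d + 1 for _ in range(row * column) for d in range(depth)]
array_b = [0] * (row * column)

def emulate_get_layer(block_dim, grid_dim, idz=0):
    """Apply the getLayer index arithmetic once per simulated thread."""
    x_width = block_dim[0] * grid_dim[0]
    y_width = block_dim[1] * grid_dim[1]
    for idy in range(y_width):
        for idx in range(x_width):
            src = idx + x_width * idy + (x_width * y_width) * idz
            array_b[idx + x_width * idy] = array_a[src]

emulate_get_layer(block_dim=(row, column, 1), grid_dim=(1, 1))
for r in range(row):
    print(array_b[r * column:(r + 1) * column])
# every row prints [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
```

If this emulation matches what the GPU returns, that would suggest the mismatch comes from how the kernel's offset formula relates to the array's memory layout and not from the copy to or from the device.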
Using the standard pycuda copy-back calls (cuda.memcpy_dtoh into arrayA_Answer and arrayB_Answer), I retrieve the results, which are not what I expected.
arrayA_Answer[:,:,0] is a 10x10 matrix populated with 1s (good):
print(arrayA_Answer[:,:,0])
[[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]]
arrayB_Answer, however, is a 10x10 matrix populated as follows (bad); I expected it to be equal to arrayA[:,:,0]:
print(arrayB_Answer)
[[1 2 3 4 5 1 2 3 4 5]
[1 2 3 4 5 1 2 3 4 5]
[1 2 3 4 5 1 2 3 4 5]
[1 2 3 4 5 1 2 3 4 5]
[1 2 3 4 5 1 2 3 4 5]
[1 2 3 4 5 1 2 3 4 5]
[1 2 3 4 5 1 2 3 4 5]
[1 2 3 4 5 1 2 3 4 5]
[1 2 3 4 5 1 2 3 4 5]
[1 2 3 4 5 1 2 3 4 5]]