
I am writing a CUDA kernel to convert an RGBA image to a grayscale image in PyCUDA. Here is the PyCUDA code:

import numpy as np
import matplotlib.pyplot as plt
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
kernel = SourceModule("""
#include <stdio.h>
__global__ void rgba_to_greyscale(const uchar4* const rgbaImage,
                   unsigned char* const greyImage,
                   int numRows, int numCols)
{
  // x indexes columns, y indexes rows (matching the block/grid layout in the host code)
  int x = threadIdx.x + blockIdx.x * blockDim.x;
  int y = threadIdx.y + blockIdx.y * blockDim.y;
  if (x < numCols && y < numRows) {
    int index = y * numCols + x;   // row-major pixel index
    uchar4 color = rgbaImage[index];
    // standard luma weights for the R (.x), G (.y) and B (.z) channels
    unsigned char grey = (unsigned char)(0.299f * color.x + 0.587f * color.y +
                                         0.114f * color.z);
    greyImage[index] = grey;
  }
}
""")

However, the problem is how to relate uchar4* to a numpy array. I know I can modify my kernel function to accept int* or float* and make it work, but I just wonder how to make the above kernel function work in PyCUDA.
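For what it's worth, a C-contiguous (rows, cols, 4) uint8 RGBA array already has the exact byte layout that a uchar4* expects (four bytes per pixel, row-major), so the data can be reinterpreted rather than rearranged. Below is a minimal sketch, assuming the image really is C-contiguous uint8; the uchar4 structured dtype is built by hand purely for illustration:

import numpy as np

# Structured dtype with the same 4-byte layout as CUDA's uchar4.
uchar4 = np.dtype([('x', np.uint8), ('y', np.uint8),
                   ('z', np.uint8), ('w', np.uint8)])

rgba = np.zeros((480, 640, 4), dtype=np.uint8)   # hypothetical test image
pixels = rgba.view(uchar4).reshape(480, 640)     # a reinterpretation, not a copy
print(pixels.dtype.itemsize)                     # 4 bytes per pixel, like uchar4

PyCUDA ships the same dtype ready-made as pycuda.gpuarray.vec.uchar4, which is what the answer below uses.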

Below is the host code.

def gpu_rgb2gray(image):
    shape = image.shape
    n_rows, n_cols, _ = np.array(shape, dtype=np.int)
    image_gray = np.empty((n_rows, n_cols), dtype= np.int)
    ## HERE is the confusing part: how to rearrange image to match uchar4*??
    image = image.reshape(1, -1, 4)
    # Get kernel function
    rgba2gray = kernel.get_function("rgba_to_greyscale")
    # Define block, grid and compute
    blockDim = (32, 32, 1) # 1024 threads per block
    dx, mx = divmod(shape[1], blockDim[0])
    dy, my = divmod(shape[0], blockDim[1])
    gridDim = ((dx + (mx>0)), (dy + (my>0)), 1)
    # Kernel function
    # HERE it doesn't work because of the argument type mismatch
    rgba2gray (
        cuda.In(image), cuda.Out(image_gray), n_rows, n_cols,
        block=blockDim, grid=gridDim)
    return image_gray

Anyone have any ideas? Thanks!

1 Answer

The gpuarray class has native support for CUDA's built-in vector types (including uchar4).

So you can create a gpuarray instance with the correct dtype for the kernel, copy the host image into that gpuarray through its buffer, and then pass the gpuarray as the kernel's input argument. As an example (and if I understood your code correctly), something like this should probably work:

import pycuda.gpuarray as gpuarray

....

def gpu_rgb2gray(image):
    # image is expected to be a C-contiguous uint8 array of shape (rows, cols, 4)
    shape = image.shape[:2]
    # one uchar4 element per pixel on the device
    image_rgb = gpuarray.empty(shape, dtype=gpuarray.vec.uchar4)
    cuda.memcpy_htod(image_rgb.gpudata, image.data)
    image_gray = gpuarray.empty(shape, dtype=np.uint8)

    # Get kernel function
    rgba2gray = kernel.get_function("rgba_to_greyscale")
    # Define block and grid, then launch
    blockDim = (32, 32, 1) # 1024 threads per block
    dx, mx = divmod(shape[1], blockDim[0])
    dy, my = divmod(shape[0], blockDim[1])
    gridDim = ((dx + (mx > 0)), (dy + (my > 0)), 1)
    rgba2gray(image_rgb, image_gray, np.int32(shape[0]), np.int32(shape[1]),
              block=blockDim, grid=gridDim)

    # copy the result back to the host and upcast the uchar values to Python ints
    img_gray = np.array(image_gray.get(), dtype=int)

    return img_gray

This takes a host image of 8-bit RGBA pixels (32 bits per pixel), copies it into an array of uchar4 on the GPU, runs the kernel, and then upcasts the resulting array of unsigned char back to integers on the host.
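As a rough usage sketch (the file name input.png is just a placeholder, and matplotlib returns PNG data as float32 in [0, 1], hence the conversion to uint8 RGBA first):

import numpy as np
import matplotlib.pyplot as plt

rgba = plt.imread("input.png")                   # hypothetical input file
if rgba.dtype != np.uint8:                       # PNGs come back as float32 in [0, 1]
    rgba = (rgba * 255).astype(np.uint8)
if rgba.shape[2] == 3:                           # pad RGB with an opaque alpha channel
    alpha = np.full(rgba.shape[:2] + (1,), 255, dtype=np.uint8)
    rgba = np.concatenate([rgba, alpha], axis=2)

grey = gpu_rgb2gray(np.ascontiguousarray(rgba))
plt.imsave("output.png", grey, cmap="gray", vmin=0, vmax=255)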


1 Comment

Hi, sorry for the late reply. I checked everything with gpuarray and it worked perfectly well, thank you!
