CUDA copy jagged array from host to device

Question

So I'm trying to copy a jagged array from host to device. First of all, here is my current understanding of cudaMalloc and cudaMemcpy:

cudaMalloc takes a pointer to the pointer to the memory block.

cudaMemcpy takes a pointer to the memory block to copy to or from.

Correct me if I'm wrong please.
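A minimal sketch of that understanding, assuming a CUDA-capable device and the CUDA runtime API. cudaMalloc receives the address of a host variable (&d_value) and writes a device address into it. cudaMemcpy receives that device address by value (d_value). CUDA_CHECK and roundtrip are hypothetical helpers for illustration only. Checking every return code this way also makes silent failures visible.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): abort with a message if a call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: copies one int to the device and back, returns the copy.
int roundtrip(int value)
{
    int *d_value = nullptr;  // host variable that will hold a device address

    // cudaMalloc gets the address of the host variable (int** cast to void**)
    // and stores the address of the new device allocation in d_value.
    CUDA_CHECK(cudaMalloc((void**)&d_value, sizeof(int)));

    // cudaMemcpy gets the device address itself: d_value, not &d_value.
    CUDA_CHECK(cudaMemcpy(d_value, &value, sizeof(int), cudaMemcpyHostToDevice));

    int result = 0;
    CUDA_CHECK(cudaMemcpy(&result, d_value, sizeof(int), cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_value));
    return result;
}

int main()
{
    std::printf("%d\n", roundtrip(42));
    return 0;
}
```

On a working device this should print 42.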

Now this is my code that doesn't work (compiles fine but no output):

__global__ void kernel(int** arr)
{
    for (int i=0; i<3; i++)
        printf("%d\n", arr[i][0]);
}

int main()
{
    int arr[][3] = {{1},{2},{3}}; // 3 arrays, 1 element each

    int **d_arr;

    cudaMalloc((void**)(&d_arr), sizeof(int*)*3); // allocate for 3 int pointers

    for (int i=0; i<3; i++)
    {
        cudaMalloc( (void**)  &(d_arr[i]), sizeof(int) * 1 ); // allocate for 1 int in each int pointer

        cudaMemcpy(d_arr[i], arr[i], sizeof(int) * 1, cudaMemcpyHostToDevice); // copy data
    }

    kernel<<<1,1>>>(d_arr);

    cudaDeviceSynchronize();
    cudaDeviceReset();
}

So what am I doing wrong here? Cheers

SpaceMonkey · Accepted Answer · 2013-04-17 13:01:23Z

I found out why: cudaMalloc writes the address of the new allocation into a pointer variable that must live in host memory, and the pointer values passed to cudaMemcpy are also read on the host.

In my for-loop I was trying to fill pointers that live on the device (d_arr[i]) from code that runs on the host!

The right way is to use an intermediate variable: a pointer on the host (temp) that points to memory on the device. Allocate it with cudaMalloc, copy the integers into it, then copy the pointer value itself into the device array of pointers (d_arr + i) with cudaMemcpy!

This is the correct version:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel(int** arr)
{
    for (int i=0; i<3; i++)
        printf("%d\n", arr[i][0]);
}

int main()
{
    int arr[3][1] = {{1},{2},{3}}; // 3 arrays, 1 element each

    int **d_arr;

    cudaMalloc((void**)(&d_arr), sizeof(int*)*3); // allocate for 3 int pointers

    for (int i=0; i<3; i++)
    {
        int* temp; // lives on the host, will hold a device address

        cudaMalloc( (void**)  &(temp), sizeof(int) * 1 ); // allocate for 1 int in each int pointer

        cudaMemcpy(temp, arr[i], sizeof(int) * 1, cudaMemcpyHostToDevice); // copy data

        cudaMemcpy(d_arr+i, &temp, sizeof(int*), cudaMemcpyHostToDevice); // store the device address in d_arr[i]
    }

    kernel<<<1,1>>>(d_arr);

    cudaDeviceSynchronize(); // wait for the kernel so its printf output is flushed
    cudaDeviceReset(); // also releases every allocation made above
}
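A variant of the same idea, sketched under a few assumptions: rows of different lengths held in std::vector, a kernel that sums each row (row_sums, jagged_row_sums and CUDA_CHECK are hypothetical names), and the device addresses collected in a host-side array so the whole pointer table reaches the device in one cudaMemcpy instead of one copy per row. Keeping that host-side table also lets each row be freed explicitly rather than relying on cudaDeviceReset.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): abort with a message if a call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// One thread per row: follow the row's device pointer and sum its elements.
__global__ void row_sums(int **rows, const int *lengths, int num_rows, int *sums)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < num_rows) {
        int s = 0;
        for (int j = 0; j < lengths[r]; ++j)
            s += rows[r][j];
        sums[r] = s;
    }
}

// Uploads a jagged host array as a device array of row pointers,
// sums each row on the GPU and returns the sums.
std::vector<int> jagged_row_sums(const std::vector<std::vector<int>> &host_rows)
{
    const int n = static_cast<int>(host_rows.size());
    if (n == 0) return {};

    // Host-side table of device addresses (plays the role of "temp").
    std::vector<int*> h_row_ptrs(n, nullptr);
    std::vector<int> h_lengths(n);
    for (int i = 0; i < n; ++i) {
        h_lengths[i] = static_cast<int>(host_rows[i].size());
        if (h_lengths[i] == 0) continue;  // empty row: leave its pointer null
        size_t bytes = h_lengths[i] * sizeof(int);
        CUDA_CHECK(cudaMalloc((void**)&h_row_ptrs[i], bytes));
        CUDA_CHECK(cudaMemcpy(h_row_ptrs[i], host_rows[i].data(), bytes,
                              cudaMemcpyHostToDevice));
    }

    int **d_rows = nullptr;
    int *d_lengths = nullptr, *d_sums = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&d_rows, n * sizeof(int*)));
    CUDA_CHECK(cudaMalloc((void**)&d_lengths, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&d_sums, n * sizeof(int)));

    // A single copy moves the whole pointer table to the device.
    CUDA_CHECK(cudaMemcpy(d_rows, h_row_ptrs.data(), n * sizeof(int*),
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_lengths, h_lengths.data(), n * sizeof(int),
                          cudaMemcpyHostToDevice));

    const int threads = 128;
    row_sums<<<(n + threads - 1) / threads, threads>>>(d_rows, d_lengths, n, d_sums);
    CUDA_CHECK(cudaGetLastError());

    std::vector<int> sums(n);
    CUDA_CHECK(cudaMemcpy(sums.data(), d_sums, n * sizeof(int),
                          cudaMemcpyDeviceToHost));

    // Free the rows through the host-side table, then the device arrays.
    for (int *p : h_row_ptrs)
        CUDA_CHECK(cudaFree(p));  // cudaFree(nullptr) is a no-op
    CUDA_CHECK(cudaFree(d_rows));
    CUDA_CHECK(cudaFree(d_lengths));
    CUDA_CHECK(cudaFree(d_sums));
    return sums;
}

int main()
{
    std::vector<std::vector<int>> rows = {{1}, {2, 3}, {4, 5, 6}};
    for (int s : jagged_row_sums(rows))
        std::printf("%d\n", s);
    return 0;
}
```

With the rows in main this should print 1, 5 and 15, one per line. The device array d_rows is never read on the host; all pointer bookkeeping stays in h_row_ptrs, for the same reason d_arr[i] could not be used from host code in the question.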

Pixelchemist · Accepted Answer · 2013-07-17 17:25:31Z


Your kernel calls printf(), which used to be a host-only function (before compute capability 2.0). Everything is OK there. ;)
cudaMemcpy((void*)d_arr, (void*)arr, sizeof(int*)*3, cudaMemcpyHostToDevice); copies the memory addresses of your arrays on the host to the device. That makes no sense, since you then have pointers to host memory on the device.
You cannot allocate 2D arrays that particular way in CUDA. See http://www.stevenmarkford.com/allocating-2d-arrays-in-cuda/.
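One common way to avoid a device array of pointers altogether (a swapped-in technique, not necessarily what the linked article describes) is to flatten the jagged array: store all rows back to back in one buffer and keep an offsets array with num_rows + 1 entries, so row r spans data[offsets[r]] up to data[offsets[r + 1] - 1]. A sketch, with hypothetical names build_offsets, flat_jagged_row_sums and CUDA_CHECK:

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): abort with a message if a call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Row r occupies data[offsets[r]] .. data[offsets[r + 1] - 1].
__global__ void flat_row_sums(const int *data, const int *offsets, int num_rows, int *sums)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < num_rows) {
        int s = 0;
        for (int k = offsets[r]; k < offsets[r + 1]; ++k)
            s += data[k];
        sums[r] = s;
    }
}

// Concatenates the rows into `flat` and returns num_rows + 1 offsets.
std::vector<int> build_offsets(const std::vector<std::vector<int>> &rows,
                               std::vector<int> &flat)
{
    std::vector<int> offsets(1, 0);
    flat.clear();
    for (const auto &row : rows) {
        flat.insert(flat.end(), row.begin(), row.end());
        offsets.push_back(static_cast<int>(flat.size()));
    }
    return offsets;
}

std::vector<int> flat_jagged_row_sums(const std::vector<std::vector<int>> &rows)
{
    const int n = static_cast<int>(rows.size());
    if (n == 0) return {};
    std::vector<int> flat;
    std::vector<int> offsets = build_offsets(rows, flat);

    int *d_data = nullptr, *d_offsets = nullptr, *d_sums = nullptr;
    // +1 keeps the data allocation non-empty even if every row is empty.
    CUDA_CHECK(cudaMalloc((void**)&d_data, (flat.size() + 1) * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&d_offsets, offsets.size() * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&d_sums, n * sizeof(int)));

    // Two host-to-device copies, however many rows there are.
    if (!flat.empty())
        CUDA_CHECK(cudaMemcpy(d_data, flat.data(), flat.size() * sizeof(int),
                              cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_offsets, offsets.data(), offsets.size() * sizeof(int),
                          cudaMemcpyHostToDevice));

    const int threads = 128;
    flat_row_sums<<<(n + threads - 1) / threads, threads>>>(d_data, d_offsets, n, d_sums);
    CUDA_CHECK(cudaGetLastError());

    std::vector<int> sums(n);
    CUDA_CHECK(cudaMemcpy(sums.data(), d_sums, n * sizeof(int), cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_data));
    CUDA_CHECK(cudaFree(d_offsets));
    CUDA_CHECK(cudaFree(d_sums));
    return sums;
}

int main()
{
    std::vector<std::vector<int>> rows = {{1}, {2, 3}, {4, 5, 6}};
    for (int s : flat_jagged_row_sums(rows))
        std::printf("%d\n", s);
    return 0;
}
```

The number of allocations and copies stays fixed regardless of the row count, and no device pointers are ever stored in device memory, so there is no pointer table to build on the host.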

edited Jul 17, 2013 at 17:25

answered Apr 17, 2013 at 13:11


2 Comments

SpaceMonkey Over a year ago

1- No, you can actually use printf() on the device. 2- You are right. 3- Check my answer. 4- Thanks, I will have a read.

Pixelchemist Over a year ago

1. You are right, it has been possible since compute capability 2.0. 2. - 3. "this way" refers to your question, not your answer. ;) 4. You're welcome.
