I'm trying to copy a jagged array from host to device. First of all, here is my current understanding of cudaMalloc and cudaMemcpy:

cudaMalloc takes a pointer to the pointer to the memory block.

cudaMemcpy takes a pointer to the memory block to copy to or from.

Correct me if I'm wrong please.
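
For reference, that reading matches the runtime API: cudaMalloc writes a device address into a pointer variable that lives on the host, and cudaMemcpy copies bytes between two pointers in the direction given by its last argument. A minimal sketch for a flat array follows; the helper name roundTrip and the sample values are assumptions made for illustration, not part of the original question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: copies n ints to the device and straight back.
// Returns true only if every CUDA call reported success.
bool roundTrip(const int* src, int* dst, int n)
{
    int* d_buf = nullptr;  // host variable that will hold a device address

    // cudaMalloc takes the ADDRESS of the host pointer so it can fill it in.
    if (cudaMalloc((void**)&d_buf, n * sizeof(int)) != cudaSuccess)
        return false;

    // cudaMemcpy takes the pointers themselves plus a direction flag.
    bool ok =
        cudaMemcpy(d_buf, src, n * sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(dst, d_buf, n * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_buf);
    return ok;
}

int main()
{
    int in[3]  = {1, 2, 3};
    int out[3] = {0, 0, 0};

    bool ok = roundTrip(in, out, 3);
    printf("%s: %d %d %d\n", ok ? "ok" : "failed", out[0], out[1], out[2]);
    return ok ? 0 : 1;
}
```

Note that d_buf is never dereferenced in host code; it is only passed by value to the runtime.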

Now this is my code that doesn't work (compiles fine but no output):

__global__ void kernel(int** arr)
{
    for (int i=0; i<3; i++)
        printf("%d\n", arr[i][0]);
}

int main()
{
    int arr[][3] = {{1},{2},{3}}; // 3 rows of 3 ints; only element 0 of each row is set, the rest are 0

    int **d_arr;

    cudaMalloc((void**)(&d_arr), sizeof(int*)*3); // allocate for 3 int pointers

    for (int i=0; i<3; i++)
    {
        cudaMalloc( (void**)  &(d_arr[i]), sizeof(int) * 1 ); // allocate for 1 int in each int pointer

        cudaMemcpy(d_arr[i], arr[i], sizeof(int) * 1, cudaMemcpyHostToDevice); // copy data
    }

    kernel<<<1,1>>>(d_arr);

    cudaDeviceSynchronize();
    cudaDeviceReset();
}

So what am I doing wrong here? Cheers

2 Answers

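I found out why: d_arr points to device memory, and host code cannot read or write device memory directly. cudaMalloc needs the address of a pointer variable that lives on the host, and the pointers passed to cudaMemcpy must be values the host already holds.

In my for-loop, &(d_arr[i]) is an address in device memory, so cudaMalloc tried to store its result there from host code, and d_arr[i] made the host read a device address directly.

The right way is to use an intermediate variable: a pointer that lives on the host but points to device memory. Allocate it with cudaMalloc, copy the integers into it, then copy the pointer value itself into the matching slot of the jagged array (the device array of pointers) with cudaMemcpy.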

This is the correct version:

#include <cstdio>

__global__ void kernel(int** arr)
{
    for (int i=0; i<3; i++)
        printf("%d\n", arr[i][0]);
}

int main()
{
    int arr[][3] = {{1},{2},{3}}; // 3 rows of 3 ints; only element 0 of each row is set, the rest are 0

    int **d_arr;

    cudaMalloc((void**)(&d_arr), sizeof(int*)*3); // allocate the device array of 3 int pointers

    for (int i=0; i<3; i++)
    {
        int* temp; // host variable that holds a device address

        cudaMalloc( (void**) &temp, sizeof(int) * 1 ); // allocate 1 int on the device

        cudaMemcpy(temp, arr[i], sizeof(int) * 1, cudaMemcpyHostToDevice); // copy the data

        cudaMemcpy(d_arr+i, &temp, sizeof(int*), cudaMemcpyHostToDevice); // store the device address in slot i of d_arr
    }

    kernel<<<1,1>>>(d_arr);

    cudaDeviceSynchronize();
    cudaDeviceReset();
}
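
A variation on the same idea, sketched under a few assumptions: instead of copying each pointer separately, the device addresses can be collected in a host-side array and uploaded in one cudaMemcpy. The helper name copyJagged, the constant kRows, and the extra output buffer (used here so the result can be checked on the host instead of printed from the kernel) are illustrative choices, not part of the answer above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kRows = 3;

// Reads the first element of each row through the device-side pointer table.
__global__ void kernel(int** arr, int* out)
{
    for (int i = 0; i < kRows; i++)
        out[i] = arr[i][0];
}

// Hypothetical helper: uploads one int per row and reads the values back via the kernel.
bool copyJagged(const int* firstElems, int* result)
{
    int*  h_rows[kRows] = {};  // host array whose entries are device addresses
    int** d_arr = nullptr;     // device array of pointers
    int*  d_out = nullptr;
    bool  ok = true;

    // Allocate and fill each row through a host-resident pointer.
    for (int i = 0; i < kRows && ok; i++) {
        ok = cudaMalloc((void**)&h_rows[i], sizeof(int)) == cudaSuccess &&
             cudaMemcpy(h_rows[i], &firstElems[i], sizeof(int),
                        cudaMemcpyHostToDevice) == cudaSuccess;
    }

    ok = ok && cudaMalloc((void**)&d_arr, sizeof(h_rows)) == cudaSuccess;
    ok = ok && cudaMalloc((void**)&d_out, kRows * sizeof(int)) == cudaSuccess;

    // A single copy uploads the whole pointer table.
    ok = ok && cudaMemcpy(d_arr, h_rows, sizeof(h_rows),
                          cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        kernel<<<1, 1>>>(d_arr, d_out);
        // This blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy(result, d_out, kRows * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree on a null pointer does nothing, so partial failures are safe here.
    for (int i = 0; i < kRows; i++) cudaFree(h_rows[i]);
    cudaFree(d_arr);
    cudaFree(d_out);
    return ok;
}

int main()
{
    int values[kRows] = {1, 2, 3};
    int result[kRows] = {0, 0, 0};

    bool ok = copyJagged(values, result);
    printf("%s: %d %d %d\n", ok ? "ok" : "failed", result[0], result[1], result[2]);
    return ok ? 0 : 1;
}
```

Checking each return value makes a failure visible on the host, which matters in a case like the original one, where the program compiles and then prints nothing.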

  1. Your kernel calls printf(), which used to be a host-only function (until CC 2.0). Everything is OK here. ;)

  2. cudaMemcpy((void*)d_arr, (void*)arr, sizeof(int*)*3, cudaMemcpyHostToDevice); copies the memory addresses of your arrays on the host to the device. That makes no sense, since you then have pointers to host memory on the device.

  3. You cannot allocate 2D arrays that particular way in CUDA. See http://www.stevenmarkford.com/allocating-2d-arrays-in-cuda/.
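
One common alternative to a device-side table of pointers is to flatten the jagged array into a single buffer plus an array of row offsets, so that only two plain allocations are needed. The sketch below is an illustration under stated assumptions, not the method from the linked page: the names rowSums and jaggedRowSums, the sample data, and the choice of summing each row are all hypothetical, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per row; row r occupies data[offsets[r]] .. data[offsets[r + 1] - 1].
__global__ void rowSums(const int* data, const int* offsets, int rows, int* sums)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;

    int s = 0;
    for (int k = offsets[r]; k < offsets[r + 1]; k++)
        s += data[k];
    sums[r] = s;
}

// Hypothetical helper: flattens the rows, sums each row on the device, returns the sums.
// Error checking is omitted to keep the sketch short.
std::vector<int> jaggedRowSums(const std::vector<std::vector<int>>& rows)
{
    std::vector<int> data;
    std::vector<int> offsets{0};
    for (const auto& row : rows) {
        data.insert(data.end(), row.begin(), row.end());
        offsets.push_back(static_cast<int>(data.size()));
    }

    const int n = static_cast<int>(rows.size());
    std::vector<int> sums(n, 0);
    if (n == 0 || data.empty()) return sums;  // nothing to upload

    int *d_data = nullptr, *d_offsets = nullptr, *d_sums = nullptr;
    cudaMalloc((void**)&d_data, data.size() * sizeof(int));
    cudaMalloc((void**)&d_offsets, offsets.size() * sizeof(int));
    cudaMalloc((void**)&d_sums, n * sizeof(int));

    cudaMemcpy(d_data, data.data(), data.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, offsets.data(), offsets.size() * sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 128;
    rowSums<<<(n + threads - 1) / threads, threads>>>(d_data, d_offsets, n, d_sums);

    // The blocking copy back also waits for the kernel to finish.
    cudaMemcpy(sums.data(), d_sums, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_offsets);
    cudaFree(d_sums);
    return sums;
}

int main()
{
    std::vector<int> sums = jaggedRowSums({{1}, {2, 3}, {4, 5, 6}});
    for (int s : sums) printf("%d\n", s);
    return 0;
}
```

With this layout no pointer stored on the device ever needs to be written from the host, which sidesteps the problem in the question entirely.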

2 Comments

1. No, you can actually use printf() on the device. 2. You are right. 3. Check my answer. 4. Thanks, I will have a read.
1. You are right, it has actually been possible since compute capability 2.0. 2. - 3. "This way" refers to your question, not your answer. ;) 4. You're welcome.
