
For practice, I'm working on a simple matrix-initialization program in CUDA. I wrote a small sequential version for reference as a starting point: it just creates an n-by-m array and fills it with doubles. I've been reading through other posts and documentation, but I'm still confused, and I was hoping someone could explain how to initialize a 2D array of n-by-m size in CUDA, similar to what I did below. I would also appreciate insight on how to fill that CUDA matrix, if anyone is willing to explain.

Hi again. Regarding this being a possible duplicate, I should elaborate: the linked post doesn't really explain anything. It's just sample code, and it's one of the posts I had already viewed but don't understand, because nothing in it is explained. Thank you.

Sequential version:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <assert.h>

int n,m, i, j;
double count;

void update(int n, int m, double arr[][m]){
  for(i=0; i<n; i++){
    for(j=0; j<m; j++){
      count++;
      arr[i][j] = count;
    }
  }
}


int main(int argc, char * argv[]) {
  assert(argc==3);
  n = atoi(argv[1]);
  m = atoi(argv[2]);

  double (*arr)[n][m] = malloc(sizeof *arr);
  update(n, m, *arr);   /* *arr decays to double (*)[m], matching update's parameter */
  free(arr);
  return 0;
}
  • double (*arr)[m] = malloc(n * sizeof *arr); – I don't think that does what you think it does. And why are all the variables global? Commented Nov 28, 2018 at 6:27
  • Detail: in C, arr in double (*arr)[m] is not a 2D array, but a pointer to an array m of doubles. double arr[n][m] is a 2D array. Commented Nov 28, 2018 at 6:27
  • @chux pssst ... pointer to array of m doubles. Commented Nov 28, 2018 at 6:29
  • To create a pointer to a 2D array, C code could use double (*arr)[n][m] = malloc(sizeof *arr); Commented Nov 28, 2018 at 6:31
  • @Swordfish On second thought, perhaps OP does want double (*arr)[m] = malloc(n * sizeof *arr); even if it is not a proper 2D array. Hmmm, it's late. Commented Nov 28, 2018 at 6:38

1 Answer

You can simulate a 2D array with a 1D array by storing the data row by row, so the 2D array [a,b][c,d] becomes [a,b,c,d]. To keep things simple, you can write a wrapper class that provides this indexing.

Here is a demo of this idea (not 100% disaster-proof, but working):

#pragma once
#include <iostream>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

typedef int TYPE;

// NOTE: a copy constructor and operator= would need to be overloaded as well
template<class T>
struct Matrix
{
    Matrix(int r, int c) : rows(r), cols(c) {
        data = new T[r*c];
    }
    ~Matrix() {
        // As we allocated memory it needs to be freed upon destruction
        delete[] data;
        data = nullptr;
    }
    int rows, cols;
    T* data;
    T* operator[](int row) {
        // Returns pointer to "ROW", further call to [] on result will retrieve item at column in this row
        return data + (row*cols);
    }
};

// Simple cuda kernel 
__global__ void add(TYPE *a, TYPE *b, TYPE *c, int rows, int cols) {
    // Get element row and col
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    // If kernel block/grid is not sized perfectly make sure not to step outside data bounds
    if(row < rows && col < cols)
    {
        int idx = row*cols + col;
        c[idx] = a[idx] + b[idx];
    }
}

int main() {
    // m3 = m1 + m2 using cuda
    int rows = 5, cols = 5, total = rows * cols;
    Matrix<TYPE> m1{ rows,cols }, m2{ rows,cols }, m3{ rows,cols };

    // Initialization as 1D array
    for(int i = 0; i < total; i++)  {
        m1.data[i] = i;
    }

    // Or initialization as 2D array
    for(int r = 0; r < rows; r++)
        for(int c = 0; c < cols; c++)
            m2[r][c] = r*cols + c + 100;

    for(int i = 0; i < total; i++)  std::cout << m1.data[i] << ", ";
    std::cout << "\n";

    for(int r = 0; r < rows; r++) {
        for(int c = 0; c < cols; c++) 
            std::cout << m2[r][c] << ", ";
        std::cout << "\n";
    }

    // CUDA part
    TYPE *d_m1, *d_m2, *d_m3;

    // Allocation
    cudaMalloc((void **) &d_m1, total * sizeof(TYPE));
    cudaMalloc((void **) &d_m2, total * sizeof(TYPE));
    cudaMalloc((void **) &d_m3, total * sizeof(TYPE));

    // Copy m1 and m2 to GPU
    cudaMemcpy(d_m1, m1.data, total * sizeof(TYPE), cudaMemcpyHostToDevice);
    cudaMemcpy(d_m2, m2.data, total * sizeof(TYPE), cudaMemcpyHostToDevice);

    // Oversized on purpose to show row/col guard on add kernel
    dim3 grid(5, 5);
    dim3 block(5, 5);
    add <<< grid, block >>> (d_m1, d_m2, d_m3, rows, cols);

    // Copy result to m3
    cudaMemcpy(m3.data, d_m3, total * sizeof(TYPE), cudaMemcpyDeviceToHost);

    cudaFree(d_m1);
    cudaFree(d_m2);
    cudaFree(d_m3);

    for(int r = 0; r < rows; r++) {
        for(int c = 0; c < cols; c++)
            std::cout << m3[r][c] << ", ";
        std::cout << "\n";
    }

    system("pause"); // Windows-only; remove on other platforms
    return 0;
}

1 Comment

I don't understand the logic of going to the trouble of writing a wrapper class and then not using it on both the host and the device. Further, the OP is complaining that the many other answers to this question already on Stack Overflow are apparently too hard to understand because of limited explanation; your answer suffers from the same problem.
