
I am new to CUDA, so I hope my question isn't totally off base. I want to create an array in global device memory, but I will only know how large it needs to be in the middle of my main function (though before I ever access the device).

Because I don't know the size, I can't declare it ahead of my code with: __device__ myArr[]

So I thought of creating a pointer in main, d_myArr, and then using cudaMalloc(&d_myArr, arrSize) to allocate memory on the device, but then I never really declare a variable on my device.

I do not see a reason to pass d_myArr to my kernel, as it will then only exist in that kernel (I think?). I just want the variable to exist on my device as a global variable in the first place, and be accessible by different kernels.

Can I declare a __device__ variable inside main? As in:

int main(){
   .
   .
   __device__ int myArr[size];
   .
   .
}

If so, is it discouraged for some reason (because I can't find anyone doing so)? If this isn't allowed, what can I do instead? I saw people mentioning cudaMemcpyToSymbol, but I couldn't figure out whether it is relevant to what I want exactly. If it is, I would be glad if someone could explain exactly how it can be used to achieve what I need.

As a side question, I also have a constant variable that I want to exist on both my device and host. For now I just declared it twice, once with __device__ and once without. Is there a better way of doing this?

1 Answer
This won't work.

  1. __device__ variables must be declared at global scope.
  2. The size of an allocation associated with a __device__ variable must be known at compile time.

Instead, just use cudaMalloc to allocate space for the variable once you know the desired allocation size. This method allows for dynamically allocated global variables; the __device__ method only allows for statically allocated global variables.

Something like this:

int main(){
   // ...
   int *d_data;
   cudaMalloc(&d_data, size*sizeof(int));
   // ... 
   kernel1<<<...>>>(d_data,...);
   // ... 
   kernel2<<<...>>>(d_data,...);
   // ...
}

It is perfectly legal to pass such a dynamically allocated global variable to more than one kernel as I have shown above, and the data or modifications placed there by kernel1 above (if any) will be visible to the code running in kernel2 above, for the example I have shown.
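To make the point above concrete, here is a minimal, self-contained sketch. The kernel names (fill, doubleVals) and the size are made up for illustration; error checking is omitted for brevity:

```cuda
#include <cstdio>

// Hypothetical first kernel: writes initial values into the allocation.
__global__ void fill(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Hypothetical second kernel: reads and modifies what fill() wrote.
__global__ void doubleVals(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

int main() {
    int n = 256;                       // size known only at runtime
    int *d_data;
    cudaMalloc(&d_data, n * sizeof(int));

    fill<<<(n + 255) / 256, 256>>>(d_data, n);
    doubleVals<<<(n + 255) / 256, 256>>>(d_data, n);  // same allocation, no copies in between

    int h_last;
    cudaMemcpy(&h_last, d_data + n - 1, sizeof(int), cudaMemcpyDeviceToHost);
    printf("last element: %d\n", h_last);  // 2*(n-1)

    cudaFree(d_data);                  // allocation persists until freed (or program exit)
    return 0;
}
```

No cudaMemcpy is needed between the two launches: the second kernel sees exactly what the first one left in d_data.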

For the constant variable question, the approach you have mentioned is reasonable. If you have such constant data that is not known at compile time, that is a sensible approach (you may also wish to investigate using __constant__ instead of __device__). If on the other hand, you have constant data that is known at compile time, then either using

 #define MYCONSTANT 123

or

 const int myconstant = 123;

at global scope (i.e. outside of main) will allow such a definition to be used equally in host or device code, without having to declare or manage it twice. This last method will work well for POD data types (e.g. int, float, double, etc.) but will not work for complex types like struct.
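For the runtime-value case, cudaMemcpyToSymbol (which the question mentions) is how you set a __constant__ variable from the host before launching kernels. A hedged sketch, with made-up names (d_factor, scale):

```cuda
#include <cstdio>

__constant__ float d_factor;   // device-side constant; its size is known at compile time,
                               // but its value can be set at runtime from the host
float h_factor;                // host-side copy

__global__ void scale(float *v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= d_factor;  // all threads read the same cached constant
}

int main() {
    h_factor = 2.5f;  // value only known at runtime
    // Copy the host value into the __constant__ symbol before any kernel launch.
    cudaMemcpyToSymbol(d_factor, &h_factor, sizeof(float));
    // ... allocate a float array with cudaMalloc and launch scale<<<...>>>(...) as usual
    return 0;
}
```

The symbol is visible to every kernel in the same compilation unit, so you set it once and use it everywhere, which is exactly the "global variable on the device" behavior asked about.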


2 Comments

Thanks for the response! Just so I can make sure I understand correctly: because I allocated the memory on the device, it won't get lost, and even though I don't use cudaMemcpy back and forth, the d_data in my second kernel will be the same as was left at the end of my first kernel, right? So as long as I pass the pointer to all my kernels, they all have a way to reference that memory; otherwise, if I don't pass that pointer, the data still exists in the allocated memory but I just don't have a reference to access it by?
Yes, once you allocate device memory with cudaMalloc, it is persistent until you call a cudaFree operation on it (or until your application terminates). It behaves like any other memory. Once you write something to it, subsequent operations can see what was written, whether it is subsequent kernels or subsequent cudaMemcpy operations.
