
This is a fairly complicated question, and I'm not a native English speaker, so thank you for being patient enough to read it.

As CUDA effectively operates across two machines (host and device), it is invalid to dereference a host pointer on the device, which means you cannot copy structs (or objects) to the device if they contain pointer members.

I tried to make the following system to solve this issue:

  1. Use integers instead of pointers. The integer is an offset into a memory pool, and is wrapped in a class (overloading "->" and "*") so that it behaves like a pointer.
  2. The memory pool object manages a contiguous array of objects, which can easily be transferred to the CUDA device. The pool's contents are synchronized between host and device, so an integer offset has the same meaning on both sides.

To be convenient to use, the offset should be wrapped. On the host side, the wrapper looks like this:

template<typename T>
class MemPoolPointer {
public:
    inline T* operator -> () const
    {
        return &( MemPool<T>::get_instance().get_object(_p) );
    }
    uint64_t _p;
};

As you can see, the pointer class requires global access to the memory pool, which is usually implemented by making the memory pool a singleton. However, CUDA does not allow static data members, and it limits __device__ variables to file scope. How can I work around these limitations? Or should I try OpenCL instead?

Comments:

  • Have you tried using a pinned allocation? Commented Oct 8, 2012 at 13:56
  • Singletons: Solving problems you never had. Commented Oct 8, 2012 at 14:00
  • Your class (singleton or not) can reside on the CPU, and own pointers to host and device memory allocated using normal cudaMalloc, etc. You can then retrieve device pointers (offset as needed) from the class at kernel invocation time and pass them to the kernel. I see no problem here... Commented Oct 9, 2012 at 1:22
  • @jiandingzhe: Could you please add your solution as an answer to this question? You will later be able to accept your own answer (this is allowed), and the question will be marked as solved, making it easier for other people to find by search. Thank you. Commented Aug 18, 2013 at 7:07
  • @jiandingzhe: it would be helpful if you accepted the community wiki answer I have added to this question, so that it drops off the CUDA unanswered question list Commented Dec 26, 2016 at 19:35

1 Answer


The OP was able to solve this by wrapping a global-scope __device__ variable with a static class method, like this:

class FooBar;
__device__ FooBar* FOOBAR_DEVICE_POOL;

class FooBar
{
public:
    __device__ static FooBar& DEVICE_GET(uint64_t p);
};

template<typename T>
class MemPoolPointer {
public:
    inline T* operator -> () const
    {
#ifdef __CUDA_ARCH__
        return &( T::DEVICE_GET(_p) );
#else
        return &( MemPool<T>::get_instance().get_object(_p) );
#endif
    }
    uint64_t _p;
};

[this answer added as a community wiki entry to get the question off the unanswered queue for the CUDA tag]
