
This is a fairly complicated question, and I'm not a native English speaker, so thank you for being patient enough to read it.

As CUDA effectively operates across two machines (host and device), it is invalid to dereference a host pointer on the device, which means you cannot copy structs (or objects) to the device if they contain pointer members.

I tried to make the following system to solve this issue:

  1. Use integers instead of pointers. The integer is an offset into a memory pool, and is wrapped in a class (overloading "->" and "*") so that it behaves like a pointer.
  2. The memory pool object manages a contiguous array of objects, which can easily be transferred to the CUDA device. The pool's contents are synchronized between host and device, so an integer offset has the same meaning on both sides.

To be convenient to use, the offset should be wrapped. On the host side, the wrapper looks like this:

template<typename T>
class MemPoolPointer {
public:
    inline T* operator -> () const
    {
        return &( MemPool<T>::get_instance().get_object(_p) );
    }
    uint64_t _p;
};

As you can see, the pointer class requires global access to the memory pool, which is usually implemented by making the memory pool a singleton. However, CUDA does not allow static data members, and it limits __device__ variables to file scope. How can I work around these limitations? Or should I try OpenCL instead?

Comments:

  • Have you tried using a pinned allocation? Commented Oct 8, 2012 at 13:56
  • Singletons: Solving problems you never had. Commented Oct 8, 2012 at 14:00
  • Your class (singleton or not) can reside on the CPU, and own pointers to host and device memory allocated using normal cudaMalloc, etc. You can then retrieve device pointers (offset as needed) from the class at kernel invocation time and pass them to the kernel. I see no problem here... Commented Oct 9, 2012 at 1:22
  • @jiandingzhe: Could you please add your solution as an answer to this question? You will later be able to accept your own answer (this is allowed), and the question will be marked as solved, making it easier for other people to find by search. Thank you. Commented Aug 18, 2013 at 7:07
  • @jiandingzhe: it would be helpful if you accepted the community wiki answer I have added to this question, so that it drops off the CUDA unanswered question list Commented Dec 26, 2016 at 19:35

1 Answer


The OP was able to solve this by wrapping a global-scope __device__ variable with a static class method, like this:

class FooBar;
__device__ FooBar* FOOBAR_DEVICE_POOL;

class FooBar
{
public:
    __device__ static FooBar& DEVICE_GET(uint64_t p);
};

template<typename T>
class MemPoolPointer {
public:
    inline T* operator -> () const
    {
#ifdef __CUDA_ARCH__
        return &( T::DEVICE_GET(_p) );
#else
        return &( MemPool<T>::get_instance().get_object(_p) );
#endif
    }
    uint64_t _p;
};

[this answer added as a community wiki entry to get the question off the unanswered queue for the CUDA tag]
