
I am looking for ways to use a memory pool within Thrust, as I want to limit the number of calls to cudaMalloc. device_vector definitely accepts an allocator, but it's not so easy to deal with thrust::sort, which apparently allocates a temporary buffer internally.

Based on the answer to How to use CUDA Thrust execution policy to override Thrust's low-level device memory allocator, it seems that Thrust can be hooked up to special allocators by tweaking the execution policy, but that answer is quite old and I can't find any documentation about execution policies that explains how to proceed exactly.

For completeness, there is thrust/examples/cuda/custom_temporary_allocation.cu, but it's not very satisfying as it hooks the memory pool in as a global variable.

I think it's quite likely that the Thrust developers have thought about this and included some mechanism for injecting a custom memory manager via the execution policy; I just can't find it.

5 Comments
  • Not sure why you looked at and linked the version of that example from 2012. The newest version of it (2 years old) does not use a global variable. Commented Jul 1, 2023 at 13:11
  • While the sample is using a basic custom pool, there are proper pool memory resources in Thrust as well. Commented Jul 1, 2023 at 13:14
  • For more control I would recommend taking a look at the RAPIDS Memory Manager (RMM) (RAPIDS is part of Nvidia and nowadays the main driver of innovation in Thrust and CUB, I think) or using CUB directly (it is basically the CUDA backend of Thrust), i.e. cub::DeviceRadixSort (for primitive types) or cub::DeviceMergeSort (generally applicable). Commented Jul 1, 2023 at 13:19
  • see github.com/ingowald/cudaKDTree/pull/7 for my changes @paleonix it would be great if you could give it a quick review (it's a pretty small PR) Commented Jul 3, 2023 at 23:10
  • Wrong comment thread? Either way, you will find thrust/examples/mr_basic.cu to be of interest in terms of allocators in Thrust. Commented Jul 3, 2023 at 23:32

1 Answer


The following is an example allocator for stream-ordered memory allocation that uses cudaMallocAsync to allocate from the default CUDA memory pool on a specific stream. Together with the par_nosync execution policy, this allows for a fully asynchronous thrust::sort.

#include <thrust/device_malloc_allocator.h>

#include <cuda_runtime.h>
#include <new> // std::bad_alloc

template <class T>
struct ThrustAllocatorAsync : public thrust::device_malloc_allocator<T> {
public:
    using Base      = thrust::device_malloc_allocator<T>;
    using pointer   = typename Base::pointer;
    using size_type = typename Base::size_type;

    ThrustAllocatorAsync(cudaStream_t stream_) : stream{stream_} {}

    // Allocates from the device's default, stream-ordered memory pool.
    pointer allocate(size_type num) {
        T* result = nullptr;
        if (cudaMallocAsync(&result, sizeof(T) * num, stream) != cudaSuccess) {
            throw std::bad_alloc();
        }
        return thrust::device_pointer_cast(result);
    }

    void deallocate(pointer ptr, size_type num) {
        // A failure here is not recoverable; production code should at least log it.
        cudaFreeAsync(thrust::raw_pointer_cast(ptr), stream);
    }

private:
    cudaStream_t stream;
};

...

thrust::sort(
   thrust::cuda::par_nosync(ThrustAllocatorAsync<char>(stream)).on(stream),
   data.begin(),
   data.end()
);

The same can be achieved with RMM as suggested in the comments.

#include <rmm/mr/device/cuda_async_memory_resource.hpp> 
#include <rmm/exec_policy.hpp>

...
// could use any other class derived from rmm::mr::device_memory_resource
rmm::mr::cuda_async_memory_resource mr; 

thrust::sort(
   rmm::exec_policy_nosync(stream, &mr),
   data.begin(),
   data.end()
);
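For tighter control over pooling, RMM also offers a suballocating pool resource that can be layered over an upstream resource, so the CUDA runtime is only hit when the pool has to grow. A sketch assuming a recent RMM version (the 1 GiB initial size is an arbitrary choice, and upstream/pool_mr are illustrative names):

```cpp
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>
#include <rmm/exec_policy.hpp>

...
rmm::mr::cuda_memory_resource upstream;
// Pre-allocates once and then suballocates; freed blocks return to the pool.
rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource> pool_mr{
    &upstream, 1ull << 30};

thrust::sort(
    rmm::exec_policy_nosync(stream, &pool_mr),
    data.begin(),
    data.end()
);
```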


1 Comment

This is great, thanks. I am reading the exec_policy code but I'm having trouble getting the hang of it. Is there a way to get the allocator out of the policy? Or the stream? I'd like to simplify the arguments to my function: use the exec policy to allocate the memory for a device_buffer, and also for the sort and sync operations.
