I am looking for a way to use a memory pool with Thrust, because I want to limit the number of calls to cudaMalloc.
device_vector definitely accepts an allocator, but it's not so easy to deal with thrust::sort, which apparently allocates a temporary buffer internally.
Based on the answer to "How to use CUDA Thrust execution policy to override Thrust's low-level device memory allocator", it seems that Thrust can be hooked to use custom allocators by tweaking the execution policy. However, that answer is quite old, and I can't find any documentation on execution policies that explains exactly how to proceed.
For completeness, there is thrust/examples/cuda/custom_temporary_allocation.cu, but it's not very satisfying, as it hooks its memory pool up through a global variable.
I think it's quite likely that the Thrust developers have thought about this and included some mechanism for injecting a custom memory manager via the execution policy; I just can't find it.
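For reference, here is a minimal sketch of the pattern I mean, adapted from custom_temporary_allocation.cu but without the global: a caching allocator instance is created locally and passed into the execution policy via thrust::cuda::par(alloc), so Thrust's temporary buffers for sort are drawn from it. The cached_allocator name comes from that example; error checking is omitted for brevity.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>
#include <cstddef>
#include <map>

// Caching allocator satisfying the interface Thrust expects for
// temporary allocations: allocate(num_bytes) / deallocate(ptr, num_bytes).
struct cached_allocator
{
  typedef char value_type;

  // size-in-bytes -> cached block available for reuse
  std::multimap<std::ptrdiff_t, char*> free_blocks;

  char* allocate(std::ptrdiff_t num_bytes)
  {
    auto it = free_blocks.find(num_bytes);
    if (it != free_blocks.end())
    {
      char* p = it->second;
      free_blocks.erase(it);
      return p;                      // reuse a previously freed block
    }
    char* p = nullptr;
    cudaMalloc(&p, num_bytes);       // cache miss: fall back to cudaMalloc
    return p;
  }

  void deallocate(char* p, size_t num_bytes)
  {
    // Return the block to the cache instead of calling cudaFree.
    free_blocks.insert({static_cast<std::ptrdiff_t>(num_bytes), p});
  }

  ~cached_allocator()
  {
    for (auto& kv : free_blocks)
      cudaFree(kv.second);
  }
};

int main()
{
  thrust::device_vector<int> vec(1 << 20);

  cached_allocator alloc;            // local object, no global state

  // Temporary storage for both sorts comes from `alloc`; the second
  // call reuses the block cached by the first.
  thrust::sort(thrust::cuda::par(alloc), vec.begin(), vec.end());
  thrust::sort(thrust::cuda::par(alloc), vec.begin(), vec.end());
  return 0;
}
```

This avoids the global variable, but it still means threading the allocator through every algorithm call, which is what I was hoping the library had a cleaner mechanism for.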
You may want to look at cub::DeviceRadixSort (for primitive types) or cub::DeviceMergeSort (generally applicable), since CUB's device-wide algorithms let you pass in your own temporary storage. You may also find thrust/examples/mr_basic.cu to be of interest in terms of allocators in Thrust.
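To illustrate, here is a sketch of CUB's two-phase temporary-storage protocol with cub::DeviceMergeSort: the first call with a null pointer only writes the required size, so the scratch buffer can come from any pool you manage and be reused across many sorts. The plain cudaMalloc in phase 2 stands in for your pool, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cstddef>

int main()
{
  const int num_items = 1 << 20;
  int* d_keys = nullptr;
  cudaMalloc(&d_keys, num_items * sizeof(int));

  void*  d_temp_storage     = nullptr;
  size_t temp_storage_bytes = 0;

  // Phase 1: d_temp_storage == nullptr, so no work is done;
  // only temp_storage_bytes is written.
  cub::DeviceMergeSort::SortKeys(d_temp_storage, temp_storage_bytes,
                                 d_keys, num_items, cub::Less());

  // Phase 2: supply the buffer yourself (here cudaMalloc; in practice,
  // hand out a block from your own memory pool) and run the sort.
  cudaMalloc(&d_temp_storage, temp_storage_bytes);
  cub::DeviceMergeSort::SortKeys(d_temp_storage, temp_storage_bytes,
                                 d_keys, num_items, cub::Less());

  cudaFree(d_temp_storage);
  cudaFree(d_keys);
  return 0;
}
```

Because the size query is cheap, you can query once for your largest problem size and keep a single scratch block alive for the lifetime of the application, eliminating per-sort cudaMalloc calls entirely.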