
As far as I can tell, thrust::binary_search segfaults on every stream except the default one. I can't find any such restriction described in the documentation, so I'm hoping an expert can enlighten me on proper usage.

Here is a simple example. The test code creates a vector of unsorted integers and copies it to the device via a thrust::device_vector. It then creates a stream and sorts on that stream, which works. However, when I pass an execution policy for that same stream to a binary search routine, I get a segfault. I need multiple streams to improve concurrency in a more complex case.

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/binary_search.h>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>
#include <iostream>
#include <vector>
int main(void)
{
   std::vector<int> data = {31, 16, 14, 55, 61, 18, 33, 88, 72};
   thrust::host_vector<int> H(data);
   thrust::device_vector<int> D = H;

   cudaStream_t stream;
   cudaStreamCreate(&stream);

   thrust::sort(thrust::cuda::par.on(stream), D.begin(), D.end());
   // So far so good
   auto it1 = thrust::upper_bound(thrust::cuda::par, D.begin(), D.end(), 50);
   // Also good
   std::cout << "Test 1 = " << *it1 << std::endl;
   // But the next call seg faults
   auto it2 = thrust::upper_bound(thrust::cuda::par.on(stream), D.begin(), D.end(), 50);
   std::cout << "Test 2 = " << *it2 << std::endl; 
   cudaStreamDestroy(stream);
   return 0;
}

I'm using CUDA 9.1 on a compute capability 6.1 device.

To summarize: upper_bound on the default stream works as expected, but upper_bound with the execution policy thrust::cuda::par.on(stream) segfaults. I can't find anything about this in the documentation. Is this expected behavior? Is there a workaround?

  • I get the same result with CUDA 9.0 on a compute capability 7.0 device (V100). The program succeeds with CUDA 9.0 and Thrust 1.8.3 on a compute capability 6.0 device (P100). Thrust 1.9 bug? Commented Jun 5, 2018 at 17:52
  • The code also runs correctly on CUDA 8.0 and also fails on CUDA 9.2/V100. This appears to be a bug in Thrust. I have filed an internal bug at NVIDIA. No further info at this time. Thanks for reporting. Commented Jun 5, 2018 at 21:27
  • Thank you, folks, for responding and filing that bug report. As much as I hate to reinvent the wheel, imperfectly, I spent the afternoon writing my own binary-search-based upper and lower bound routines using my own kernel plus thrust::transform. thrust::sort and thrust::transform are working fine with streams. Commented Jun 5, 2018 at 22:26
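
For readers who need something similar, here is a minimal sketch of the kind of workaround the last comment describes: a hand-written upper bound wrapped in a functor and applied with thrust::transform on a stream. This is not the commenter's actual code. The names upper_bound_index, UpperBoundFn and upper_bound_on_stream are hypothetical, and the sketch assumes int keys that have already been sorted. The search itself is plain C++, so it also compiles without nvcc; the Thrust part is only built under nvcc.

```cpp
#include <cstddef>

#ifdef __CUDACC__
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/transform.h>
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper: index of the first element in sorted[0, n) that is
// strictly greater than value (n if there is no such element).
HD inline std::size_t upper_bound_index(const int* sorted, std::size_t n, int value)
{
    std::size_t lo = 0, hi = n;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] <= value)
            lo = mid + 1;   // answer lies to the right of mid
        else
            hi = mid;       // mid itself may be the answer
    }
    return lo;
}

// Hypothetical functor: one search per query value.
struct UpperBoundFn {
    const int* sorted;
    std::size_t n;
    HD std::size_t operator()(int value) const
    {
        return upper_bound_index(sorted, n, value);
    }
};

#ifdef __CUDACC__
// Assumption: 'sorted' is already sorted and 'out' has queries.size() elements.
inline void upper_bound_on_stream(cudaStream_t stream,
                                  const thrust::device_vector<int>& sorted,
                                  const thrust::device_vector<int>& queries,
                                  thrust::device_vector<std::size_t>& out)
{
    UpperBoundFn fn{thrust::raw_pointer_cast(sorted.data()), sorted.size()};
    thrust::transform(thrust::cuda::par.on(stream),
                      queries.begin(), queries.end(), out.begin(), fn);
}
#endif
```

Each output element is an index into the sorted vector rather than an iterator, so sorted.begin() + out[i] recovers the position thrust::upper_bound would return. Calling cudaStreamSynchronize(stream) before reading the results is the conservative choice.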

2 Answers


I'm the maintainer of Thrust. This is an unfortunate bug from before my time due to an oversight in the new CUDA backend for Thrust introduced in CUDA 9.0. The TL;DR is that the new CUDA backend doesn't have specializations of any of the binary search algorithms, so the generic sequential fallback is used. For some reason, the generic fallback implementation explodes when a stream execution policy is passed through.

I'm looking into the cause of the second problem, but the bigger concern is the first problem (no implementation of binary search algorithms in the new backend). The fix won't make it into the next CUDA release, but it will hopefully be in the release after that. However, after the next CUDA release, the Thrust GitHub will be back in service, and I'll be able to deploy a fix through there.

Unfortunately, I don't have a workaround at this time.

GitHub Issue 921 is tracking this bug.


Just to follow up -- this bug is fixed by https://github.com/thrust/thrust/pull/1104.
