As far as I can tell, thrust::binary_search and the related search routines (the example below uses thrust::upper_bound) segfault on any stream other than the default stream. I can't find any such restriction described in the documentation, so I'm hoping an expert can enlighten me on proper usage.
Here is a simple example. This test code creates a vector of unsorted integers and copies it to the device in a thrust::device_vector. It then creates a stream and sorts the data on that stream. However, if I pass an execution policy for that stream to a binary search routine, I get a segfault. Of course, I need multiple streams to improve concurrency in a more complex case.
```cpp
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/execution_policy.h>
#include <iostream>
#include <vector>

int main(void)
{
    std::vector<int> data = {31, 16, 14, 55, 61, 18, 33, 88, 72};
    thrust::host_vector<int> H(data);
    thrust::device_vector<int> D = H;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Sorting on the non-default stream works: so far so good
    thrust::sort(thrust::cuda::par.on(stream), D.begin(), D.end());

    // upper_bound on the default stream also works
    auto it1 = thrust::upper_bound(thrust::cuda::par, D.begin(), D.end(), 50);
    std::cout << "Test 1 = " << *it1 << std::endl;

    // But the same call on the non-default stream segfaults
    auto it2 = thrust::upper_bound(thrust::cuda::par.on(stream), D.begin(), D.end(), 50);
    std::cout << "Test 2 = " << *it2 << std::endl;

    cudaStreamDestroy(stream);
    return 0;
}
```
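To have a known-good value to compare against, here is a host-only sketch that performs the same sort followed by upper_bound using the C++ standard library instead of Thrust. The function name reference_upper_bound is my own, not part of Thrust. For the input above and a search value of 50 it returns 55, which is the value Test 1 should print.

```cpp
#include <algorithm>
#include <vector>

// Host-only reference for the device test (hypothetical helper, not Thrust):
// sort a copy of the input, then return the first element strictly greater
// than `value`, i.e. what *it1 should print.
// Assumes at least one element is greater than `value`; otherwise `it` is
// end() and dereferencing it would be out of range, just like *it1 would be.
int reference_upper_bound(std::vector<int> data, int value)
{
    std::sort(data.begin(), data.end());  // ascending order, as thrust::sort does for ints
    auto it = std::upper_bound(data.begin(), data.end(), value);
    return *it;
}
```

For example, reference_upper_bound({31, 16, 14, 55, 61, 18, 33, 88, 72}, 50) returns 55.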
I'm using CUDA 9.1 on a compute capability 6.1 device.
thrust::upper_bound on the default stream works as expected, but the same call with the execution policy thrust::cuda::par.on(stream) segfaults. I can't find anything about this in the documentation. Is this expected behavior? Is there a workaround?
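One thing that might be worth trying is Thrust's vectorized overload, thrust::upper_bound(thrust::cuda::par.on(stream), D.begin(), D.end(), values.begin(), values.end(), result.begin()). It writes one index per search value into an output range instead of returning an iterator. This rests on an unverified assumption: that the crash is specific to the single-value form, which I have not checked on CUDA 9.1. Here values and result would be hypothetical device vectors holding the search keys and the output indices. The host-only sketch below mimics those semantics with std::upper_bound so the expected indices are easy to check. It is not Thrust code, and upper_bound_indices is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Host-only analogue (hypothetical helper) of the vectorized search form:
// for each query value, store the index of its upper bound in `sorted`
// rather than returning an iterator.
// Assumes `sorted` is already in ascending order.
std::vector<std::ptrdiff_t> upper_bound_indices(const std::vector<int>& sorted,
                                                const std::vector<int>& values)
{
    std::vector<std::ptrdiff_t> result(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        auto it = std::upper_bound(sorted.begin(), sorted.end(), values[i]);
        result[i] = it - sorted.begin();  // in [0, sorted.size()]
    }
    return result;
}
```

With the sorted data {14, 16, 18, 31, 33, 55, 61, 72, 88} and the query 50, the stored index is 5, the position of 55. A query larger than every element yields sorted.size().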