
I have a base VS2012 C++/CLI application that pulls images from more than one webcam and then does object recognition on them using OpenCV. Currently, each webcam has its own thread. This works, but I am not getting the frame rate I would like.

I would like to implement some of the code on my NVidia GPU. Thus, I want each of my CPU threads to be able to asynchronously call the GPU and perform a series of functions.

I am a GPU newbie so I am wondering which makes more sense:

1) locking access to the GPU to one CPU thread at a time; or

2) something where each CPU thread can make calls to the GPU and only certain cores work on that thread's job (while other cores work on other threads' jobs); or

3) something where the jobs are cached.

If #2 is a good option, is there some sort of guide on how to do it? I would need to keep some things in GPU memory specific to each CPU thread.

Thanks for any guidance.

  • I would suggest just letting it rip (i.e. maybe none of the above). As long as you don't use up all the memory on the GPU, there's little harm in letting multiple threads use the GPU asynchronously, in a more or less unmanaged fashion (assuming each thread is working more or less independently, e.g. one thread per webcam as you say). Given that you have something working in OpenCV, the first step might be to ask which parts of your algorithm can already take advantage of an OpenCV GPU-accelerated function. That's not the only way to GPU-accelerate, but it might be low-hanging fruit. Commented Jul 28, 2015 at 20:02
  • Thanks for the suggestion. I assume then that one CPU thread calling a GPU command while another GPU command is already running will not cause the thread to fault. Commented Jul 28, 2015 at 21:52
  • I know of the OpenCV GPU accelerated functions but, given that I need to run several functions on the GPU (e.g., remap, cvtColor, surfkeypoints, findhomography) for each image from the webcam, I assumed that things would be much faster if the whole thing was done on the GPU. Commented Jul 28, 2015 at 21:54
  • Correct. As long as you don't do something like over-allocate the available memory, activity requested of the GPU just goes into a queue and gets done at whatever rate can be supported. It doesn't matter whether the activity emanates from a single thread or multiple threads. The advantage of letting things rip is that you may get higher utilization of the GPU that way; basically, the GPU likes to be over-subscribed. And you're also correct that you'll likely benefit by moving as much of the pipeline to the GPU as possible. But this should still be possible with OpenCV and gpu::GpuMat (see the sketch after these comments). Commented Jul 28, 2015 at 22:24
  • Great. Thanks so much. I will try it and see what happens. Currently, on full HD images, I am getting 0.2 fps and I am hoping to get that over 1 fps. Commented Jul 28, 2015 at 22:40
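
A minimal sketch of the per-camera pipeline suggested above, assuming OpenCV 2.4's gpu module (cameraLoop, the camera index, and the undistortion maps are illustrative placeholders; error handling is omitted):

    #include <opencv2/opencv.hpp>
    #include <opencv2/gpu/gpu.hpp>

    // One instance of this loop runs on each webcam's CPU thread.
    void cameraLoop(int camIndex, const cv::Mat& mapX, const cv::Mat& mapY)
    {
        cv::VideoCapture cap(camIndex);
        cv::gpu::GpuMat d_xmap(mapX), d_ymap(mapY); // upload the remap tables once
        cv::gpu::GpuMat d_frame, d_remapped, d_gray;

        cv::Mat frame, gray;
        while (cap.read(frame))
        {
            d_frame.upload(frame);                  // host -> device
            cv::gpu::remap(d_frame, d_remapped, d_xmap, d_ymap, cv::INTER_LINEAR);
            cv::gpu::cvtColor(d_remapped, d_gray, CV_BGR2GRAY);
            d_gray.download(gray);                  // copy back only what the CPU needs
            // ... feature detection / homography on gray, or stay on the GPU ...
        }
    }

Because the GpuMats are local to the loop, each thread keeps its own state in GPU memory, and the driver queues the work safely even when several threads submit at once.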

1 Answer


A next step could be to use CUDA streams, so that work issued from independent CPU threads can run concurrently on the GPU. But be careful: with streams it becomes harder to keep track of the memory, registers, and cores in use, so you may need additional code to avoid over-allocating memory or running out of registers. Always keep in mind that every resource on the GPU is limited. For Kepler, take a look at pages 6-8 of the Kepler whitepaper and the performance guidelines on pages 79-80 of the CUDA Programming Guide.
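
As a rough illustration of that pattern with the raw CUDA runtime API, each CPU thread can own a private stream and issue its copies and kernels into it (processFrame and threadWorker are hypothetical names; error checking is omitted):

    #include <cuda_runtime.h>

    __global__ void processFrame(const unsigned char* in, unsigned char* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = 255 - in[i];   // stand-in for real per-pixel work
    }

    // Called once per webcam thread; each thread gets its own queue of GPU work.
    void threadWorker(const unsigned char* h_in, unsigned char* h_out, int n)
    {
        cudaStream_t stream;
        cudaStreamCreate(&stream);         // this thread's private stream

        unsigned char *d_in, *d_out;
        cudaMalloc((void**)&d_in, n);
        cudaMalloc((void**)&d_out, n);

        cudaMemcpyAsync(d_in, h_in, n, cudaMemcpyHostToDevice, stream);
        processFrame<<<(n + 255) / 256, 256, 0, stream>>>(d_in, d_out, n);
        cudaMemcpyAsync(h_out, d_out, n, cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);     // wait only for this thread's work

        cudaFree(d_in);
        cudaFree(d_out);
        cudaStreamDestroy(stream);
    }

For the copies to actually overlap with other streams' work, the host buffers should be page-locked (allocated with cudaHostAlloc) rather than plain heap memory; OpenCV 2.4 exposes the same idea through cv::gpu::Stream and cv::gpu::CudaMem.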

