
I have a base VS2012 C++/CLI application that pulls images from more than one webcam and then does object recognition on them using OpenCV. Currently, each webcam has its own thread. This works, but I am not getting the frame rate I would like.

I would like to implement some of the code on my NVidia GPU. Thus, I want each of my CPU threads to be able to asynchronously call the GPU and perform a series of functions.

I am a GPU newbie so I am wondering which makes more sense:

1) locking access to the GPU to one CPU thread at a time; or

2) something where each CPU thread can make calls to the GPU and only certain cores work on that thread's job (while other cores work on other threads' jobs); or

3) something where the jobs are cached.

If #2 is a good option, is there some sort of guide on how to do it? I would need to keep some things in GPU memory specific to each CPU thread.

Thanks for any guidance.

  • I would suggest just letting it rip (i.e. maybe none of the above). As long as you don't use up all the memory on the GPU, there's little harm in letting multiple threads use the GPU asynchronously, in a more or less unmanaged fashion (assuming each thread is working more or less independently, e.g. one thread per webcam as you say). Given that you have something working in OpenCV, the first step might be to ask which parts of your algorithm can already take advantage of an OpenCV GPU-accelerated function. That's not the only way to GPU-accelerate, but it might be low-hanging fruit. Commented Jul 28, 2015 at 20:02
  • Thanks for the suggestion. I assume then that one CPU thread calling a GPU command while another GPU command is already running will not cause the thread to fault. Commented Jul 28, 2015 at 21:52
  • I know of the OpenCV GPU accelerated functions but, given that I need to run several functions on the GPU (e.g., remap, cvtColor, surfkeypoints, findhomography) for each image from the webcam, I assumed that things would be much faster if the whole thing was done on the GPU. Commented Jul 28, 2015 at 21:54
  • Correct. As long as you don't do something like over-allocate the available memory, activity requested of the GPU just goes into a queue and gets done at whatever rate can be supported. It doesn't matter whether the activity emanates from a single thread or multiple threads. The advantage of letting things rip is that you may get higher utilization of the GPU that way; basically, the GPU likes to be over-subscribed. And you're also correct that you'll likely benefit by moving as much of the pipeline to the GPU as possible. But this should still be possible with OpenCV and gpu::GpuMat (see the sketch after these comments). Commented Jul 28, 2015 at 22:24
  • Great. Thanks so much. I will try it and see what happens. Currently, on full HD images, I am getting 0.2 fps and I am hoping to get that over 1 fps. Commented Jul 28, 2015 at 22:40
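
A minimal sketch of the per-camera pipeline suggested above, assuming OpenCV 2.4's gpu module (cameraLoop, the camera index, and the undistortion maps are illustrative placeholders; error handling is omitted):

    #include <opencv2/opencv.hpp>
    #include <opencv2/gpu/gpu.hpp>

    // One instance of this loop runs on each webcam's CPU thread.
    void cameraLoop(int camIndex, const cv::Mat& mapX, const cv::Mat& mapY)
    {
        cv::VideoCapture cap(camIndex);
        cv::gpu::GpuMat d_xmap(mapX), d_ymap(mapY); // upload the remap tables once
        cv::gpu::GpuMat d_frame, d_remapped, d_gray;

        cv::Mat frame, gray;
        while (cap.read(frame))
        {
            d_frame.upload(frame);                  // host -> device
            cv::gpu::remap(d_frame, d_remapped, d_xmap, d_ymap, cv::INTER_LINEAR);
            cv::gpu::cvtColor(d_remapped, d_gray, CV_BGR2GRAY);
            d_gray.download(gray);                  // copy back only what the CPU needs
            // ... feature detection / homography on gray, or stay on the GPU ...
        }
    }

Because the GpuMats are local to the loop, each thread keeps its own state in GPU memory, and the driver queues the work safely even when several threads submit at once.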

1 Answer


A next step could be to use CUDA streams, so that work issued from independent CPU threads can run concurrently on the GPU. But be careful: with streams it becomes harder to keep track of the memory, registers, and cores in use, so you may need additional code to avoid over-allocating memory or running out of registers. Always keep in mind that every resource on the GPU is limited. For Kepler, take a look at pages 6-8 of the Kepler whitepaper and the performance guidelines on pages 79-80 of the CUDA Programming Guide.
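
As a rough illustration of that pattern with the raw CUDA runtime API, each CPU thread can own a private stream and issue its copies and kernels into it (processFrame and threadWorker are hypothetical names; error checking is omitted):

    #include <cuda_runtime.h>

    __global__ void processFrame(const unsigned char* in, unsigned char* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = 255 - in[i];   // stand-in for real per-pixel work
    }

    // Called once per webcam thread; each thread gets its own queue of GPU work.
    void threadWorker(const unsigned char* h_in, unsigned char* h_out, int n)
    {
        cudaStream_t stream;
        cudaStreamCreate(&stream);         // this thread's private stream

        unsigned char *d_in, *d_out;
        cudaMalloc((void**)&d_in, n);
        cudaMalloc((void**)&d_out, n);

        cudaMemcpyAsync(d_in, h_in, n, cudaMemcpyHostToDevice, stream);
        processFrame<<<(n + 255) / 256, 256, 0, stream>>>(d_in, d_out, n);
        cudaMemcpyAsync(h_out, d_out, n, cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);     // wait only for this thread's work

        cudaFree(d_in);
        cudaFree(d_out);
        cudaStreamDestroy(stream);
    }

For the copies to actually overlap with other streams' work, the host buffers should be page-locked (allocated with cudaHostAlloc) rather than plain heap memory; OpenCV 2.4 exposes the same idea through cv::gpu::Stream and cv::gpu::CudaMem.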

