4

I need real time processing, but the internal functions of OpenCV are not providing this. I am doing hand gesture recognition, and it works almost perfectly, except for the fact that the resulting output is VERY laggy and slow. I know that this isn't because of my algorithm but the processing times of OpenCV. Is there anything I can do to speed it up?

PS: I don't want to use the IPP libraries, so please don't suggest that. I need increased performance from OpenCV itself.

3
  • Profile the OpenCV code and see for yourself... Commented Oct 1, 2011 at 20:19
  • I do not have the proficiency or time to do that. That is exactly the reason you ask questions. Commented Oct 1, 2011 at 20:39
  • Sorry if I missed it. The point was that 9 times out of 10 the OpenCV code is not the problem, unless it is a new feature or a bug. If it is, you can point to the specific OpenCV code you are using. Commented Oct 3, 2011 at 5:10

3 Answers

11

Traditional techniques for improving image analysis:

  1. Reduce the image to a monochrome sample.
  2. Reduce the range of samples, e.g. from 8-bit monochrome to 4-bit monochrome.
  3. Reduce the size of the image, e.g. 1024x1924 to 64x64.
  4. Reduce the frame rate, e.g. 60fps to 5fps.
  5. Run a cheaper first pass to guess where the target area is, say at a lower resolution, then perform the regular analysis on the cropped output, e.g. perform image recognition to locate the hand before determining the gesture.
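
Point 5 can be sketched roughly as follows. This is a minimal, pure-Python illustration under stated assumptions, not OpenCV code: the frame is a list of lists, the "target" is simply the brightest region, and the helper names (`downsample`, `coarse_locate`, `crop_around`) are hypothetical. In a real program the coarse pass would be whatever cheap detector suits the scene, and the crop would be an image ROI.

```python
def downsample(img, step):
    """Keep every `step`-th pixel in both directions (a crude low-resolution copy)."""
    return [row[::step] for row in img[::step]]


def coarse_locate(img, step):
    """Find the brightest pixel in the low-res copy; return its full-res (x, y)."""
    small = downsample(img, step)
    best_y, best_x = max(
        ((y, x) for y in range(len(small)) for x in range(len(small[0]))),
        key=lambda p: small[p[0]][p[1]],
    )
    return best_x * step, best_y * step


def crop_around(img, cx, cy, half):
    """Crop a window around (cx, cy), clamped to the image; also return its origin."""
    h, w = len(img), len(img[0])
    x0, x1 = max(0, cx - half), min(w, cx + half + 1)
    y0, y1 = max(0, cy - half), min(h, cy + half + 1)
    return [row[x0:x1] for row in img[y0:y1]], (x0, y0)


# Synthetic 64x64 frame with a bright 8x8 "target" whose top-left corner is (40, 16).
frame = [[0] * 64 for _ in range(64)]
for y in range(16, 24):
    for x in range(40, 48):
        frame[y][x] = 255

cx, cy = coarse_locate(frame, step=4)      # cheap pass looks at 16x16 = 256 pixels
roi, origin = crop_around(frame, cx, cy, half=12)
# The expensive analysis would now run on `roi` (25x25) instead of the 64x64 frame.
```

The expensive step then sees far fewer pixels without the full-resolution detail inside the region of interest being thrown away.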

3 Comments

1) I can't make the image monochrome, as colour is vital. 2) I can't reduce the depth, as my algorithm relies on precision. 3) I can't reduce the size, as my program relies on clarity. 4) I need a larger image for the program to function properly. 5) Could you please clarify or expand on this? I don't really understand what you're suggesting.
Steve-o speaks with wisdom. The key to getting good performance is to do fewer expensive operations, as opposed to doing the expensive operations faster. As an example of suggestion #5 in practice: when I track my robot vehicle, I only process the area near where I found the vehicle in the last frame instead of the whole frame.
@fdh these are the same things that everyone tackling a computer vision problem runs into and the solutions are usually counter-intuitive. You get better performance with less data.
1

Steve-o's answer is good for optimizing your code efficiency. I recommend adding some logic to monitor execution times to help you identify where to spend efforts optimizing.

OpenCV logic for time monitoring (Python):

import cv2 as cv

start_ticks = cv.getTickCount()
# your code execution
elapsed_s = (cv.getTickCount() - start_ticks) / cv.getTickFrequency()  # elapsed seconds

Boost logic for time monitoring:

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

boost::posix_time::ptime start = boost::posix_time::microsec_clock::local_time();
// do something time-consuming
boost::posix_time::ptime end = boost::posix_time::microsec_clock::local_time();

// elapsed wall-clock time between the two samples
boost::posix_time::time_duration timeTaken = end - start;
std::cout << timeTaken << std::endl;
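
To see which stage of a frame loop dominates, the same idea can be wrapped in a small helper that accumulates time per named stage. The sketch below uses only the Python standard library; the `timed` helper, the stage names and the dummy workloads are hypothetical placeholders for your own preprocessing and analysis code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

stage_totals = defaultdict(float)  # accumulated seconds per stage
stage_calls = defaultdict(int)     # number of times each stage ran


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the `with` block to `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[stage] += time.perf_counter() - start
        stage_calls[stage] += 1


for _ in range(3):  # stand-in for the per-frame loop
    with timed("preprocess"):
        sum(i * i for i in range(10_000))      # placeholder work
    with timed("analysis"):
        sorted(range(10_000), reverse=True)    # placeholder work

# Slowest stage first, reported as average milliseconds per frame.
report = sorted(stage_totals.items(), key=lambda kv: kv[1], reverse=True)
for stage, total in report:
    print(f"{stage}: {total / stage_calls[stage] * 1e3:.3f} ms per frame")
```

Sorting by total time puts the stage most worth optimizing at the top of the report.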

How you configure your OpenCV build matters a lot in my experience. IPP isn't the only option to give you better performance. It really is worth kicking the tires on your build to get better hardware utilization.

The other areas to look at are CPU and memory utilization. If you watch your CPU and/or memory utilization, you'll probably find that 10% of your code is working hard and the rest of the time things are largely idle.

  • Consider restructuring your logic as a pipeline using threads so that you can process multiple images at once. If you're tracking and need the results of previous images, break your code into segments such as preprocessing and analysis, and use a std::queue to buffer between them. imshow won't work from worker threads, so push result images into a queue and call imshow from the main thread.
  • Consider using persistent/global objects for things like kernels/detectors that don't need to get recreated each time
  • Is your throughput slowing down the longer your program runs? You may need to look at how you are handling disposing of images/variables within the main loop's scope
  • Segmenting your code in functions makes it more readable, easier to benchmark, and descopes variables earlier (temporary Mat and results variables free up memory when descoped)
  • If you're doing low-level processing on Mat pixels where you iterate over a large portion of the image, use a single parallel for and avoid writing
  • Depending on how you are running your code, you may be able to disable debugging to get better performance
  • If you're streaming and dumping frames, prefer changing the camera settings to throttle the streaming rate instead of dumping frames
  • If you're converting from 12 to 8 bits or only using a region of your image, prefer doing this at the camera hardware level
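
The pipeline suggestion in the first bullet might look roughly like this. It is a structural sketch only, written in Python with the standard `queue` and `threading` modules instead of std::queue: the `preprocess` and `analyse` functions are hypothetical stand-ins, frames are plain integers, and pure-Python stages like these will not actually run in parallel. What it shows is the shape: one thread per stage, bounded queues in between, and results consumed on the main thread (where imshow would be called).

```python
import queue
import threading

STOP = object()  # sentinel that tells each stage to shut down


def preprocess(frame):   # hypothetical stage 1
    return frame * 2


def analyse(item):       # hypothetical stage 2
    return item + 1


def stage(func, inbox, outbox):
    """Pull items from `inbox`, apply `func`, push results to `outbox`."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)   # propagate shutdown downstream
            break
        outbox.put(func(item))


# Bounded queues keep a fast producer from piling up frames in memory.
raw = queue.Queue(maxsize=4)
prepped = queue.Queue(maxsize=4)
results = queue.Queue()

workers = [
    threading.Thread(target=stage, args=(preprocess, raw, prepped)),
    threading.Thread(target=stage, args=(analyse, prepped, results)),
]
for w in workers:
    w.start()

for frame in range(5):   # stand-in for frames read from the camera
    raw.put(frame)
raw.put(STOP)

shown = []               # main thread: this is where display would happen
while True:
    result = results.get()
    if result is STOP:
        break
    shown.append(result)

for w in workers:
    w.join()
```

With one thread per stage, frame order is preserved, which matters when tracking depends on the previous frame's result.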

Here's an example of a parallel for loop:

cv::parallel_for_(cv::Range(0, img.rows * img.cols), [&](const cv::Range& range)
{
    for (int r = range.start; r < range.end; r++)
    {
        // Map the linear index to row-major (y, x) so consecutive indices touch adjacent memory
        int y = r / img.cols;
        int x = r % img.cols;
        uchar pixelVal = img.at<uchar>(y, x);
        //do work here
    }
});
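
For readers who want to see the index arithmetic in isolation, here is a small standard-library Python sketch of the same pattern: one linear range over all pixels, split into contiguous chunks, with each chunk mapped back to row-major (y, x). The `split_range` and `process_range` helpers are hypothetical, and the thread pool only demonstrates the partitioning; it is not a claim that pure-Python loops get faster this way.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(total, parts):
    """Split [0, total) into `parts` contiguous, nearly equal (start, end) ranges."""
    base, extra = divmod(total, parts)
    ranges, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def process_range(img, cols, out, start, end):
    """Invert the pixels whose linear index lies in [start, end)."""
    for r in range(start, end):
        y, x = divmod(r, cols)      # row-major: y = r // cols, x = r % cols
        out[r] = 255 - img[y][x]    # each worker writes only its own slots


rows, cols = 3, 5
img = [[y * cols + x for x in range(cols)] for y in range(rows)]
out = [0] * (rows * cols)

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [
        pool.submit(process_range, img, cols, out, s, e)
        for s, e in split_range(rows * cols, 4)
    ]
    for f in futures:
        f.result()                  # re-raise any worker exception
```

Because the ranges do not overlap, no two workers ever write to the same output element, so no locking is needed.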

If you're hardware constrained (i.e. fully utilizing CPU and/or memory), then you need to look at prioritizing your process, OS performance optimizations, freeing system resources, or upgrading your hardware:

  • Increase the priority of the process to be more greedy with respect to other programs running on the computer (on Linux you have nice(int inc) in unistd.h, on Windows SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS) in windows.h)
  • Optimize your power settings for maximum performance in general
  • Disable CPU core parking
  • Optimize your acquisition hardware settings (increase rx/tx buffers, etc) to offload work from your CPU
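
If the program is driven from Python, the Unix side of the priority bullet can be sketched with `os.nice`, which wraps the same nice() call. The `try_adjust_niceness` helper is hypothetical, and the sketch assumes a Unix-like system: `os.nice` does not exist on Windows, and a negative increment (higher priority) normally needs elevated privileges, so the helper returns None instead of failing.

```python
import os


def try_adjust_niceness(increment):
    """Return the new niceness, or None if the platform or permissions forbid it."""
    if not hasattr(os, "nice"):   # os.nice is only available on Unix
        return None
    try:
        return os.nice(increment)
    except OSError:               # e.g. not permitted to raise priority
        return None


current = try_adjust_niceness(0)    # an increment of 0 just reports the current value
boosted = try_adjust_niceness(-5)   # lower niceness = higher priority
print("niceness before:", current, "after boost attempt:", boosted)
```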


0

I'm using some approaches:

  1. [Application level] For hardware with OpenCL support: switch from cv::Mat to cv::UMat and call cv::ocl::setUseOpenCL(true)
  2. [Library level] In the OpenCV CMake configuration, use another parallel library: TBB may be better than OpenMP
  3. [Library level] In the OpenCV CMake configuration, enable IPP support
  4. [Application level] Cache temporary results. Most OpenCV functions check the format and size of their output arrays, so you can store all results as cv::Mat in private members, and on subsequent frames OpenCV will not allocate and deallocate memory for them.
  5. [Library -> Application level] Copy the sources of the bottleneck OpenCV functions into your application and apply point [4] to them.
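
The caching idea in point 4 can be illustrated without OpenCV. In the sketch below, a hypothetical `FrameProcessor` keeps its output buffer as a member and reallocates only when the frame size changes, which is the behaviour the answer relies on when a preallocated cv::Mat of matching size and type is passed as the output. Frames are plain `bytes` objects here purely for illustration.

```python
class FrameProcessor:
    """Keeps a scratch buffer alive between frames instead of reallocating it."""

    def __init__(self):
        self._scratch = None   # reused output buffer
        self.allocations = 0   # counts how often a new buffer was needed

    def _ensure_buffer(self, size):
        # Allocate only when there is no buffer yet or the frame size changed.
        if self._scratch is None or len(self._scratch) != size:
            self._scratch = bytearray(size)
            self.allocations += 1
        return self._scratch

    def invert(self, frame):
        """Write the inverted frame into the cached buffer and return it."""
        out = self._ensure_buffer(len(frame))
        for i, value in enumerate(frame):
            out[i] = 255 - value
        return out


proc = FrameProcessor()
first = proc.invert(bytes([0, 10, 255]))
first_copy = bytes(first)                  # snapshot before the buffer is reused
second = proc.invert(bytes([1, 2, 3]))     # same size: no new allocation
```

Note the trade-off: the returned buffer is overwritten by the next call, so any result that must outlive the frame has to be copied, as `first_copy` does here.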
