Steve-o's answer is good for optimizing your code efficiency. I recommend adding some logic to monitor execution times to help you identify where to spend efforts optimizing.
OpenCV logic for time monitoring (Python):
import cv2 as cv

startTime = cv.getTickCount()
# your code execution
elapsedSec = (cv.getTickCount() - startTime) / cv.getTickFrequency()
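If you want a framework-agnostic timer, Python's standard library works too. Here's a minimal sketch of a reusable context-manager timer built on time.perf_counter (the stage label "preprocess" and the dummy workload are just placeholders for your own code):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Records wall-clock time around a block of code and prints it
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed * 1000:.2f} ms")

# Wrap each stage you want to measure
with timed("preprocess"):
    total = sum(i * i for i in range(100_000))  # stand-in workload
```

Wrapping each stage in its own `with timed(...)` block makes it easy to compare where the per-frame time actually goes.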
Boost logic for time monitoring (C++):
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

boost::posix_time::ptime start = boost::posix_time::microsec_clock::local_time();
// do something time-consuming
boost::posix_time::ptime end = boost::posix_time::microsec_clock::local_time();
boost::posix_time::time_duration timeTaken = end - start;
std::cout << timeTaken << std::endl;
How you configure your OpenCV build matters a lot in my experience. IPP isn't the only option to give you better performance. It really is worth kicking the tires on your build to get better hardware utilization.
The other areas to look at are CPU and memory utilization. If you watch your CPU and/or memory utilization, you'll probably find that 10% of your code is working hard and the rest of the time things are largely idle.
- Consider restructuring your logic as a pipeline of threads so you can process multiple images at once. If you're tracking and need results from previous frames, break your code into stages (e.g. preprocessing and analysis) and buffer between them with a std::queue. Note that imshow won't work from worker threads, so push result images into a queue and call imshow from the main thread.
- Consider using persistent/global objects for things like kernels/detectors that don't need to get recreated each time
- Is your throughput slowing down the longer your program runs? You may need to look at how you are handling disposing of images/variables within the main loop's scope
- Segmenting your code into functions makes it more readable, easier to benchmark, and descopes variables earlier (temporary Mat and result variables free up memory when descoped)
- If you're doing low-level processing on Mat pixels where you iterate over a large portion of the image, use a single parallel for (avoid nesting them) and avoid writing to state shared across the parallel ranges
- Depending on how you are running your code, you may be able to disable debugging (e.g. run a Release build) to get better performance
- If you're streaming and discarding frames you don't need, prefer throttling the streaming rate in the camera settings instead of dumping frames in software
- If you're converting from 12 to 8 bits or only using a region of your image, prefer doing this at the camera hardware level
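To make the pipeline bullet above concrete, here's a minimal two-stage sketch using stdlib Python (threading + queue). The "preprocessing" here is a stand-in for your real per-frame work, and results are drained on the main thread, the same place you'd call imshow:

```python
import queue
import threading

raw_q = queue.Queue(maxsize=8)   # buffers frames between stages
result_q = queue.Queue()

_SENTINEL = object()             # signals end-of-stream to each stage

def preprocess_stage():
    # Worker thread: pull raw frames, process, push results downstream
    while True:
        frame = raw_q.get()
        if frame is _SENTINEL:
            result_q.put(_SENTINEL)
            break
        result_q.put(frame * 2)  # stand-in for real preprocessing

worker = threading.Thread(target=preprocess_stage, daemon=True)
worker.start()

for frame in range(5):           # stand-in for a capture loop
    raw_q.put(frame)
raw_q.put(_SENTINEL)

# Drain results on the main thread (where imshow is safe to call)
results = []
while True:
    item = result_q.get()
    if item is _SENTINEL:
        break
    results.append(item)
worker.join()
print(results)  # [0, 2, 4, 6, 8]
```

The bounded queue (maxsize=8) also gives you backpressure: if analysis falls behind, capture blocks instead of memory growing without limit.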
Here's an example of a parallel for loop:
cv::parallel_for_(cv::Range(0, img.rows * img.cols), [&](const cv::Range& range)
{
    for (int r = range.start; r < range.end; r++)
    {
        // Row-major indexing keeps memory access contiguous within each range
        int y = r / img.cols;
        int x = r % img.cols;
        uchar pixelVal = img.at<uchar>(y, x);
        // do work here
    }
});
If you're hardware constrained (i.e. fully utilizing CPU and/or memory), then you need to look at prioritizing your process, OS performance optimizations, freeing system resources, or upgrading your hardware:
- Increase the priority of the process to be more greedy with respect to other programs running on the computer (in Linux you have nice(int inc) in unistd.h; in Windows, SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS) in windows.h, though HIGH_PRIORITY_CLASS is usually safer than REALTIME_PRIORITY_CLASS)
- Optimize your power settings for maximum performance in general
- Disable CPU core parking
- Optimize your acquisition hardware settings (increase rx/tx buffers, etc) to offload work from your CPU
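The priority bullet above can be sketched from Python on POSIX systems via os.nice (the set_greedier helper is hypothetical, not part of any library; on Windows you'd call SetPriorityClass through ctypes or pywin32 instead):

```python
import os

def set_greedier(delta=-5):
    """Try to lower the nice value (i.e. raise scheduling priority).

    Lowering nice below its current value normally requires elevated
    privileges, so fall back to just reporting the current value.
    """
    try:
        return os.nice(delta)   # negative delta = higher priority
    except PermissionError:
        return os.nice(0)       # os.nice(0) queries without changing

print("nice value:", set_greedier())
```

Keep in mind that raising priority only helps when other processes are actually competing for the CPU; it does nothing for a single-threaded bottleneck in your own code.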