
I'm trying to extract text from an image using OpenCV and Tesseract. I've managed to detect the text areas and delimit them with bounding boxes, but now I can't find how to pass the bounding boxes to Tesseract.

        // ... (earlier part of main() omitted, see EDIT below) ...
        for (int idx = 0; idx >= 0; idx = hierarchy[idx][0])
        {
            Rect rect = boundingRect(contours[idx]);
            Mat maskROI(mask, rect);
            maskROI = Scalar(0, 0, 0);
            // fill the contour
            drawContours(mask, contours, idx, Scalar(255, 255, 255), FILLED);
            // ratio of non-zero pixels in the filled region
            double r = (double)countNonZero(maskROI) / (rect.width * rect.height);

            if (r > .45 /* assume at least 45% of the area is filled if it contains text */
                &&
                (rect.height > 8 && rect.width > 8) /* constraints on region size */
                /* these two conditions alone are not very robust. better to use something
                 like the number of significant peaks in a horizontal projection as a third condition */
                )
            {
                // draw the accepted text region in green
                rectangle(rgb, rect, Scalar(0, 255, 0), 2);
            }
        }
        imwrite(OUTPUT_FOLDER_PATH + string("/rgb.jpg"), rgb);
        return 0;
    }

I'm getting very good results with the bounding boxes. Image with bounding boxes:

[Image: the input with the detected text regions outlined by green bounding boxes]

I then tried cv::text::OCRTesseract::run, but that doesn't seem to work.

Does anyone have an idea?

EDIT: I had to remove most of the code because the company where I'm doing my internship asked me to. This is my end-of-year project, though, so as soon as the year is over I will edit the post with a GitHub link to the whole project.

  • Can't you pass the cropped images? Commented Apr 7, 2016 at 16:27
  • There is no cropped image in this code. It only detects the text regions and then outlines them. Commented Apr 7, 2016 at 16:34
  • Yes, I see... Can't you crop the image to each rectangle and pass each crop to Tesseract? Commented Apr 7, 2016 at 16:35
  • That's exactly what I'm trying to do. I haven't found any documentation or example that shows how. Most documentation says you can pass the bounding boxes as a parameter to Tesseract in OpenCV, but I can't find how to do it. Commented Apr 7, 2016 at 16:40
  • You can retrieve the rectangles from OpenCV's Tesseract wrapper, not use them as input. When you have a good detection, just call tesseract->run(rgb(rect), output_string); or something like that. Commented Apr 7, 2016 at 16:46

2 Answers


First, thanks to Miki for the help. This is what I did to fix the issue.

  1. Crop the original image to every bounding box. This gives a separate image for each text area in the image. To do this, just put Mat cropedImage = small(Rect(rect)); under the line rectangle(rgb, rect, Scalar(0, 255, 0), 2);

  2. Create an instance of the OCRTesseract class, which initializes the Tesseract engine. To do this, add the line Ptr<cv::text::OCRTesseract> tess = cv::text::OCRTesseract::create(NULL,NULL,NULL,3,3); (preferably before your main(), but anywhere works as long as it is before the for loop in this code). The parameters (data path, language, character whitelist, OCR engine mode and page segmentation mode) are optional, so you can also just write Ptr<cv::text::OCRTesseract> tess = cv::text::OCRTesseract::create();.

  3. Now that you have the engine, you can run the OCR. run() accepts several optional parameters, but I'll stick with the basic ones: the input image and the output text. Add the line tess->run(cropedImage, output_string); just below Mat cropedImage = small(Rect(rect));, where output_string is a std::string that receives the recognized text.

Note that it is usually better to preprocess the cropped images before passing them to the OCR engine (e.g. threshold them to a binary image, and enlarge the crop so the text doesn't touch the edge).


You need the OpenCV extra modules (opencv_contrib) before you can use cv::text::OCRTesseract::run. You can download them from here.

The tutorial at the bottom of that page explains how to install them on Linux for use with your OpenCV. As far as I remember, though, they have to be built together with OpenCV itself (by pointing CMake's OPENCV_EXTRA_MODULES_PATH at the contrib modules directory). Also, these modules are only available for OpenCV 3, not for OpenCV 2.x.

For Windows instructions, look here.
