I'm trying to extract text from an image using OpenCV and Tesseract. I've managed to detect the text areas and use bounding boxes to delimit them, but now I can't figure out how to pass the bounding boxes to Tesseract.
for (int idx = 0; idx >= 0; idx = hierarchy[idx][0])
{
    Rect rect = boundingRect(contours[idx]);
    Mat maskROI(mask, rect);
    maskROI = Scalar(0, 0, 0);
    // fill the contour
    drawContours(mask, contours, idx, Scalar(255, 255, 255), CV_FILLED);
    // ratio of non-zero pixels in the filled region
    double r = (double)countNonZero(maskROI) / (rect.width * rect.height);

    // assume at least 45% of the area is filled if it contains text,
    // plus constraints on region size; these two conditions alone are not
    // very robust -- better to use something like the number of significant
    // peaks in a horizontal projection as a third condition
    if (r > .45 && rect.height > 8 && rect.width > 8)
    {
        rectangle(rgb, rect, Scalar(0, 255, 0), 2);
    }
}
imwrite(OUTPUT_FOLDER_PATH + string("/rgb.jpg"), rgb);
return 0;
}
I'm getting very good results with the bounding boxes. Image with bounding boxes:
I then tried cv::text::OCRTesseract::run, but that doesn't seem to work.
Does anyone have an idea?
EDIT: I had to remove most of the code because the company I'm interning with asked me to. But this is for my end-of-year project, so as soon as the year ends I will edit the post with a GitHub link to the whole project.

You can crop the image to each bounding box with the Rect and hand the crop to Tesseract, e.g. tesseract->run(rgb(rect), output_string); or something along those lines. Note that cv::text::OCRTesseract::run takes a non-const Mat&, so you may need to copy the ROI into a named Mat first.
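A minimal sketch of the idea, using Tesseract's own C++ API rather than the cv::text wrapper. It assumes the rgb Mat and the rect from the detection loop above, that Tesseract is installed with English trained data reachable via TESSDATA_PREFIX, and that a hypothetical helper ocrRegion wraps the call; SetRectangle is how Tesseract itself accepts a bounding box, so you don't even have to crop the Mat:

    // Sketch only: ocrRegion is an illustrative helper, not from the original post.
    #include <tesseract/baseapi.h>
    #include <opencv2/core.hpp>
    #include <memory>
    #include <string>

    std::string ocrRegion(const cv::Mat& rgb, const cv::Rect& rect)
    {
        tesseract::TessBaseAPI tess;
        // nullptr datapath -> fall back to the TESSDATA_PREFIX environment variable
        if (tess.Init(nullptr, "eng") != 0)
            return "";

        // Hand Tesseract the whole image...
        tess.SetImage(rgb.data, rgb.cols, rgb.rows,
                      rgb.channels(), static_cast<int>(rgb.step));
        // ...then restrict recognition to the detected bounding box.
        tess.SetRectangle(rect.x, rect.y, rect.width, rect.height);

        // GetUTF8Text allocates with new[]; unique_ptr<char[]> releases it correctly.
        std::unique_ptr<char[]> text(tess.GetUTF8Text());
        std::string result = text ? text.get() : "";
        tess.End();
        return result;
    }

You would call this once per accepted rect inside the loop, instead of (or in addition to) drawing the green rectangle. If you prefer to stay with cv::text::OCRTesseract, the equivalent is to clone the ROI into a named Mat (e.g. Mat roi = rgb(rect).clone();) and pass that to run, since run wants a mutable reference.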