
I'm having some difficulty detecting text on the following type of image:

Image without preprocessing

It seems that Tesseract has difficulty distinguishing the numbers from the diagrams. My goal is to find every digit and its location.

From this image I run the following code, which is supposed to draw rectangles around the detected text:

import cv2
import pytesseract
from pytesseract import Output
import numpy as np


pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'

img = cv2.imread('Temp/VE_cropped.png')

kernel = np.ones((2, 2), np.uint8)

# grayscale, denoise, Otsu inverse threshold, then dilate to thicken the strokes
img_processed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_processed = cv2.medianBlur(img_processed, 3)
img_processed = cv2.threshold(img_processed, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
img_processed = cv2.dilate(img_processed, kernel, iterations=1)

# OCR the pre-processed image, then convert back to color to draw the boxes on it
dict_wordsDetected = pytesseract.image_to_data(img_processed, output_type=Output.DICT)
img_processed = cv2.cvtColor(img_processed, cv2.COLOR_GRAY2RGB)

n_boxes = len(dict_wordsDetected['text'])
for i in range(n_boxes):
    (x, y, w, h) = (dict_wordsDetected['left'][i]
                  , dict_wordsDetected['top'][i]
                  , dict_wordsDetected['width'][i]
                  , dict_wordsDetected['height'][i])
    img_processed = cv2.rectangle(img_processed, (x - 10, y - 10), (x + w + 10, y + h + 10), (0, 0, 255), 2)
cv2.imshow("processed", img_processed)
cv2.waitKey(0)

This gives the following result: Result
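One thing worth noting about image_to_data: it returns one entry per layout element, including elements with no recognized text (their conf is -1), so some of those rectangles may come from empty detections rather than recognized digits. A minimal sketch of filtering them out, assuming the dict_wordsDetected from the code above (the threshold on conf is an arbitrary choice):

# Keep only word-level rows that actually contain text.
# Rows with conf == -1 are page/block/line entries, not recognized words.
for i in range(len(dict_wordsDetected['text'])):
    word = dict_wordsDetected['text'][i].strip()
    conf = float(dict_wordsDetected['conf'][i])  # may be str or int depending on the pytesseract version
    if not word or conf < 0:
        continue
    (x, y, w, h) = (dict_wordsDetected['left'][i]
                  , dict_wordsDetected['top'][i]
                  , dict_wordsDetected['width'][i]
                  , dict_wordsDetected['height'][i])
    cv2.rectangle(img_processed, (x, y), (x + w, y + h), (0, 0, 255), 2)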

5 Comments

  • It does, but even with black on white. You just have to add img_processed = cv2.threshold(img_processed, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1] AFTER img_processed = cv2.dilate(img_processed, kernel, iterations = 1) (see the sketch after these comments).
  • It won't have any effect except moving the rectangles around the text that is already found...
  • As I said, my problem is that Tesseract doesn't recognize digits such as 0455, 0435 or 0453. The command you suggest only resizes the red rectangles, but my problem occurs before the rectangles are drawn.
  • Sorry, but my post is very clear, and it needed these illustrations. This is the kind of image I have to work with, so why not show it?
  • The boxes are supposed to be around the numbers. This is just to show that Tesseract doesn't find the numbers. That is why I am asking for help here: to make Tesseract find the numbers, and then I will be able to get their coordinates.
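For reference, the first comment suggests re-thresholding after the dilation so that Tesseract ends up with black text on a white background again; a minimal sketch of that ordering, based on the question's own pipeline:

import cv2
import numpy as np

img = cv2.imread('Temp/VE_cropped.png')
kernel = np.ones((2, 2), np.uint8)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 3)
# Otsu + inverse: text becomes white on black, so the dilation thickens the strokes
img_processed = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
img_processed = cv2.dilate(img_processed, kernel, iterations=1)
# a second inverse threshold flips the binary image back to black text on white,
# which is what Tesseract generally prefers
img_processed = cv2.threshold(img_processed, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]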

1 Answer


I think I understood what you want. First of all, Tesseract works well for many problems, especially on images that are easily OCR'ed, that is, images without a complex background. In your case, the image is not simple enough to be handled with just Tesseract or image thresholding. You must do more image preprocessing before OCR'ing it. To solve your problem, you must clean the image so that, ideally, only the numbers remain. It can be hard work.

Recently, I was looking for code to apply OCR to an image with a complex background, and I found some solutions. The code that I'll show you is based on this solution.

To extract the numbers (or at least try to), you must follow these steps:

  • convert the image to grayscale
  • apply image thresholding using the Otsu method and the inverse operation
  • apply the distance transform
  • apply a morphological operation to clean up small points in the image
  • apply a dilate operation to enlarge the numbers
  • find contours and filter them according to the width and height of each contour
  • create a list of hull objects, one per contour
  • draw the hull objects onto a mask
  • dilate the mask
  • apply a bitwise AND to retrieve the segmented areas
  • OCR the pre-processed image
  • print out the results

The code that I present here is not perfect and I think it can be improved, but I want to give you a starting point for solving your problem.

import cv2
import pytesseract
from pytesseract import Output
import numpy as np
import imutils

# loading and resizing image
img = cv2.imread('ABV5H.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = imutils.resize(img, width=900)
#gray scale
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
cv2.imshow("Gray", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()

# thresholding with Otsu method and inverse operation
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
cv2.imshow("Threshold", thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()

# distance transform
dist = cv2.distanceTransform(thresh, cv2.DIST_L2, 5)
dist = cv2.normalize(dist, dist, 0, 1.0, cv2.NORM_MINMAX)
dist = (dist*255).astype('uint8')
dist = cv2.threshold(dist, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
cv2.imshow("Distance Transformation", dist)
cv2.waitKey(0)
cv2.destroyAllWindows()

# Morphological operation kernel (2,2) and OPEN method
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (2,2))
opening = cv2.morphologyEx(dist, cv2.MORPH_OPEN, kernel)
cv2.imshow("Morphology", opening)
cv2.imwrite("morphology.jpg", opening)
cv2.waitKey(0)
cv2.destroyAllWindows()

#dilate operation to enlarge the numbers
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3,3))
dilation = cv2.dilate(opening, kernel, iterations = 1)
cv2.imshow("dilated", dilation)
cv2.imwrite("dilated.jpg", dilation)
cv2.waitKey(0)
cv2.destroyAllWindows()

#finding and grabbing the contours
cnts = cv2.findContours(dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
output = img.copy()
for i in cnts:
    cv2.drawContours(output, [i], -1, (0, 0, 255), 3)
cv2.imshow("Contours", output)
cv2.imwrite("contours.jpg", dilation)
cv2.waitKey(0)
cv2.destroyAllWindows()

#filtering the contours
nums = []
output2 = img.copy()
for c in cnts:
    (x, y, w, h) = cv2.boundingRect(c)

    if w >= 5 and w < 75 and h > 15 and h <= 35:
        nums.append(c)
for i in nums:
    cv2.drawContours(output2, [i], -1, (0, 0, 255), 2)
cv2.imshow("Filter", output2)
cv2.imwrite("filter.jpg", output2)
cv2.waitKey(0)
cv2.destroyAllWindows()

# making a list with the hull points
hull = []
# calculate points for each contour
for i in range(len(nums)):
    # creating convex hull object for each contour
    hull.append(cv2.convexHull(nums[i], False))

# create an empty black image
mask = np.zeros(dilation.shape[:2], dtype='uint8')

# draw contours and hull points
for i in range(len(nums)):
    color = (255, 0, 0) # blue - color for convex hull
    # draw ith convex hull object
    cv2.drawContours(mask, hull, i, color, 1, 8)

#dilating the mask to have a proper image for bitwise
mask = cv2.dilate(mask, kernel, iterations = 15)
cv2.imshow("Dilated Mask", mask)
cv2.imwrite("dilated-mask.jpg", mask)
cv2.waitKey(0)
cv2.destroyAllWindows()

#bitwise operation
final = cv2.bitwise_and(dilation, dilation, mask=mask)
cv2.imshow("Pre-processed Image", final)
cv2.imwrite("pre-processed.jpg", final)
cv2.waitKey(0)
cv2.destroyAllWindows()


config = '--psm 12 -c tessedit_char_whitelist=0123456789' #page segmentation mode and white lists
#OCR'ing the image
dict_wordsDetected = pytesseract.image_to_data(final, config=config, output_type=Output.DICT)

#filtering the detections and making a list of index
index = []
for idx, txt in enumerate(dict_wordsDetected['text']):
    if len(txt) >= 1:
        dict_wordsDetected['text'][idx] = txt.replace(" ", "")
        index.append(idx)
    
for i in index:

    (x, y, w, h) = (dict_wordsDetected['left'][i]
                  , dict_wordsDetected['top'][i]
                  , dict_wordsDetected['width'][i]
                  , dict_wordsDetected['height'][i])
    cv2.rectangle(img, (x - 10, y - 10), (x + w + 10, y + h + 10), (0, 0, 255), 2)
    text = "{}".format(dict_wordsDetected['text'][i])
    cv2.putText(img, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
cv2.imshow("Result", img)
cv2.imwrite('result.jpg', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Visualizing some operations

(I cannot embed my images for the moment; the hyperlinks below point to images that correspond to some of the pre-processing steps.)

Output image after dilation

Filtered contours

Mask after the hull operation and dilation

Pre-processed image (the image that will be OCR'ed)

Results

As you can see, we can find numbers in the input image. We get good detections, but also some inaccurate outputs. The main reason is the image preprocessing: the image is still noisy even after the transformations we applied, and the key to your problem is image preprocessing. Another point to keep in mind is that Tesseract is not perfect; it requires good images to work well. Beyond that, you should know the --psm (page segmentation) modes to improve your OCR, as well as use whitelists to avoid undesirable detections. As I said, we have good results, but I guess you can improve them if your task has to be done with just OpenCV and Tesseract; there are other problems far less complicated than this one.
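If you want to experiment with that, here is a small sketch that compares a few page segmentation modes on the pre-processed image (it reuses the final image and the imports from the code above; which mode works best depends on your image):

# Compare a few page segmentation modes with a digits-only whitelist.
# --psm 6: assume a single uniform block of text
# --psm 11 and 12: sparse text (12 also runs orientation/script detection)
for psm in (6, 11, 12):
    config = '--psm {} -c tessedit_char_whitelist=0123456789'.format(psm)
    data = pytesseract.image_to_data(final, config=config, output_type=Output.DICT)
    words = [t for t in data['text'] if t.strip()]
    print(psm, words)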

If you need help, you can contact me; I prefer speaking French to English.


2 Comments

Thank you very much for your help! Indeed, I would really like to discuss these techniques with you, but it is impossible to send direct messages from Stack Overflow. Would you like me to send you my LinkedIn?
You're welcome. I am moving towards artificial intelligence, and knowing this kind of technique is fundamental for working with computer vision. So we can get in touch for a few exchanges; it is always important to talk, you learn a lot.
