
I'm having some difficulty detecting text on the following type of image:

Image without preprocessing

It seems that Tesseract has difficulty distinguishing the numbers from the diagrams. My goal is to find every digit and its location.

From this image I run the following code, which is supposed to draw rectangles around the detected text:

import cv2
import pytesseract
from pytesseract import Output
import numpy as np


pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'

img = cv2.imread('Temp/VE_cropped.png')

kernel = np.ones((2, 2), np.uint8)

# grayscale, denoise, Otsu inverse threshold, then dilate to thicken the strokes
img_processed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_processed = cv2.medianBlur(img_processed, 3)
img_processed = cv2.threshold(img_processed, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
img_processed = cv2.dilate(img_processed, kernel, iterations=1)

# OCR the pre-processed image, then convert back to color to draw the boxes on it
dict_wordsDetected = pytesseract.image_to_data(img_processed, output_type=Output.DICT)
img_processed = cv2.cvtColor(img_processed, cv2.COLOR_GRAY2RGB)

n_boxes = len(dict_wordsDetected['text'])
for i in range(n_boxes):
    (x, y, w, h) = (dict_wordsDetected['left'][i]
                  , dict_wordsDetected['top'][i]
                  , dict_wordsDetected['width'][i]
                  , dict_wordsDetected['height'][i])
    img_processed = cv2.rectangle(img_processed, (x - 10, y - 10), (x + w + 10, y + h + 10), (0, 0, 255), 2)
cv2.imshow("processed", img_processed)
cv2.waitKey(0)

This gives the following result: Result
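One thing worth noting about image_to_data: it returns one entry per layout element, including elements with no recognized text (their conf is -1), so some of those rectangles may come from empty detections rather than recognized digits. A minimal sketch of filtering them out, assuming the dict_wordsDetected from the code above (the threshold on conf is an arbitrary choice):

# Keep only word-level rows that actually contain text.
# Rows with conf == -1 are page/block/line entries, not recognized words.
for i in range(len(dict_wordsDetected['text'])):
    word = dict_wordsDetected['text'][i].strip()
    conf = float(dict_wordsDetected['conf'][i])  # may be str or int depending on the pytesseract version
    if not word or conf < 0:
        continue
    (x, y, w, h) = (dict_wordsDetected['left'][i]
                  , dict_wordsDetected['top'][i]
                  , dict_wordsDetected['width'][i]
                  , dict_wordsDetected['height'][i])
    cv2.rectangle(img_processed, (x, y), (x + w, y + h), (0, 0, 255), 2)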

5 Comments

  • It does, but even with black on white. You just have to add img_processed = cv2.threshold(img_processed, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1] AFTER img_processed = cv2.dilate(img_processed, kernel, iterations = 1) (see the sketch after these comments).
  • It won't have any effect except moving the rectangles around the text that is already found...
  • As I said, my problem is that Tesseract doesn't recognize digits such as 0455, 0435 or 0453. The command you suggest only resizes the red rectangles, but my problem occurs before the rectangles are drawn.
  • Sorry, but my post is very clear, and it needed these illustrations. This is the kind of image I have to work with, so why not show it?
  • The boxes are supposed to be around the numbers. This is just to show that Tesseract doesn't find the numbers. That is why I am asking for help here: to make Tesseract find the numbers, and then I will be able to get their coordinates.
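For reference, the first comment suggests re-thresholding after the dilation so that Tesseract ends up with black text on a white background again; a minimal sketch of that ordering, based on the question's own pipeline:

import cv2
import numpy as np

img = cv2.imread('Temp/VE_cropped.png')
kernel = np.ones((2, 2), np.uint8)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 3)
# Otsu + inverse: text becomes white on black, so the dilation thickens the strokes
img_processed = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
img_processed = cv2.dilate(img_processed, kernel, iterations=1)
# a second inverse threshold flips the binary image back to black text on white,
# which is what Tesseract generally prefers
img_processed = cv2.threshold(img_processed, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]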

1 Answer


I think I understood what you want. First of all, Tesseract works well for many problems, especially on images that are easily OCR'ed, that is, images without a complex background. In your case, the image is not simple enough to be handled with just Tesseract or image thresholding. You must do more image preprocessing before OCR'ing it. To solve your problem, you must clean the image so that, ideally, only the numbers remain. It can be hard work.

Recently, I was looking for code to apply OCR to an image with a complex background, and I found some solutions. The code that I'll show you is based on this solution.

To extract the numbers (or at least try to), you must follow these steps:

  • convert the image to grayscale
  • apply image thresholding using the Otsu method and the inverse operation
  • apply the distance transform
  • apply a morphological operation to clean up small points in the image
  • apply a dilate operation to enlarge the numbers
  • find contours and filter them according to the width and height of each contour
  • create a list of hull objects, one per contour
  • draw the hull objects onto a mask
  • dilate the mask
  • apply a bitwise AND to retrieve the segmented areas
  • OCR the pre-processed image
  • print out the results

The code that I present here is not perfect and I think it can be improved, but I want to give you a starting point for solving your problem.

import cv2
import pytesseract
from pytesseract import Output
import numpy as np
import imutils

# loading and resizing image
img = cv2.imread('ABV5H.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = imutils.resize(img, width=900)
#gray scale
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
cv2.imshow("Gray", gray)
cv2.waitKey(0)
cv2.destroyAllWindows()

# thresholding with Otsu method and inverse operation
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
cv2.imshow("Threshold", thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()

# distance transform
dist = cv2.distanceTransform(thresh, cv2.DIST_L2, 5)
dist = cv2.normalize(dist, dist, 0, 1.0, cv2.NORM_MINMAX)
dist = (dist*255).astype('uint8')
dist = cv2.threshold(dist, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
cv2.imshow("Distance Transformation", dist)
cv2.waitKey(0)
cv2.destroyAllWindows()

# Morphological operation kernel (2,2) and OPEN method
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (2,2))
opening = cv2.morphologyEx(dist, cv2.MORPH_OPEN, kernel)
cv2.imshow("Morphology", opening)
cv2.imwrite("morphology.jpg", opening)
cv2.waitKey(0)
cv2.destroyAllWindows()

#dilate operation to enlarge the numbers
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3,3))
dilation = cv2.dilate(opening, kernel, iterations = 1)
cv2.imshow("dilated", dilation)
cv2.imwrite("dilated.jpg", dilation)
cv2.waitKey(0)
cv2.destroyAllWindows()

#finding and grabbing the contours
cnts = cv2.findContours(dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
output = img.copy()
for i in cnts:
    cv2.drawContours(output, [i], -1, (0, 0, 255), 3)
cv2.imshow("Contours", output)
cv2.imwrite("contours.jpg", dilation)
cv2.waitKey(0)
cv2.destroyAllWindows()

#filtering the contours
nums = []
output2 = img.copy()
for c in cnts:
    (x, y, w, h) = cv2.boundingRect(c)

    if w >= 5 and w < 75 and h > 15 and h <= 35:
        nums.append(c)
for i in nums:
    cv2.drawContours(output2, [i], -1, (0, 0, 255), 2)
cv2.imshow("Filter", output2)
cv2.imwrite("filter.jpg", output2)
cv2.waitKey(0)
cv2.destroyAllWindows()

# making a list with the hull points
hull = []
# calculate points for each contour
for i in range(len(nums)):
    # creating convex hull object for each contour
    hull.append(cv2.convexHull(nums[i], False))

# create an empty black image
mask = np.zeros(dilation.shape[:2], dtype='uint8')

# draw contours and hull points
for i in range(len(nums)):
    color = (255, 0, 0) # blue - color for convex hull
    # draw ith convex hull object
    cv2.drawContours(mask, hull, i, color, 1, 8)

#dilating the mask to have a proper image for bitwise
mask = cv2.dilate(mask, kernel, iterations = 15)
cv2.imshow("Dilated Mask", mask)
cv2.imwrite("dilated-mask.jpg", mask)
cv2.waitKey(0)
cv2.destroyAllWindows()

#bitwise operation
final = cv2.bitwise_and(dilation, dilation, mask=mask)
cv2.imshow("Pre-processed Image", final)
cv2.imwrite("pre-processed.jpg", final)
cv2.waitKey(0)
cv2.destroyAllWindows()


config = '--psm 12 -c tessedit_char_whitelist=0123456789' #page segmentation mode and white lists
#OCR'ing the image
dict_wordsDetected = pytesseract.image_to_data(final, config=config, output_type=Output.DICT)

#filtering the detections and making a list of index
index = []
for idx, txt in enumerate(dict_wordsDetected['text']):
    if len(txt) >= 1:
        dict_wordsDetected['text'][idx] = txt.replace(" ", "")
        index.append(idx)
    
for i in index:

    (x, y, w, h) = (dict_wordsDetected['left'][i]
                  , dict_wordsDetected['top'][i]
                  , dict_wordsDetected['width'][i]
                  , dict_wordsDetected['height'][i])
    cv2.rectangle(img, (x - 10, y - 10), (x + w + 10, y + h + 10), (0, 0, 255), 2)
    text = "{}".format(dict_wordsDetected['text'][i])
    cv2.putText(img, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
cv2.imshow("Result", img)
cv2.imwrite('result.jpg', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Visualizing some operations

(I cannot embed my images for the moment; the hyperlinks below point to images that correspond to some of the pre-processing steps.)

Output image after dilation

Filtered contours

Mask after the hull operation and dilation

Pre-processed image (the image that will be OCR'ed)

Results

As you can see, we can find numbers in the input image. We get good detections, but also some inaccurate outputs. The main reason is the image preprocessing: the image is still noisy even after the transformations we applied, and the key to your problem is image preprocessing. Another point to keep in mind is that Tesseract is not perfect; it requires good images to work well. Beyond that, you should know the --psm (page segmentation) modes to improve your OCR, as well as use whitelists to avoid undesirable detections. As I said, we have good results, but I guess you can improve them if your task has to be done with just OpenCV and Tesseract; there are other problems far less complicated than this one.
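If you want to experiment with that, here is a small sketch that compares a few page segmentation modes on the pre-processed image (it reuses the final image and the imports from the code above; which mode works best depends on your image):

# Compare a few page segmentation modes with a digits-only whitelist.
# --psm 6: assume a single uniform block of text
# --psm 11 and 12: sparse text (12 also runs orientation/script detection)
for psm in (6, 11, 12):
    config = '--psm {} -c tessedit_char_whitelist=0123456789'.format(psm)
    data = pytesseract.image_to_data(final, config=config, output_type=Output.DICT)
    words = [t for t in data['text'] if t.strip()]
    print(psm, words)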

If you need help, you can contact me; I prefer speaking French to English.


2 Comments

Thank you very much for your help! Indeed, I would really like to discuss these techniques with you, but it is impossible to send direct messages from Stack Overflow. Would you like me to send you my LinkedIn?
You're welcome. I am moving towards artificial intelligence, and knowing this kind of technique is fundamental for working with computer vision. So we can get in touch for a few exchanges; it is always important to talk, you learn a lot.
