OCR character recognition fails

Question

I am experimenting with AI and specifically character recognition. I saw that one of the best algorithms is OCR and Google's implementation in Tesseract seems like the best open source solution right now. So, I got some images and tried to apply Tesseract to them using python, the image in this case is a license plate so I wont show the original image, but after some preprocessing I end up with this: It seems like a pretty simple image to get text from but I always get 624830, so the 830 is fine but 624 should be GZX.

This is my code:

import pytesseract
import cv2

# Opening the image
img = cv2.imread("plate.jpg")
# Preprocess (Better performance)
w,h,_ = img.shape
img = cv2.resize(img, (w*2,h*2))
img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
img = cv2.medianBlur(img,5)
img = cv2.threshold(img,0,255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]
# Get a slice of the image with just the text to improve performance
box = []
data = pytesseract.image_to_data(img, config=r'--oem 3', output_type=pytesseract.Output.DICT)
n_boxes = len(data['level'])
for i in range(n_boxes):
    (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
    if int(data['conf'][i]) > 0:
        box.append((x, y, x + w, y + h))

h, w = img.shape
for box in box:
    x_min, y_min, x_max, y_max = box
    # Add some padding ensuring we do not get out of the image
    x_min_pad = max(0, x_min - 5)
    y_min_pad = max(0, y_min - 5)
    x_max_pad = min(w, x_max + 5)
    y_max_pad = min(h, y_max + 5)
    # Get the slice
    slice = img[y_min_pad:y_max_pad, x_min_pad:x_max_pad]
# Inference with LSTM on the slice
print(pytesseract.image_to_string(slice, config='--psm 8 --oem 3'))

All the preprocessing done was tested so I know that it does actually improve performance, if I remove it does not even detect the numbers right.

Hello @Alejandro, have you tried changing your image_to_string configuration ? this post stackoverflow.com/a/60161328/22258429 suggest using different parameters and manage to achieve accurate results. — Lrx
– Lrx, Commented Oct 8, 2024 at 12:59
Hi @Lrx, yes, I did try changing parameters in ìmage_to_string', I tried all combinations of psm and oem, from 0 to 8 and 0 to 3 respectively. I saw that oem only worked 1 and 3, the others gave me an error and psm 8 gave the best result. I did also try to do some changes to the threshold and blur functions like they did in that post and i managed to detect the G, but still no luck with Z and X. — Alejandro
– Alejandro, Commented Oct 9, 2024 at 5:56
Hi, I posted an answer, I don't consider it as a final answer, but rather some piece of advice — Lrx
– Lrx, Commented Oct 9, 2024 at 7:13

Lrx · Accepted Answer · 2024-10-09 07:12:34Z

The problem appears that the text you want to identify uses a stretched font (and is distorted due to perspective)

I manage to make it (kinda) work by squashing the sliced image down (i.e. keeping every other pixel row) before processing it, with the following parameters:

--psm 8 --oem 3

import pytesseract
import cv2

# Opening the image
img = cv2.imread("plate.jpg")
# Preprocess (Better performance)
w,h,_ = img.shape
img = cv2.resize(img, (w*2,h*2))
img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
img = cv2.medianBlur(img,5)
img = cv2.threshold(img,0,255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]

# Get a slice of the image with just the text to improve performance
box = []
data = pytesseract.image_to_data(img, config=r'--oem 3', output_type=pytesseract.Output.DICT)
n_boxes = len(data['level'])
for i in range(n_boxes):
    (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
    if int(data['conf'][i]) > 0:
        box.append((x, y, x + w, y + h))

h, w = img.shape
for box in box:
    x_min, y_min, x_max, y_max = box
    # Add some padding ensuring we do not get out of the image
    x_min_pad = max(0, x_min - 5)
    y_min_pad = max(0, y_min - 5)
    x_max_pad = min(w, x_max + 5)
    y_max_pad = min(h, y_max + 5)
    # Get the slice
    slice = img[y_min_pad:y_max_pad, x_min_pad:x_max_pad]

# Inference with LSTM on the slice
slice = slice[::2, :]
print(pytesseract.image_to_string(slice, config='--psm 8 --oem 3'))

The output is :

|GZX 830]

This is far from perfect considering we can see so residual characters on the side. I believe this will be even cleaner if you correct the perspective (by doing edge detection and standard use of cv2.getPerspectiveTransform and then cv2.warpPerspective)

Collectives™ on Stack Overflow

OCR character recognition fails

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related