
I am experimenting with AI, specifically character recognition. I saw that OCR is one of the best approaches, and Google's Tesseract seems like the best open-source implementation right now. So I got some images and tried to apply Tesseract to them using Python. The image in this case is a license plate, so I won't show the original, but after some preprocessing I end up with this: [Example of license plate after preprocessing]. It seems like a pretty simple image to get text from, but I always get 624830: the 830 is fine, but the 624 should be GZX.

This is my code:

import pytesseract
import cv2

# Opening the image
img = cv2.imread("plate.jpg")
# Preprocess (Better performance)
h, w, _ = img.shape  # img.shape is (rows, cols, channels)
img = cv2.resize(img, (w*2, h*2))  # cv2.resize expects (width, height)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.medianBlur(img, 5)
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Get a slice of the image with just the text to improve performance
boxes = []
data = pytesseract.image_to_data(img, config=r'--oem 3', output_type=pytesseract.Output.DICT)
n_boxes = len(data['level'])
for i in range(n_boxes):
    (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
    if int(data['conf'][i]) > 0:
        boxes.append((x, y, x + w, y + h))

h, w = img.shape
for box in boxes:
    x_min, y_min, x_max, y_max = box
    # Add some padding, making sure we do not go outside the image
    x_min_pad = max(0, x_min - 5)
    y_min_pad = max(0, y_min - 5)
    x_max_pad = min(w, x_max + 5)
    y_max_pad = min(h, y_max + 5)
    # Get the slice (only the last detected box survives the loop)
    crop = img[y_min_pad:y_max_pad, x_min_pad:x_max_pad]

# Inference with LSTM on the slice
print(pytesseract.image_to_string(crop, config='--psm 8 --oem 3'))

All the preprocessing was tested, so I know it actually improves performance; if I remove it, Tesseract does not even detect the numbers correctly.
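Since the failure mode here is letters being read as digits (GZX becoming 624), one option worth trying is restricting Tesseract's character set with `tessedit_char_whitelist`. This is a standard Tesseract config variable; the alphabet below is only an assumption about the plate's format, and the helper name is hypothetical:

```python
# Sketch: build a Tesseract config string restricted to a plate alphabet.
# The character set is an assumption; adjust it to the actual plate format.
def plate_config(chars="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"):
    return f"--psm 8 --oem 3 -c tessedit_char_whitelist={chars}"

# Hypothetical usage on the cropped image:
#   pytesseract.image_to_string(plate_img, config=plate_config())
print(plate_config())
```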

  • Hello @Alejandro, have you tried changing your image_to_string configuration? This post stackoverflow.com/a/60161328/22258429 suggests using different parameters and manages to achieve accurate results. Commented Oct 8, 2024 at 12:59
  • Hi @Lrx, yes, I did try changing parameters in image_to_string. I tried all combinations of psm and oem, from 0 to 8 and 0 to 3 respectively. Only oem 1 and 3 worked (the others gave me an error), and psm 8 gave the best result. I also tried changing the threshold and blur functions like they did in that post, and I managed to detect the G, but still no luck with Z and X. Commented Oct 9, 2024 at 5:56
  • Hi, I posted an answer. I don't consider it a final answer, but rather some advice. Commented Oct 9, 2024 at 7:13

1 Answer


The problem appears to be that the text you want to identify uses a stretched font (and is distorted by perspective).

I managed to make it (kinda) work by squashing the sliced image vertically (i.e. keeping every other pixel row) before processing it, with the following parameters:

--psm 8 --oem 3
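The squashing itself is just a strided slice that keeps every other row, halving the image height while leaving the width untouched. A minimal NumPy illustration:

```python
import numpy as np

# A stand-in for the binarized plate slice: 4 rows, 6 columns.
img = np.arange(24, dtype=np.uint8).reshape(4, 6)
squashed = img[::2, :]  # keep rows 0 and 2 only
print(squashed.shape)   # → (2, 6): half the height, same width
```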

import pytesseract
import cv2

# Opening the image
img = cv2.imread("plate.jpg")
# Preprocess (Better performance)
h, w, _ = img.shape  # img.shape is (rows, cols, channels)
img = cv2.resize(img, (w*2, h*2))  # cv2.resize expects (width, height)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.medianBlur(img, 5)
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Get a slice of the image with just the text to improve performance
boxes = []
data = pytesseract.image_to_data(img, config=r'--oem 3', output_type=pytesseract.Output.DICT)
n_boxes = len(data['level'])
for i in range(n_boxes):
    (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
    if int(data['conf'][i]) > 0:
        boxes.append((x, y, x + w, y + h))

h, w = img.shape
for box in boxes:
    x_min, y_min, x_max, y_max = box
    # Add some padding, making sure we do not go outside the image
    x_min_pad = max(0, x_min - 5)
    y_min_pad = max(0, y_min - 5)
    x_max_pad = min(w, x_max + 5)
    y_max_pad = min(h, y_max + 5)
    # Get the slice (only the last detected box survives the loop)
    crop = img[y_min_pad:y_max_pad, x_min_pad:x_max_pad]

# Squash to half height, then run LSTM inference on the slice
crop = crop[::2, :]
print(pytesseract.image_to_string(crop, config='--psm 8 --oem 3'))

The output is :

|GZX 830]

This is far from perfect, considering we can still see some residual characters on the sides. I believe the result will be even cleaner if you correct the perspective (by doing edge detection, then the standard cv2.getPerspectiveTransform followed by cv2.warpPerspective).
