I am experimenting with AI, specifically character recognition. The standard approach for this task is OCR, and Google's Tesseract seems like the best open-source implementation right now.
So I took some images and ran Tesseract on them from Python. The image in this case is a license plate, so I won't show the original, but after some preprocessing I end up with this:
It seems like a pretty simple image to extract text from, but I always get 624830: the 830 is fine, but the 624 should be GZX.
This is my code:
import pytesseract
import cv2
# Opening the image
img = cv2.imread("plate.jpg")
# Preprocess (Better performance)
h, w, _ = img.shape  # shape is (height, width, channels)
img = cv2.resize(img, (w*2, h*2))  # cv2.resize takes (width, height)
img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
img = cv2.medianBlur(img,5)
img = cv2.threshold(img,0,255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]
# Get a slice of the image with just the text to improve performance
boxes = []
data = pytesseract.image_to_data(img, config=r'--oem 3', output_type=pytesseract.Output.DICT)
n_boxes = len(data['level'])
for i in range(n_boxes):
    (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
    # Keep only boxes Tesseract assigned a positive confidence
    if int(data['conf'][i]) > 0:
        boxes.append((x, y, x + w, y + h))
h, w = img.shape
for box in boxes:
    x_min, y_min, x_max, y_max = box
    # Add some padding, making sure we stay inside the image
    x_min_pad = max(0, x_min - 5)
    y_min_pad = max(0, y_min - 5)
    x_max_pad = min(w, x_max + 5)
    y_max_pad = min(h, y_max + 5)
    # Get the slice
    roi = img[y_min_pad:y_max_pad, x_min_pad:x_max_pad]
    # Inference with the LSTM engine on the slice
    print(pytesseract.image_to_string(roi, config='--psm 8 --oem 3'))
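The crop-with-padding step in the loop above can be isolated into a small helper that is easy to test on its own (a sketch; the function name `crop_with_padding` is mine, and only numpy is needed):

```python
import numpy as np

def crop_with_padding(img, box, pad=5):
    """Crop a box from a grayscale image, expanding it by `pad` pixels
    on each side while clamping the result to the image bounds."""
    h, w = img.shape
    x_min, y_min, x_max, y_max = box
    x0 = max(0, x_min - pad)
    y0 = max(0, y_min - pad)
    x1 = min(w, x_max + pad)
    y1 = min(h, y_max + pad)
    return img[y0:y1, x0:x1]

# Example: a 100x200 image with a box near the top-left corner,
# where the padding gets clamped at the image edge
img = np.zeros((100, 200), dtype=np.uint8)
roi = crop_with_padding(img, (2, 3, 50, 40))
print(roi.shape)  # -> (45, 55)
```

This keeps the main loop readable and makes it trivial to check the clamping behavior at the image borders.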
All the preprocessing steps were tested individually, so I know they actually improve performance; if I remove them, Tesseract does not even detect the numbers correctly.
Have you experimented with the `image_to_string` configuration? This post stackoverflow.com/a/60161328/22258429 suggests using different parameters and manages to achieve accurate results.
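Since the failure mode is letters being read as digits, one configuration change worth trying (an untested assumption on my part, not something verified on this image) is whitelisting the characters a plate can contain via a Tesseract config variable; note that, as far as I know, `tessedit_char_whitelist` is ignored by the LSTM engine in Tesseract 4.0 but honored again from 4.1 onward:

```python
# Hypothetical tweak: restrict recognition to uppercase letters and digits.
# '-c' passes a Tesseract variable; --psm 8 treats the crop as a single word.
plate_config = (
    '--oem 3 --psm 8 '
    '-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
)
# Usage, with img_slice being the cropped region from the loop above:
# print(pytesseract.image_to_string(img_slice, config=plate_config))
print(plate_config)
```

This does not force the right answer, but it removes lowercase letters and punctuation from the search space, which often helps on plates.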