1

I'm using pytesseract to return the coordinates of the objects in an image.

By using this piece of code:

import pytesseract
from pytesseract import Output
import cv2
img = cv2.imread('wine.jpg')

d = pytesseract.image_to_data(img, output_type=Output.DICT)
    print(d)

for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)

I get that:

{'level': [1, 2, 3, 4, 5, 5, 2, 3, 4, 5, 4, 5, 2, 3, 4, 5], 'page_num': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'block_num': [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'par_num': [0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1], 'line_num': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1], 'word_num': [0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1], 'left': [0, 485, 485, 485, 485, 612, 537, 537, 555, 555, 537, 537, 454, 454, 454, 454], 'top': [0, 323, 323, 323, 323, 324, 400, 400, 400, 400, 426, 426, 0, 0, 0, 0], 'width': [1200, 229, 229, 229, 115, 102, 123, 123, 89, 89, 123, 123, 296, 296, 296, 296], 'height': [900, 29, 29, 29, 28, 28, 40, 40, 15, 15, 14, 14, 892, 892, 892, 892], 'conf': ['-1', '-1', '-1', '-1', 58, 96, '-1', '-1', '-1', 95, '-1', 95, '-1', '-1', '-1', 95], 'text': ['', '', '', '', "JACOB'S", 'CREEK', '', '', '', 'SHIRAZ', '', 'CABERNET', '', '', '', '']} 

[image used][enter image description here]1

However, when I use this image: enter image description here

I get that:

{'level': [1, 2, 3, 4, 5], 'page_num': [1, 1, 1, 1, 1], 'block_num': [0, 1, 1, 1, 1], 'par_num': [0, 0, 1, 1, 1], 'line_num': [0, 0, 0, 1, 1], 'word_num': [0, 0, 0, 0, 1], 'left': [0, 0, 0, 0, 0], 'top': [0, 162, 162, 162, 162], 'width': [1200, 0, 0, 0, 0], 'height': [900, 276, 276, 276, 276], 'conf': ['-1', '-1', '-1', '-1', 95], 'text': ['', '', '', '', '']}

Any idea why some image are working and some aren't?

1 Answer 1

1

It is mainly caused by different quality and contrast. it is much easier for the OCR engine to detect texts in desired images. you can add a few pre-processing routines, including thresholding, blurring, histogram equalization and lots of other techniques. it is mainly subjective so I can not provide you with working code, it is more like trial and error to find the best technique for your scope

UPDATE: here is a code that might help you

def preprocessing_typing_detection(inputImage):
    inputImage= cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
    inputImage= cv2.Laplacian(inputImage, cv2.CV_8U)
    return inputImage

Sign up to request clarification or add additional context in comments.

1 Comment

Could you provide just a code example? For example, if I want to do a thresholding.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.