
Using this tool, http://trainyourtesseract.com/, I would like to be able to use new fonts with pytesseract. The tool gives me a *.traineddata file.

Right now I'm using this simple script:

try:
    import Image  # very old standalone PIL
except ImportError:
    from PIL import Image
import pytesseract as tes

results = tes.image_to_string(Image.open('./test.jpg'))
# Append the OCR output to a text file; "with" closes the file afterwards
with open('parsing.text', 'a') as out:
    out.write(results)
print(results)

How do I use my .traineddata file so that the Python script can read the new font?

Thanks!

Edit #1: I understand that a *.traineddata file can be used with Tesseract as a command-line program, so my question is still the same: how do I use a .traineddata file from Python?

Edit #2: The answer to my question is here: How to access the command line for Tesseract from Python?
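For reference, the command-line route from Python can be sketched with the standard subprocess module. This assumes the tesseract binary is on PATH and that your file is named <lang>.traineddata; build_tesseract_cmd and ocr_with_cli are hypothetical helper names, not part of pytesseract. Passing "stdout" as the output base makes tesseract print the recognized text instead of writing a file.

```python
import subprocess
from typing import List, Optional


def build_tesseract_cmd(image_path: str, lang: str,
                        tessdata_dir: Optional[str] = None) -> List[str]:
    """Build a tesseract CLI call that prints recognized text to stdout (hypothetical helper)."""
    # "stdout" as the output base tells tesseract to write the text to standard output
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if tessdata_dir is not None:
        # Look for <lang>.traineddata in this directory instead of the default tessdata path
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def ocr_with_cli(image_path: str, lang: str,
                 tessdata_dir: Optional[str] = None) -> str:
    """Run tesseract and return its text output (assumes tesseract is installed)."""
    result = subprocess.run(build_tesseract_cmd(image_path, lang, tessdata_dir),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For a file named xyz.traineddata stored in ./tessdata, the call would be ocr_with_cli('./test.jpg', 'xyz', './tessdata').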

1 Answer


Below is an example of calling pytesseract.image_to_string() with options.

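import pytesseract
from PIL import Image

# --psm 4: assume a single column of text of variable sizes; --oem 3: default engine mode
# -c tessedit_char_whitelist=...: only recognize the listed characters
pytesseract.image_to_string(Image.open("./imagesStackoverflow/xyz-small-gray.png"),
                            lang="eng",
                            config="--psm 4 --oem 3 -c tessedit_char_whitelist=-01234567890XYZ:")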

To use your own trained language data, replace "eng" in lang="eng" with your language name, which is the name of your .traineddata file without the extension (for xyz.traineddata, use lang="xyz").
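If the .traineddata file is not in Tesseract's default data directory, one option is to point Tesseract at its folder through the config string with --tessdata-dir. The sketch below assumes that approach; lang_and_config and read_with_custom_font are hypothetical helper names, and the OCR helper needs pytesseract, Pillow and a Tesseract install.

```python
from pathlib import Path
from typing import Tuple


def lang_and_config(traineddata_path: str) -> Tuple[str, str]:
    """Split '/some/dir/xyz.traineddata' into lang='xyz' and a --tessdata-dir config."""
    path = Path(traineddata_path)
    if path.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {path.name!r}")
    # Tesseract appends ".traineddata" itself, so lang is the bare file stem
    lang = path.stem
    # Quote the directory so a path containing spaces stays a single argument
    config = f'--tessdata-dir "{path.parent}"'
    return lang, config


def read_with_custom_font(image_path: str, traineddata_path: str) -> str:
    """OCR an image with a custom .traineddata file (hypothetical helper)."""
    import pytesseract
    from PIL import Image

    lang, config = lang_and_config(traineddata_path)
    return pytesseract.image_to_string(Image.open(image_path), lang=lang, config=config)
```

Keeping the path handling in its own function means the lang/config pair can be checked without running OCR at all.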


3 Comments

A small addition to the above answer: keep the xyz.traineddata file in the directory where Tesseract's data is kept (for example /usr/share/tesseract-ocr/tessdata/) and pass the following: pytesseract.image_to_string(Image.open("./imagesStackoverflow/xyz-small-gray.png"), lang="xyz")
.traineddata is appended to the lang name automatically, so pass only the name. Also, the whitelist is broken in Tesseract 4.0.
So if my file is a.traineddata and my language is "eng", would I, following your instructions, use lang="eng(a.traineddata)"?
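The first comment's approach (copy the file into the tessdata directory, then pass only the name) can be sketched as below. install_traineddata is a hypothetical helper; the destination is whatever tessdata directory your Tesseract install uses, and system directories such as the one in the comment are typically only writable by root.

```python
import shutil
from pathlib import Path


def install_traineddata(traineddata_path: str, tessdata_dir: str) -> str:
    """Copy a .traineddata file into a tessdata directory and return its lang name."""
    src = Path(traineddata_path)
    if src.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {src.name!r}")
    dest_dir = Path(tessdata_dir)
    # Keep the original file name: Tesseract looks for <lang>.traineddata
    shutil.copy2(src, dest_dir / src.name)
    # Pass this stem as lang=..., not the full file name
    return src.stem
```

With a file named a.traineddata, the returned name is "a", so the call becomes pytesseract.image_to_string(img, lang="a").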
