
I am attempting to install PyOCR on my computer running Windows 7. I have installed Tesseract-OCR 3.05 for Windows, added the directory containing Tesseract (C:\Program Files (x86)\Tesseract-OCR) to both the user PATH variable and the system Path variable, and created a new system variable TESSDATA_PREFIX that points to the Tesseract directory.

I am able to use Tesseract directly from the command line to process images, so I am confident that Tesseract was installed correctly. I also made sure to install Tesseract with the C/C++ library files.

I know this question has been asked before, but since I have already added the directories to the environment variables, I am unsure what to try next.

Below is the output of the "get_available_tools()" method.

Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:27:45) 
[MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyocr
>>> import pyocr.builders
>>> pyocr.get_available_tools()
[]
>>>

2 Answers


get_available_tools() returns a list of ‘OCR tools available on the local system’ (see source code). As I understand it, pyocr looks for the supported tools on PATH, and you have to install them yourself.

PyOCR currently supports the following OCR tools:

  • Libtesseract (Python bindings for the C API)
  • Tesseract (wrapper: fork + exec)
  • Cuneiform (wrapper: fork + exec)

If you have installed one of these tools but, for some reason, it is not on your PATH (as in my case), you can always override the command that PyOCR runs, like this:

pyocr.tesseract.TESSERACT_CMD = r'<full_path_to_your_tesseract_executable>'
pyocr.cuneiform.CUNEIFORM_CMD = r'<full_path_to_your_cuneiform_executable>'
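
As a rough sketch of how the override above could be applied in practice, the snippet below first looks for tesseract on PATH, then falls back to an explicit path, and only then points PyOCR at it. The helper names find_tesseract and available_tool_names are hypothetical, and DEFAULT_TESSERACT assumes the install directory mentioned in the question plus the usual tesseract.exe file name; adjust both for your machine.

```python
import os
import shutil

# Assumption: the install directory from the question, with the usual
# executable name appended. Change this to match your own installation.
DEFAULT_TESSERACT = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates=(DEFAULT_TESSERACT,)):
    """Return a usable path to the tesseract executable, or None."""
    # First ask the OS whether the current process can see it on PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to explicit candidate locations.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def available_tool_names(tesseract_path):
    """Point PyOCR at tesseract_path and list the tools it then detects."""
    # Imported here so that find_tesseract() works without PyOCR installed.
    import pyocr
    import pyocr.tesseract

    pyocr.tesseract.TESSERACT_CMD = tesseract_path
    return [tool.get_name() for tool in pyocr.get_available_tools()]
```

Calling available_tool_names(find_tesseract()) after a successful lookup should then return a non-empty list if PyOCR accepts the executable. If find_tesseract() returns None even though tesseract works in a console, the Python process may have been started before PATH was changed.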

2 Comments

Thanks. TESSERACT_CMD solved the issue. Just curious: what is Libtesseract, and how do I install it?
Libtesseract is the Tesseract OCR engine packaged as a library with a C API. I don't have any experience with it, but for PyOCR I guess you could start with a Python wrapper such as github.com/virtuald/python-tesseract-sip.

I had a similar problem using libtesseract-4.dll: the tools list was empty.

I discovered that pyocr calls get_version(), which returned 0.0.0. As a workaround, I patched get_version() in my installation to return the appropriate value.

This worked because the way parameters are passed, especially the -psm / --psm parameter, depends on the result of get_version().
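
To illustrate why a wrong version number matters, here is a minimal sketch of version-dependent flag selection. It is not PyOCR's actual code; psm_flag and build_command are hypothetical names, and the cutoff assumes that Tesseract 3.x takes -psm while 4.x expects --psm.

```python
def psm_flag(version):
    """Pick the page-segmentation-mode flag for a (major, minor, patch) tuple."""
    major = version[0]
    # Assumption: 4.x and later use the double-dash form, older releases -psm.
    return "--psm" if major >= 4 else "-psm"


def build_command(version, image_path, output_base, psm=3):
    """Assemble an illustrative tesseract command line as an argument list."""
    return ["tesseract", image_path, output_base, psm_flag(version), str(psm)]
```

With a misreported version of (0, 0, 0), this sketch picks -psm even when the installed library is 4.x, which is the kind of mismatch the patch works around.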

The issue was accepted as a bug, and it looks like the author will fix it soon: https://gitlab.gnome.org/World/OpenPaperwork/pyocr/issues/106
