8

I am using Python 3.x with the following code to convert an image into text:

from PIL import Image
from pytesseract import image_to_string

image = Image.open('image.png', mode='r')
print(image_to_string(image))

I am getting the following error:

Traceback (most recent call last):
  File "C:/Users/hp/Desktop/GII/Image_to_text.py", line 12, in <module>
    print(image_to_string(image))
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 950, in __init__
    restore_signals, start_new_session)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 1220, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Please note that I have put the image in the same directory where my Python is present. Also, no error is raised on image = Image.open('image.png', mode='r'); the error is raised on the line print(image_to_string(image)).

Any idea what might be wrong here? Thanks

3
  • This code works for me, when I have both files in the same directory and the image contains some words. Might be something about absolute and relative paths... Commented Jul 21, 2016 at 14:54
  • You may also try: import os.path; os.path.exists('image.png') Commented Jul 21, 2016 at 14:56
  • I use this code now: if os.path.exists('image.png'): image = Image.open('image.png'); print(image_to_string(image)) else: print('Does not exist'), but I get the same error. That means the file exists, but the error is raised when I try to read text from it. Commented Jul 21, 2016 at 14:59

5 Answers 5

8

You have to have tesseract installed and accessible on your PATH.
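One way to confirm whether the executable is reachable before calling pytesseract is to look it up on PATH. The following is a minimal sketch using the standard library's shutil.which; the helper name tesseract_on_path is hypothetical and not part of pytesseract.

```python
import shutil


def tesseract_on_path(name="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = tesseract_on_path()
    if location is None:
        print("tesseract was not found on PATH; install it or set tesseract_cmd")
    else:
        print("tesseract found at", location)
```

If this reports that nothing was found, the FileNotFoundError in the question is the expected outcome, regardless of where the image is stored.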

According to the source, pytesseract is merely a wrapper around subprocess.Popen that runs the tesseract binary. It does not perform any OCR itself.

Relevant part of sources:

def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False, config=None):
    '''
    runs the command:
        `tesseract_cmd` `input_filename` `output_filename_base`

    returns the exit status of tesseract, as well as tesseract's stderr output
    '''
    command = [tesseract_cmd, input_filename, output_filename_base]

    if lang is not None:
        command += ['-l', lang]

    if boxes:
        command += ['batch.nochop', 'makebox']

    if config:
        command += shlex.split(config)

    proc = subprocess.Popen(command,
            stderr=subprocess.PIPE)
    return (proc.wait(), proc.stderr.read())
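To see why a missing binary surfaces as FileNotFoundError, it may help to reproduce the behaviour without pytesseract at all. This sketch assumes only the standard library; try_run and the program name no-such-program-12345 are made up for illustration.

```python
import subprocess


def try_run(command):
    """Start a command and report whether the executable could be found."""
    try:
        proc = subprocess.Popen(command,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    except FileNotFoundError:
        # Raised by Popen itself when the program (command[0]) does not exist;
        # the arguments, such as an image filename, are never examined.
        return "executable not found"
    proc.communicate()
    return "started"


# The image argument is irrelevant here: the lookup of the program fails first.
print(try_run(["no-such-program-12345", "image.png", "out"]))
```

The same exception type appears in the traceback from the question, which is raised inside subprocess.py rather than inside PIL.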

Quoting another part of source:

# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'tesseract'

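So a quick way of changing the tesseract path would be:

import pytesseract
# tesseract_cmd is a global of the pytesseract.pytesseract submodule, so set it there (once)
pytesseract.pytesseract.tesseract_cmd = "/absolute/path/to/tesseract"
pytesseract.image_to_string(img)  # img is a PIL Image, e.g. Image.open('image.png')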

2 Comments

I think you are right, but I have installed tesseract and it still gives the same error. In fact, the brutal part is that when I open the image using the image.show() method it does open the image, but on the very next line, when I process the image, it throws FileNotFoundError. I am completely stuck :(
The FileNotFoundError comes from the missing tesseract executable, not from a missing image file. See the edit to my answer.
2

Please install the packages below to extract text from PNG/JPEG images:

pip install pytesseract

pip install Pillow 

pytesseract is a Python wrapper for OCR. OCR (Optical Character Recognition) is the process of electronically extracting text from images.

PIL (installed as Pillow) is used for anything from simply reading and writing image files to scientific image processing, geographical information systems, remote sensing, and more.

from PIL import Image
from pytesseract import image_to_string 
print(image_to_string(Image.open('/home/ABCD/Downloads/imageABC.png'),lang='eng'))


1

You need to download the Tesseract OCR setup as well. Use this link to download the setup: http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.01.exe

Then include this line in your code to use the tesseract executable: pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'. The r prefix keeps the backslashes literal, so the \t in \tesseract is not read as a tab.

This is the default location where tesseract will be installed.
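Since the install location can differ between machines, a script could check a few likely paths and fall back to PATH. This is only a sketch: resolve_tesseract_cmd is a hypothetical helper, and both candidate paths are assumptions about where an installer might have placed the executable.

```python
import os
import shutil


def resolve_tesseract_cmd(candidates, name="tesseract"):
    """Return the first candidate that exists, else the PATH lookup (or None)."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return shutil.which(name)


# Raw strings keep the backslashes literal; both locations are assumptions.
candidates = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]
cmd = resolve_tesseract_cmd(candidates)
print(cmd if cmd else "tesseract executable not found")
# If cmd is not None, it can be assigned before calling image_to_string:
#     pytesseract.pytesseract.tesseract_cmd = cmd
```

Keeping the lookup separate from the pytesseract import means the check can run even on a machine where the wrapper is installed but the binary is not.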

That's it. I have also followed these steps to run the code at my end.

Hope this will help.


0

You can try using this Python library: https://github.com/prabhakar267/ocr-convert-image-to-text

As mentioned on the README of the package, usage is very straightforward.

usage: python main.py [-h] input_dir [output_dir]

positional arguments:
  input_dir
  output_dir

optional arguments:
  -h, --help  show this help message and exit


-1

Your "current" directory is not where you think.

==> You may specify the full path to the image, for example: image = Image.open(r'C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\image.png', mode='r')
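To check which directory relative paths are resolved against, and to build a path that does not depend on it, something like the following could be used. This is a sketch under assumptions: beside is a hypothetical helper, and the script name is a placeholder for __file__. Note that it only addresses how the image path is resolved; it does not affect whether the tesseract executable can be found.

```python
import os
from pathlib import Path


def beside(script_path, filename):
    """Build an absolute path to filename in the same folder as script_path."""
    return Path(os.path.abspath(script_path)).parent / filename


# Relative names such as 'image.png' are resolved against this directory.
print("current working directory:", os.getcwd())

# In a real script, pass __file__ as script_path; the literal is a placeholder.
print(beside("Image_to_text.py", "image.png"))
```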
