Image to text python

Question

I am using Python 3.x with the following code to convert an image into text:

from PIL import Image
from pytesseract import image_to_string

image = Image.open('image.png', mode='r')
print(image_to_string(image))

I am getting the following error:

Traceback (most recent call last):
  File "C:/Users/hp/Desktop/GII/Image_to_text.py", line 12, in <module>
    print(image_to_string(image))
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 950, in __init__
    restore_signals, start_new_session)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 1220, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Please note that I have put the image in the same directory where my Python is located. Also, no error is raised on image = Image.open('image.png', mode='r'); it is raised on the line print(image_to_string(image)).

Any idea what might be wrong here? Thanks

This code works for me, when I have both files in the same directory and the image contains some words. Might be something about absolute and relative paths... — Ohumeronen
– Ohumeronen, Commented Jul 21, 2016 at 14:54
You may also try: import os.path; os.path.exists('image.png') — Ohumeronen
– Ohumeronen, Commented Jul 21, 2016 at 14:56
I now use this code: if os.path.exists('image.png'): image = Image.open('image.png'); print(image_to_string(image)) else: print('Does not exist'). I still get the same error, which means the file exists but the error is raised when trying to read text from it. — muazfaiz
– muazfaiz, Commented Jul 21, 2016 at 14:59

Łukasz Rogalski · Accepted Answer · 2016-07-25 05:09:38Z

8

You have to have tesseract installed and accessible in your PATH.

According to the source, pytesseract is merely a wrapper around subprocess.Popen that runs the tesseract binary. It does not perform any OCR itself.
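A quick way to check whether the binary is reachable is to look it up on the PATH from Python. This is a minimal sketch using the standard library only; the helper name find_tesseract is made up for illustration and is not part of pytesseract.

```python
import shutil

def find_tesseract(cmd="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(cmd)

path = find_tesseract()
if path is None:
    print("tesseract is not on PATH; install it or point tesseract_cmd at it")
else:
    print("tesseract found at", path)
```

If this prints that tesseract is not on PATH, the FileNotFoundError in the question is the expected result.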

Relevant part of sources:

def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False, config=None):
    '''
    runs the command:
        `tesseract_cmd` `input_filename` `output_filename_base`

    returns the exit status of tesseract, as well as tesseract's stderr output
    '''
    command = [tesseract_cmd, input_filename, output_filename_base]

    if lang is not None:
        command += ['-l', lang]

    if boxes:
        command += ['batch.nochop', 'makebox']

    if config:
        command += shlex.split(config)

    proc = subprocess.Popen(command,
            stderr=subprocess.PIPE)
    return (proc.wait(), proc.stderr.read())
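The same failure can be reproduced without pytesseract or any image at all, which shows that the error concerns the executable and not the input file. A minimal sketch, assuming only the standard library; run_command and the binary name are hypothetical.

```python
import subprocess

def run_command(command):
    """Run a command list; return (found, exit_status)."""
    try:
        proc = subprocess.Popen(command,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    except FileNotFoundError:
        # Raised when the executable itself cannot be located,
        # regardless of whether its arguments point at real files
        return (False, None)
    proc.communicate()
    return (True, proc.returncode)

# 'no-such-tesseract-binary' is a made-up name used to trigger the failure
print(run_command(["no-such-tesseract-binary", "image.png", "out"]))
```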

Quoting another part of source:

# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'tesseract'

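So a quick way of changing the tesseract path would be:

from PIL import Image
import pytesseract

# Set this once, before the first OCR call; the variable lives in the pytesseract.pytesseract module
pytesseract.pytesseract.tesseract_cmd = "/absolute/path/to/tesseract"
img = Image.open('image.png')
print(pytesseract.image_to_string(img))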

edited Jul 25, 2016 at 5:09

answered Jul 21, 2016 at 16:07

Łukasz Rogalski


2 Comments

muazfaiz Over a year ago

I think you are right, but I have installed tesseract and it still gives the same error. In fact, the brutal part is that when I open the image using the image.show() method it does open the image, but on the very next line, when I process the image, it throws FileNotFoundError. I am completely stuck :(

Łukasz Rogalski Over a year ago

The FileNotFoundError comes from the missing tesseract executable, not from a missing image file. See the edit to my answer.

thrinadhn · Accepted Answer · 2019-12-19 09:21:25Z

2

Please install the packages below for extracting text from PNG/JPEG images:

pip install pytesseract

pip install Pillow

pytesseract is a Python wrapper for OCR (Optical Character Recognition), the process of electronically extracting text from images.

PIL is used for anything from simply reading and writing image files to scientific image processing, geographical information systems, remote sensing, and more.

from PIL import Image
from pytesseract import image_to_string 
print(image_to_string(Image.open('/home/ABCD/Downloads/imageABC.png'),lang='eng'))

answered Dec 19, 2019 at 9:21

thrinadhn


AnkurJangra · Accepted Answer · 2017-09-12 17:19:42Z

1

You need to download the tesseract OCR setup as well. Use this link to download the setup: http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.01.exe

Then, include this line in your code to use the tesseract executable: pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract' (note the r prefix; without it Python reads the \t in \tesseract as a tab character).

This is the default location where tesseract will be installed.
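One pitfall with Windows paths written as Python string literals is that a backslash followed by t is read as a tab, so the path should be a raw string (or use doubled backslashes). A small sketch of the difference, using a shortened path fragment for illustration:

```python
plain = 'Tesseract-OCR\tesseract'   # "\t" is interpreted as a tab character
raw = r'Tesseract-OCR\tesseract'    # raw string keeps the backslash as written

# repr() makes the difference visible
print(repr(plain))
print(repr(raw))
```

With the plain literal, the configured path no longer names the executable, and the same FileNotFoundError would be raised.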

That's it. I have also followed these steps to run the code at my end.

Hope this will help.

answered Sep 12, 2017 at 17:19

AnkurJangra


prabhakar267 · Accepted Answer · 2018-10-18 22:48:50Z

0

You can try using this python library: https://github.com/prabhakar267/ocr-convert-image-to-text

As mentioned on the README of the package, usage is very straightforward.

usage: python main.py [-h] input_dir [output_dir]

positional arguments:
  input_dir
  output_dir

optional arguments:
  -h, --help  show this help message and exit

answered Oct 18, 2018 at 22:48

prabhakar267


stonebig · Accepted Answer · 2016-07-21 16:04:09Z

-1

Your "current" directory is not where you think.

==> You may specify the full path to the image, for example: image = Image.open(r'C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\image.png', mode='r')
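To see which directory a relative path is actually resolved against, you can print the working directory and build an absolute path from it. This is a rough sketch using only the standard library; resolve_in is a hypothetical helper, and in a script you might pass os.path.dirname(os.path.abspath(__file__)) as the base directory instead.

```python
import os

def resolve_in(base_dir, filename):
    """Build an absolute path to filename inside base_dir."""
    return os.path.join(os.path.abspath(base_dir), filename)

print("current working directory:", os.getcwd())

image_path = resolve_in(os.getcwd(), "image.png")
print(image_path, "exists:", os.path.exists(image_path))
```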

answered Jul 21, 2016 at 16:04

stonebig
