
I'm trying to run Tesseract from Python. This is my code:

import cv2
import os
import numpy as np
import matplotlib.pyplot as plt
import pytesseract
from PIL import Image
# def main():
jpgCounter = 0
for root, dirs, files in os.walk('/home/manel/Desktop/fotografias etiquetas'):
    for file in files:
        if file.endswith('.jpg'):
            jpgCounter += 1

for i in range(1, 2):

    name = str(i) + ".jpg"
    nameBW = str(i) + "_bw.jpg"
    img = cv2.imread(name, 0)  # 0 -> open in grayscale
    # img = cv2.equalizeHist(img)
    kernel = np.array([[0,-1,0], [-1,5,-1], [0,-1,0]])
    img = cv2.filter2D(img, -1, kernel)
    cv2.normalize(img,img,0,255,cv2.NORM_MINMAX)
    med                 = np.median(img)



    retval, threshold_manual = cv2.threshold(img, med*0.6, 255, cv2.THRESH_BINARY)
    # alternative binarization (not used below)
    threshold_adaptive = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2)
    print(pytesseract.image_to_string(threshold_manual, lang='eng', config='-psm 11', nice=0, output_type=Output.STRING))

The error I'm getting is the following:

NameError: name 'Output' is not defined

Any idea why I'm getting this? Thank you!

  • Try writing pytesseract.Output.STRING. Commented Jan 20, 2018 at 14:17
  • @VasilisG. I changed it to output_type=pytesseract.Output.STRING and got a different error: AttributeError: 'module' object has no attribute 'Output'. Commented Jan 20, 2018 at 14:20
  • According to the pytesseract documentation, output_type defaults to Output.STRING, so you can omit that argument, as well as the nice argument in your case. Commented Jan 20, 2018 at 14:27
  • @VasilisG. Thank you for your suggestion. The problem is that I get a different error when I do so:
        File "img_proce_clean2.py", line 35, in <module>
          print(pytesseract.image_to_string(threshold_manual, config='-psm 11'))
        File "/home/manel/.local/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 104, in image_to_string
          if len(image.split()) == 4:
      AttributeError: 'numpy.ndarray' object has no attribute 'split'
    Commented Jan 20, 2018 at 14:28

2 Answers


Add this import:

from pytesseract import Output

1 Comment

This is a better approach than uninstalling and then reinstalling from source.

The problem is that you installed the pytesseract package from PyPI with pip, but you are following the documentation of the madmaze GitHub version, and the two are different.

I suggest uninstalling the current version, then cloning the GitHub repo and installing from it, using these steps:

  1. Uninstall the current version:

    pip uninstall pytesseract

  2. Clone the madmaze/pytesseract GitHub repo, either with git:

    git clone https://github.com/madmaze/pytesseract.git

    or by downloading it directly from the GitHub repository page.

  3. From the root directory of the cloned repo, run:

    pip install .
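After reinstalling, you can confirm which pytesseract Python actually imports, and whether it exposes Output, with a check along these lines (a sketch; the function name describe_pytesseract is hypothetical):

```python
import importlib


def describe_pytesseract():
    """Report whether pytesseract imports, where from, and if it has Output."""
    try:
        module = importlib.import_module("pytesseract")
    except ImportError:
        # Not installed (or one of its own dependencies is missing).
        return {"installed": False, "has_output": False, "path": None}
    return {
        "installed": True,
        # The old release from the question lacks this attribute.
        "has_output": hasattr(module, "Output"),
        "path": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    print(describe_pytesseract())
```

If has_output is False, or path still points at the ~/.local/lib/python2.7/... location from the traceback, the old release is probably still the one being imported.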
