Pytesseract does not recognize text from an image in Python

Question

I am working on a Django application in which I need to solve a captcha. I already save the captcha to a temporary file, but when I try to read it with pytesseract, it returns nothing, just an empty string.

I have already installed tesseract and tesseract-OCR.
I have already tried many times, on the assumption that it only fails sometimes.

@SubirChowdhury that is the code. For OP, just to confirm, have you followed all of the steps in the installation guide? github.com/madmaze/pytesseract?tab=readme-ov-file#installation
– Max, Commented Mar 13 at 6:42
tesseract may have problems when the text is too small or too big, among other issues. See the documentation: Improving the quality of the output | tessdoc
– furas, Commented Mar 13 at 12:01
The real problem is not selenium but only tesseract, so I removed the selenium tags.
– furas, Commented Mar 13 at 13:38

furas · Accepted Answer · 2025-03-13 13:38:39Z

tesseract may have problems when the text is too small or too big, among other issues. See the documentation: Improving the quality of the output | tessdoc

If I resize your image to 200%, tesseract can read the text.

I used the external program ImageMagick for this, but you could use the Python module Pillow
(or Wand, which also uses ImageMagick).

$ convert captcha.png -scale 200% captcha-200p.png
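
The same 200% upscale can be sketched with Pillow, the Python module mentioned above. This is a minimal sketch and not the exact pipeline I tested: the helper names upscale and read_captcha are hypothetical, and the LANCZOS filter, the grayscale conversion, and the flattening of transparency onto white are my assumptions (ImageMagick's -scale uses a simpler filter, so pixel values will differ).

```python
from PIL import Image


def upscale(img, factor=2):
    """Return a grayscale copy of img enlarged by an integer factor."""
    width, height = img.size
    # Assumption: flatten transparency onto white first, so converting an
    # RGBA captcha to grayscale does not turn transparent pixels black.
    if img.mode in ("RGBA", "LA"):
        background = Image.new("RGBA", img.size, "white")
        img = Image.alpha_composite(background, img.convert("RGBA"))
    return img.convert("L").resize((width * factor, height * factor), Image.LANCZOS)


def read_captcha(path):
    """Upscale the image at path and run pytesseract on the result."""
    # Imported here so upscale() can be used without pytesseract installed.
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(upscale(img)).strip()
```

Calling read_captcha("captcha.png") should then hand tesseract a 300 x 60 image instead of the 150 x 30 original. If the result is still empty, trying a larger factor is a reasonable next step.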

The file command shows some information about the files:

$ file ca*

captcha-200p.png: PNG image data, 300 x 60, 8-bit grayscale, non-interlaced
captcha.png:      PNG image data, 150 x 30, 8-bit/color RGBA, non-interlaced
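
Pillow can report similar information from inside Python, which may help when checking what the Django application actually saved. A small sketch; the helper name describe is hypothetical.

```python
from PIL import Image


def describe(path):
    """Return (width, height, mode) for the image file at path."""
    with Image.open(path) as img:
        width, height = img.size
        return width, height, img.mode
```

The mode strings are Pillow's own names (for example RGBA or L), not the wording printed by file.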

It is strange that you don't get any error message, because when I run tesseract with only the input image, it prints a usage message:

$ tesseract captcha-200p.png

Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.

It needs an output name without an extension (it adds .txt itself) to write the result to a file:

$ tesseract captcha-200p.png output

Estimating resolution as 308

$ cat output.txt

81+20=?

Alternatively, pass - as the output name to send the result to stdout, where it is shown on screen or can be piped to another program:

$ tesseract captcha-200p.png -

Estimating resolution as 308
81+20=?
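
The stdout form is also easy to call from Python without pytesseract, which can help to rule out a problem in the wrapper. A minimal sketch, assuming the tesseract executable is on PATH; the helper names build_command and ocr_via_cli are hypothetical.

```python
import subprocess


def build_command(image_path, lang=None):
    """Build the argument list for: tesseract IMAGE - [-l LANG]."""
    command = ["tesseract", image_path, "-"]
    if lang:
        command += ["-l", lang]
    return command


def ocr_via_cli(image_path, lang=None):
    """Run the tesseract CLI and return the recognized text from stdout."""
    result = subprocess.run(
        build_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    # Diagnostics such as "Estimating resolution as 308" are normally
    # written to stderr, so stdout should hold only the recognized text.
    return result.stdout.strip()
```

With the resized file from above, ocr_via_cli("captcha-200p.png") runs the same command as the shell example.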

Tested on: Linux Mint 22 (based on Ubuntu 24.04), tesseract 5.3.4 (leptonica-1.82.0)
