I am new to Tesseract OCR. I converted an image to TIF and tried to run Tesseract on it from the Windows command prompt (cmd) to see what output it produces, but I couldn't get it to work. Can you help me? What command should I use?

Here is my sample image:

[Image: sample page of text]

  • Please explain what you have tried in more detail. Commented Oct 9, 2014 at 10:29
  • @Vish I installed the Tesseract library from its site, and from cmd I tried to convert the text image with tesseract imagename.tif output, but I couldn't get any output. Commented Oct 9, 2014 at 23:57
  • For the syntax you typed, the output is stored in a file, output.txt. Did you check that such a file was created? Also, can you upload your TIF File somewhere? If I get some time I can check with my tesseract install. Commented Oct 10, 2014 at 5:44
  • @Vish Now I have added the tif image Commented Oct 17, 2014 at 0:09
  • @Vish Thanks a lot, I found the solution. Can you post your email in a comment? I would like to get advice from you. Commented Oct 17, 2014 at 16:09

1 Answer

The simplest tesseract.exe syntax is tesseract.exe inputimage outputbase; Tesseract writes the recognized text to outputbase.txt. This assumes that tesseract.exe has been added to the PATH environment variable. You can add the -psm N argument if the text in your image is particularly hard to recognize.

The regular syntax (without any -psm switch) works well enough on the image you attached, unless you need a higher level of accuracy.

Note that non-English characters (such as the symbol next to "prescription") are not recognized; my default installation only contains the English training data.

Here's the tesseract syntax description:

C:\Users\vish\Desktop>tesseract.exe
Usage:tesseract.exe imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before anyconfigfile.

Single options:
  -v --version: version info
  --list-langs: list available languages for tesseract engine
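If you want to call Tesseract from a script, the usage text above can be turned into a small command builder. This is only a sketch: the function name build_tesseract_command and its parameters are made up for illustration, and it assumes the 3.x spelling -psm shown above (newer 4.x releases use --psm instead). It only builds the argument list; OCR is run in the last few lines, and only if the executable and the image are actually present.

```python
import os
import shutil
import subprocess


def build_tesseract_command(image, output_base, lang=None, psm=None,
                            configfiles=(), exe="tesseract"):
    """Return the argument list for a Tesseract 3.x call (nothing is run)."""
    if psm is not None and (not isinstance(psm, int) or not 0 <= psm <= 10):
        raise ValueError("psm must be an integer from 0 to 10")
    cmd = [exe, image, output_base]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        # Tesseract 3.x spelling; 4.x releases expect --psm.
        cmd += ["-psm", str(psm)]
    # Config file names must come last, after -l and -psm.
    cmd += list(configfiles)
    return cmd


if __name__ == "__main__":
    cmd = build_tesseract_command("ECL8R.png", "out", lang="eng", psm=3)
    print(" ".join(cmd))
    # Run OCR only when tesseract is on PATH and the image exists.
    if shutil.which(cmd[0]) and os.path.exists("ECL8R.png"):
        subprocess.run(cmd, check=True)  # the text ends up in out.txt
```

Passing the options as a list rather than one string avoids quoting problems with file names that contain spaces.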

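And here's the output for your image. (Note: when I downloaded it, it had been converted to a PNG image. Also, because Tesseract appends .txt to the output base name, passing out.txt produces a file named out.txt.txt.)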

C:\Users\vish\Desktop>tesseract.exe ECL8R.png out.txt
Tesseract Open Source OCR Engine v3.02 with Leptonica

C:\Users\vish\Desktop>type out.txt.txt
1 Project Background

A prescription (R) is a written order by a physician or medical doctor to a pharmacist in the form of
medication instructions for an individual patient. You can't get prescription medicines unless someone
with authority prescribes them. Usually, this means a written prescription from your doctor. Dentists,

optometrists, midwives and nurse practitioners may also be authorized to prescribe medicines for you.

It can also be defined as an order to take certain medications.

A prescription has legal implications; this means the prescriber must assume his responsibility for the
clinical care ofthe patient.

Recently, the term "prescription” has known a wider usage being used for clinical assessments,

1 Comment

Can you give me example of how to use configfile?
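A config file is a plain-text file with one "parameter value" pair per line. Tesseract ships a few ready-made ones under tessdata/configs (digits and hocr, for example), and you use one by appending its name to the command, e.g. tesseract imagename.tif output digits. The sketch below writes a custom one from Python. The helper write_config and the file name digits_only are hypothetical, and it assumes your install also accepts a path to a config file outside tessdata/configs, which is worth checking on your version.

```python
import os
import tempfile


def write_config(path, params):
    """Write Tesseract parameters as 'name value' lines, one per line."""
    with open(path, "w", encoding="ascii", newline="\n") as fh:
        for name, value in params.items():
            fh.write(f"{name} {value}\n")
    return path


# Hypothetical config that restricts recognition to digits.
config_path = os.path.join(tempfile.gettempdir(), "digits_only")
write_config(config_path, {"tessedit_char_whitelist": "0123456789"})

# Config files go last on the command line, after any -l / -psm options.
command = ["tesseract", "imagename.tif", "output", "-psm", "6", config_path]
print(" ".join(command))
```

Running the printed command from cmd should then apply the whitelist while recognizing imagename.tif.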
