3 votes
2 answers
108 views

In Python, there are two libraries which are often used in tandem, Poppler and Tesseract. They both need external downloads to function: Poppler, Tesseract. The general recommendation for Windows is ...
user30589464's user avatar
-4 votes
1 answer
138 views

I'm writing an ultra-simple web page scraper using Python/BeautifulSoup. Faced with a key piece of information displayed as a PNG image, I've had to reach for PIL/Pytesseract. Code being extremely simple, and working ...
tishma's user avatar
  • 1,873
0 votes
0 answers
39 views

I need to train the default eng data so that it can also recognize some new characters. I created box files and lstm files, and when running the command: lstmtraining \ --model_output output/eng_latin \ --...
coure2011's user avatar
  • 42.8k
0 votes
2 answers
74 views

Pytesseract cannot understand very simple and clear text. I've tried nearest neighbor, bilinear, gaussian blur, and everything else and cannot get tesseract to read the text consistently, the best I ...
RvBVakama's user avatar
  • 117
1 vote
0 answers
184 views

I’m using Docling to OCR scanned PDFs. I want to control Tesseract’s page-segmentation mode (PSM), e.g. --psm 6. Docling exposes both TesseractOcrOptions and TesseractCliOcrOptions, but neither ...
Pamudu Ranasinghe's user avatar
2 votes
1 answer
70 views

I'm attempting to perform OCR on a set of single letters inside an image using Python. I'm new to this so apologies if I get the terminology wrong, but I've filtered and have obtained (I think) quite ...
user201341's user avatar
1 vote
2 answers
246 views

I have installed language support for chi_sim: ls /usr/share/tesseract-ocr/5/tessdata chi_sim.traineddata eng.traineddata pdf.ttf configs osd.traineddata tessconfigs You can try it by ...
showkey's user avatar
  • 375
1 vote
1 answer
83 views

I am currently using tesseract 5.0 and am training a model. I have generated the png, box and the ground truth files for a thousand images. However, when I run the command: make training MODEL_NAME=...
Akshay NN's user avatar
0 votes
1 answer
167 views

I'm trying to get the data out of this image, and no matter what I try I can't get a good result. I have tried ImageEnhance and cv2. I got the most promising result using cv2 and adaptive threshold: ...
Cyclo's user avatar
  • 3
1 vote
1 answer
81 views

I have a PDF document that I want to scan with pytesseract, but the page numbers are not recognized. The page number is not recognized on any of the pages. The PDF is written with LaTeX. I tried ...
mike3467's user avatar
0 votes
1 answer
62 views

I'm using pytesseract to read tabular data out of an image but I'm having trouble with the software making "educated guesses" about characters and word splitting based on context. I have a ...
SpliFF's user avatar
  • 39.1k
0 votes
0 answers
62 views

I am trying to fine-tune an Optical Character Recognition (OCR) model for Japanese using Tesseract's tesstrain repository. I tried encoding the bash commands into Python in VSCode as I wanted ...
Jiansen Chan's user avatar
0 votes
0 answers
147 views

ExitCodeException _common.py:271 Traceback (most recent call last): File "C:\<USER>\apps\python\...
Username's user avatar
1 vote
1 answer
159 views

I’ve been following this tutorial from YouTube: Guide to Tesseract Training https://www.youtube.com/watch?v=KE4xEzFGSU8&t=13s and its corresponding GitHub repository: astutejoe/tesseract_tutorial. ...
Impetus's user avatar
-1 votes
1 answer
68 views

I'm trying to convert the attached image using the pytesseract and opencv libraries in python, but the conversion is not satisfactory, since many characters are converted incorrectly. Does anyone have ...
Cristi Garcia's user avatar
-1 votes
1 answer
58 views

I am working with a Django application in which I need to solve a captcha. I am already saving a temporary captcha file, but when I try to read the captcha using pytesseract it returns nothing ...
Mohit Prajapat's user avatar
2 votes
1 answer
531 views

I've been trying to make a puzzle-solving program. The game is 'fruit box' and you can play it through the link below. https://en.gamesaien.com/game/fruit_box/ To do that, I have to extract numbers ...
eunsang's user avatar
  • 23
3 votes
0 answers
107 views

I'm working on a Python script that continuously monitors a screen region, extracts text using Tesseract OCR, and sends serial commands to an Arduino based on the detected text. However, I notice that ...
André's user avatar
  • 31
0 votes
1 answer
40 views

I am trying to use pytesseract to extract numbers from images. It works for some of them (1, 2, 3, 5, 6, 20...) but I would like to make it work for all of them. Here is a sample of the data that I'm ...
User_123917425's user avatar
0 votes
0 answers
72 views

I need to recognize digits on 7-segment clocks (see picture below), so I use the following Python code: def detect_date(image: cv2.UMat, bbox:list) -> datetime: gry1 = cv2.cvtColor(image, ...
Sharov's user avatar
  • 460
0 votes
0 answers
51 views

I am trying to use Tesseract to create a small Windows application that allows the user to: Take a screenshot of the monitor and cut a smaller portion containing a table (the table always has the ...
Riccardo's user avatar
0 votes
0 answers
24 views

C:\Users\xwmsu>tesseract --list-langs Error opening data file \app\Tesseract-OCR\tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of ...
阿泥不饿's user avatar
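
For the `TESSDATA_PREFIX` error quoted in the excerpt above, here is a minimal sketch of a pre-flight check that uses only the Python standard library. The helper name `find_traineddata` is hypothetical (it is not part of Tesseract or pytesseract). It assumes the data file may sit either directly in the directory named by the variable or in a `tessdata` subdirectory, since which of the two Tesseract expects can differ between versions and installers.

```python
import os
from pathlib import Path
from typing import Optional


def find_traineddata(lang: str = "eng", prefix: Optional[str] = None) -> Optional[Path]:
    """Return the path of <lang>.traineddata under TESSDATA_PREFIX, or None.

    Hypothetical helper for illustration; not part of Tesseract or pytesseract.
    """
    # Fall back to the environment variable when no prefix is passed in.
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        return None

    base = Path(prefix)
    # Assumption: look in the directory itself first, then in a tessdata/ subdirectory.
    for candidate in (base, base / "tessdata"):
        path = candidate / f"{lang}.traineddata"
        if path.is_file():
            return path
    return None
```

Calling `find_traineddata("eng")` before running OCR can show whether the variable points somewhere that actually contains the language file. A `None` result suggests the path is wrong or the language pack is missing.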
0 votes
0 answers
44 views

I'm new to using Pytesseract, and I'm having trouble recognizing an image: Bet Image import pytesseract pytesseract.pytesseract.tesseract_cmd = 'C:\Program Files (x86)\Tesseract-OCR\Tesseract.exe' # ...
Vanderson Gomes's user avatar
0 votes
0 answers
37 views

I tried to extract the content from an image with the Python py-tesseract OCR, but I was unable to obtain the numbers. I get the extracted_text empty value. Code: def ImageReader(image_path): ...
Yug's user avatar
  • 43
-1 votes
1 answer
125 views

When I use PyTesseract to recognize the text in this image, it returns 'FORREST C. BLopGetTrT' instead of 'FORREST C. BLODGETT'. The result of the code I get. The image I use, which contains many names. I ...
Mengyang Cao's user avatar
1 vote
1 answer
239 views

I am trying to use pytesseract in my system. But I am getting the following error message pytesseract.pytesseract.TesseractError: (1, 'Error opening data file /opt/homebrew/share/eng.traineddata ...
Sashaank's user avatar
  • 972
2 votes
0 answers
111 views

I am using Tesseract to perform custom model training. I have created my own text dataset and saved it in the tesstrain->data->codec folder with images and corresponding .gt files. At the same level as ...
Prachi Kedar's user avatar
2 votes
1 answer
195 views

I am experimenting with AI and specifically character recognition. I saw that one of the best algorithms is OCR and Google's implementation in Tesseract seems like the best open source solution right ...
Alejandro's user avatar
0 votes
2 answers
177 views

I am doing a Python project in which I use Tesseract-OCR. When I set it up from git, it gave me this error: C:\Users\jpmv1\AppData\Local\Programs\Python\Python312\python.exe C:\Users\jpmv1\Projects\...
Doutor JP's user avatar
1 vote
1 answer
99 views

I am trying to deploy an app through Render, but after executing there is an error such as TesseractNotFound or "Tesseract is not installed", even though I have added package.txt, requirements.txt as well as build....
Prarthana Kolhe's user avatar
0 votes
0 answers
94 views

I am trying to convert my PDF data into structured, table-format data. I have tried a bunch of options, but none of them has been able to separate fields into table columns. I am able to do ...
ViSa's user avatar
  • 2,357
0 votes
1 answer
134 views

I'm trying to convert a PDF into a JPEG using python. I'm trying to perform OCR by converting the PDF's into JPEG but keep running into the error: cannot identify image file <_io.BytesIO object at ...
Alvin Joseph's user avatar
2 votes
1 answer
292 views

I am trying to read the text from a U.S. penny to orient the coin. the original is from https://www.usmint.gov/wordpress/wp-content/uploads/2024/05/2024-lincoln-penny-uncirculated-obverse-philadelphia....
skeeter's user avatar
  • 39
-1 votes
1 answer
105 views

I need to use Pytesseract to extract text from this picture: I'm using this code: import pytesseract import cv2 pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract....
Doğucan ÇALIŞKAN's user avatar
0 votes
0 answers
120 views

I am working on a project where I need to extract text from frames of an Instagram Reels video. I used the yt-dlp to download the video, extracted frames using ffmpeg, and attempted to read the text ...
Rasik's user avatar
  • 2,529
0 votes
1 answer
271 views

I'm trying to extract specific data from multiple PDFs. I begin by isolating the example image (Picture 1) using horizontal and vertical lines to create cells. After creating the cells, I crop them ...
David in sweden's user avatar
0 votes
0 answers
47 views

Pytesseract does not extract the text from the image. The terminal stays black with a space as if it was actually trying to extract the text. Here is my code and the image: from PIL import Image ...
Spidercoder's user avatar
0 votes
0 answers
262 views

As an example, I have this image and would like to convert it to a modifiable Excel table. I have tried using the 'pytesseract' library, but it doesn't accurately extract the text from the image ...
UsangR01's user avatar
0 votes
1 answer
155 views

This is the original image: This is the processed image: I'm trying to automate a mini-game in which characters appear on the screen. I did some light research and managed to process the image to ...
Flako's user avatar
  • 1
0 votes
0 answers
63 views

I am working on a python program to solve a wordsearch. I am using pytesseract and opencv to process an image of the wordsearch and the solution will be displayed as a text. The script processes the ...
HND's user avatar
  • 1
0 votes
1 answer
95 views

I am trying to retrieve the text from an image of a 4x4 matrix. The text consists of numbers. Although I was expecting numbers, all I got was: BE, 8, EEE, BE. The image is attached here: image Anyone ...
Sandro Pinho's user avatar
1 vote
1 answer
148 views

I'm trying to read text on this image using pytesseract library. original-screenshot.png Here is my code: path = 'original-screenshot.png' image = cv2.imread(path) image = cv2.cvtColor(image, cv2....
ThunderFound's user avatar
1 vote
0 answers
50 views

This is the image: This is the sample image that i will convert into text. And here is the output: ***"| | .** indicators (Bids: S.1.4.1. valid Certificate of Registration and **LJ Poy |** ...
Nami_Raven's user avatar
3 votes
1 answer
115 views

I'm trying to calculate the real time of video recording. I have a lot of videos, some of which were lost during transmission. All of them are in mp4 format. to get the duration, I recognize the time ...
Ernán's user avatar
  • 33
-1 votes
1 answer
125 views

I have created a python code to read the captcha using OCR and fill the form further. I have used pytesseract library for the recognition of characters in the captcha. I am unable to retrieve the ...
Onkar Mehra's user avatar
1 vote
0 answers
130 views

def get_string(img_path): img = cv2.imread(img_path) img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC) gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) ...
Myat Thet's user avatar
0 votes
1 answer
99 views

currentbid.png: I am trying to detect the number in this image and it gives me letters or the wrong number. This is the image in which I am trying to detect the number. I've tried tons of stuff with greyscale ...
philM's user avatar
  • 11
0 votes
0 answers
236 views

I am trying to use OpenCV and Pytesseract to loop over the white numbers at the bottom of this image (or similar images) and record each number. While I have the logic correct for determining the ROI,...
Axel's user avatar
  • 1
1 vote
0 answers
27 views

I want to write code to extract the x-axis numbers and x-axis labels in the chart, with the numbers and labels separated. Is there a way to solve it? Recognize the x-axis y-axis and classify it ...
김보미's user avatar
0 votes
2 answers
257 views

Below is a snapshot of our application in test. iOS app in react native. The hierarchy is too deep. We are already using snapshotmaxdepth - 60 as one of the capabilities. Other capabilities include ...
Libra's user avatar
  • 43
