
I get a strange error message when running an OCRmyPDF command

My setup:

  • macOS Sequoia 15.2
  • OCRmyPDF 16.8.0 (installed by Brew)
  • tesseract 5.5.0 (installed by Brew)
  • Command: ocrmypdf -l deu+fra+eng --clean --force-ocr test.pdf test-out.pdf 2>> debugOCR.txt

I should mention that the command is triggered by NoodleSoft Hazel, and as far as I understand, Hazel runs shell commands in a dedicated environment. My setup worked fine for a few weeks, but while a batch of PDF files was being processed, the following error started to occur. Since then I have not been able to get it working again.

The debug file debugOCR.txt shows the following error:

1 [tesseract] Error in fopenReadStream: failed to open locally with tail 000001_ocr.png for filename /tmp/ocrmypdf.io.81a_o2mw/000001_ocr.png
1 [tesseract] Leptonica Error in findFileFormat: image file not found: /tmp/ocrmypdf.io.81a_o2mw/000001_ocr.png
1 [tesseract] Error in fopenReadStream: failed to open locally with tail PNG for filename PNG
1 [tesseract] Leptonica Error in pixRead: image file not found: PNG
1 [tesseract] Image file PNG cannot be read!
1 [tesseract] Error during processing.
SubprocessOutputError

I can't find any subfolder such as /tmp/ocrmypdf.io.81a_o2mw/ in the /tmp folder.
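The missing folder is probably expected: OCRmyPDF removes its temporary work folder when it exits. A minimal sketch for keeping it, assuming the same options as the failing command, is to add OCRmyPDF's --keep-temporary-files (-k) option so the intermediate files can be inspected afterwards. The helper name ocr_debug is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the failing command plus --keep-temporary-files,
# so the ocrmypdf.io.* work folder survives for inspection after the run.
ocr_debug() {
  ocrmypdf --keep-temporary-files -l deu+fra+eng --clean --force-ocr \
    "$1" "$2" 2>> debugOCR.txt
}

# Usage (from the Hazel script or from Terminal):
#   ocr_debug test.pdf test-out.pdf
```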

I should also mention that the following commands work fine when I run them directly in Apple Terminal:

ocrmypdf -l deu+fra+eng --clean --force-ocr test.pdf test-out.pdf 2>> debugOCR.txt
tesseract test.tiff output --oem 1 -l eng pdf 

Any hints on where I should dig deeper? Is ocrmypdf or tesseract missing some environment variables in the Hazel environment? Any other hints?

Thanks a lot

AJ
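One way to check whether Hazel's environment differs is to record what the shell actually sees in each context and compare the two. The sketch below is a hypothetical diagnostic (the function name dump_env and the output file names are assumptions): paste it into the Hazel shell script and into a Terminal session, call it with a different output file in each, then diff the results. PATH and TMPDIR are likely the most interesting lines.

```shell
#!/bin/sh
# Hypothetical diagnostic: write the environment this shell sees to a file.
dump_env() {
  out="$1"
  {
    echo "PATH=$PATH"
    echo "TMPDIR=${TMPDIR-not set}"
    echo "ocrmypdf: $(command -v ocrmypdf || echo 'not found')"
    echo "tesseract: $(command -v tesseract || echo 'not found')"
    echo "--- full environment ---"
    env | sort
  } > "$out"
}

# In the Hazel script:  dump_env "$HOME/env-hazel.txt"
# In Terminal:          dump_env "$HOME/env-terminal.txt"
# Then compare:         diff "$HOME/env-terminal.txt" "$HOME/env-hazel.txt"
```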

  • The two commands are just meant to show that ocrmypdf and tesseract work when run separately, directly in Apple Terminal. I only get the error message when I call ocrmypdf from Hazel. In Hazel I don't call tesseract directly; as far as I understand, ocrmypdf calls tesseract as a subprocess. Commented Jan 30 at 7:18

1 Answer


https://github.com/tesseract-ocr/tesseract/issues/4333

This is likely the issue.

I ran into the same problem while using wcgw MCP, which also runs commands in a separate terminal environment.

Setting the TMPDIR environment variable to //tmp (with two leading slashes) fixed it for me.
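For reference, a minimal sketch of a Hazel "Run shell script" body with this workaround applied. It assumes that Hazel passes the matched file's path as $1, that Homebrew is installed under /opt/homebrew (Apple Silicon) or /usr/local (Intel), and a hypothetical "-ocr.pdf" naming scheme for the output; the helper names out_name and ocr_file are also hypothetical.

```shell
#!/bin/sh
# Assumption: Homebrew binaries live in /opt/homebrew/bin or /usr/local/bin,
# and Hazel's shell may not have them on PATH.
export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"

# Workaround from the linked tesseract issue: two leading slashes in TMPDIR.
export TMPDIR="//tmp"

# Hypothetical naming scheme: /path/scan.pdf -> /path/scan-ocr.pdf
out_name() {
  printf '%s\n' "${1%.pdf}-ocr.pdf"
}

ocr_file() {
  ocrmypdf -l deu+fra+eng --clean --force-ocr "$1" "$(out_name "$1")" \
    2>> "$HOME/debugOCR.txt"
}

# Hazel passes the matched file as $1; do nothing if it is missing.
if [ -n "${1-}" ]; then
  ocr_file "$1"
fi
```

If the output file lands in the folder Hazel is watching, the rule may also need a condition that skips files ending in "-ocr.pdf", so the results are not processed again.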


1 Comment

I already had export TMPDIR="/tmp" in my script, but changing it to export TMPDIR="//tmp" did the trick. Thanks a lot for the hint!
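To confirm that the new value is actually picked up, one option is to ask Python directly. OCRmyPDF is written in Python, and the ocrmypdf.io.81a_o2mw folder name in the log suggests its work folder is created through Python's tempfile module, which reads TMPDIR. This sketch assumes that python3 on PATH resolves TMPDIR the same way as the interpreter Homebrew's ocrmypdf runs on; show_tempdir is a hypothetical helper name.

```shell
#!/bin/sh
# Print the temporary directory Python resolves for a given TMPDIR value.
# Assumption: python3 on PATH behaves like the Python interpreter
# that Homebrew's ocrmypdf uses.
show_tempdir() {
  TMPDIR="$1" python3 -c 'import tempfile; print(tempfile.gettempdir())'
}

show_tempdir /tmp
show_tempdir //tmp
```

On a POSIX system the second call should print //tmp rather than /tmp, because Python's path normalization keeps exactly two leading slashes. The work folder, and therefore the image paths handed to tesseract, then start with //tmp instead of /tmp.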
