ExitCodeException                                                                                         _common.py:271
Traceback (most recent call last):
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_exec\tesseract.py", line 313, in generate_hocr
    p = run(args_tesseract, stdout=PIPE, stderr=STDOUT, timeout=timeout, check=True)
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\subprocess\__init__.py", line 62, in run
    proc = subprocess_run(args, env=env, check=check, **kwargs)
  File "C:\<USER>\apps\python\current\Lib\subprocess.py", line 579, in run
    raise CalledProcessError(retcode, process.args,
                             output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['C:\\<USER>\\shims\\tesseract.EXE', '-l', 'eng',
'C:\\<USER>\\AppData\\Local\\Temp\\ocrmypdf.io.<RANDOM>\\000045_ocr.png',
'C:\\<USER>\\AppData\\Local\\Temp\\ocrmypdf.io.<RANDOM>\\000045_ocr_hocr', 'hocr', 'txt']' returned
non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipelines\_common.py", line 261, in cli_exception_handler
    return fn(options, plugin_manager)
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipelines\ocr.py", line 181, in _run_pipeline
    optimize_messages = exec_concurrent(context, executor)
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipelines\ocr.py", line 117, in exec_concurrent
    executor(
    ~~~~~~~~^
        use_threads=options.use_threads,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<10 lines>...
        task_finished=update_page,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^ 
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_concurrent.py", line 78, in __call__
    self._execute(
    ~~~~~~~~~~~~~^
        use_threads=use_threads,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
        task_finished=task_finished,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\builtin_plugins\concurrency.py", line 144, in _execute
    result = future.result()
  File "C:\<USER>\apps\python\current\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "C:\<USER>\apps\python\current\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\<USER>\apps\python\current\Lib\concurrent\futures\thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipelines\ocr.py", line 81, in _exec_page_sync
    ocr_out, text_out = _image_to_ocr_text(page_context, ocr_image_out)
                        ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipelines\ocr.py", line 62, in _image_to_ocr_text
    hocr_out, text_out = ocr_engine_hocr(ocr_image_out, page_context)
                         ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_pipeline.py", line 678, in ocr_engine_hocr
    ocr_engine.generate_hocr(
    ~~~~~~~~~~~~~~~~~~~~~~~~^
        input_file=input_file,
        ^^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        user_patterns=options.user_patterns,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\builtin_plugins\tesseract_ocr.py", line 268, in generate_hocr
    tesseract.generate_hocr(
    ~~~~~~~~~~~~~~~~~~~~~~~^
        input_file=input_file,
        ^^^^^^^^^^^^^^^^^^^^^^
    ...<9 lines>...
        options=options,
        ^^^^^^^^^^^^^^^^
    )
  File "C:\<USER>\apps\python\current\Lib\site-packages\ocrmypdf\_exec\tesseract.py", line 327, in generate_hocr
    raise SubprocessOutputError() from e
ocrmypdf.exceptions.SubprocessOutputError

I get the above error when running "ocrmypdf --skip-text '.\input.pdf' output.pdf -v". I installed OCRmyPDF with Scoop on Windows 11. The PDF was originally a DjVu file, which I converted to a PostScript file and then to a PDF.

I used this tutorial to install OCRMYPDF on Windows: https://marko.euptera.com/posts/ocrmypdf-windows.html

This has been a massive headache, and I haven't found a solution.
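
The traceback only reports that Tesseract returned exit status 1; it does not show what Tesseract printed. A minimal sketch for re-running the same kind of command outside OCRmyPDF and capturing its output is below. The argument order is copied from the traceback; the helper names `build_tesseract_cmd` and `run_and_capture`, the timeout value, and the file names in the usage comment are assumptions for illustration, not part of OCRmyPDF.

```python
import subprocess


def build_tesseract_cmd(image, out_base, lang="eng", exe="tesseract"):
    """Rebuild the argument list shown in the traceback for one page image.

    `exe` is assumed to be on PATH (for example the Scoop shim).
    """
    return [exe, "-l", lang, str(image), str(out_base), "hocr", "txt"]


def run_and_capture(cmd, timeout=180):
    """Run cmd and return (exit_code, combined stdout and stderr text)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as the failing call does
        text=True,
        errors="replace",  # avoid decode errors on unexpected console bytes
        timeout=timeout,
        check=False,  # do not raise, so the output can be inspected
    )
    return proc.returncode, proc.stdout


# Hypothetical usage: "page.png" stands in for a page image exported by hand,
# since the temporary 000045_ocr.png from the traceback may no longer exist.
# code, output = run_and_capture(build_tesseract_cmd("page.png", "page_out"))
# print(code)
# print(output)
```

If the exit code is non-zero here as well, the printed output should say why Tesseract rejected that page, which narrows the problem down to the image rather than to OCRmyPDF.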

  • No, I didn't convert to PNG. I only converted the DjVu file to PDF, and the only "quick" way I found was to convert it to a PostScript file first, or at least that route worked. Commented Mar 27 at 8:11
  • I tested the tesseract.exe command and it works. Commented Mar 27 at 8:22
  • OK, I will test converting the DjVu file directly to PDF later. I checked the PDF file and the DjVu file, and I can see that the font is messed up in the PDF. The DjVu file was clear, so the error most likely has something to do with that. I have Tesseract v5.5.0 and used all the commands outlined here: marko.euptera.com/posts/ocrmypdf-windows.html. I might do the conversion in WSL instead of Windows if the error persists. Commented Mar 28 at 17:28
