I’m working with a scanned PDF that contains a table with two columns, where each column has two lines of text. When I convert the scanned PDF using OCRmyPDF, I'm encountering an issue with the resulting content.

Tesseract processes the text line by line, so OCRmyPDF generates a separate span for each piece of content: one span for row 1, cell 1, then another for row 1, cell 2, followed by separate spans for row 2, cell 1, and row 2, cell 2.

This results in accessibility problems for screen readers, as the content is not structured properly. Is there any way to resolve this issue and ensure the table is interpreted correctly by screen readers?
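
To make the ordering concrete, here is a minimal sketch with made-up coordinates. It is not actual OCRmyPDF or Tesseract output: the `Span` type, the `cell_order` helper, and the `column_gap` threshold are all hypothetical, and it assumes the intended reading order is column by column.

```python
from typing import NamedTuple


class Span(NamedTuple):
    text: str
    x: float  # left edge of the span (hypothetical units)
    y: float  # top edge of the span; larger means lower on the page


# Hypothetical spans in the line-by-line order described above.
ocr_order = [
    Span("row 1, cell 1", x=50, y=100),
    Span("row 1, cell 2", x=300, y=100),
    Span("row 2, cell 1", x=50, y=120),
    Span("row 2, cell 2", x=300, y=120),
]


def cell_order(spans, column_gap=100):
    """Group spans into columns by left edge, then read each column top to bottom."""
    columns = []  # list of (column_x, [spans in that column])
    for span in sorted(spans, key=lambda s: s.x):
        if columns and span.x - columns[-1][0] < column_gap:
            # Close enough to the current column's left edge: same column.
            columns[-1][1].append(span)
        else:
            # Far enough to the right: start a new column.
            columns.append((span.x, [span]))
    return [s for _, col in columns for s in sorted(col, key=lambda s: s.y)]


print([s.text for s in cell_order(ocr_order)])
```

Reordering like this would only change the reading order of the text. It would not add table tags to the PDF, which is the part I am unsure how to achieve.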
