I want to extract all pages from this PDF file, improve their color levels, and eventually OCR them.
I've used ImageMagick:
magick Historia_de_CA_vol1_Cap1_0.pdf Historia_de_CA_vol1_Cap1_0-%d.jpg
magick mogrify -auto-level Historia_de_CA_vol1_Cap1_0-*.jpg,
which remarkably improves the quality of the embedded images, as can be seen on the document's 1st and 21st pages. I suspect this is because ImageMagick properly interprets a transparency layer that Adobe Acrobat Reader converts to a black or dark background. Unfortunately, the text in the extracted pages is blurrier than in the original.
I've also used Poppler's pdftoppm utility:
pdftoppm -jpeg Historia_de_CA_vol1_Cap1_0.pdf Historia_de_CA_vol1_Cap1_0,
which produces crisp text, suitable for OCR, but retains the poor quality of the embedded images seen on pages 1 and 21 of the original PDF, where transparency seems to be rendered as a dark layer.
How can I get ImageMagick to produce improved images and crisp text suitable for OCR, or conversely, how can I get pdftoppm to properly render the suspected transparent layer in the original PDF?
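
One variant that may be worth checking is sketched below. It assumes the blur comes from ImageMagick rasterizing the PDF at its default resolution of 72 dpi rather than from the PDF itself. That is only a guess, and the 300 dpi value, the JPEG quality, and the zero-padded output name are illustrative choices that have not been tested on this file.

```shell
#!/bin/sh
# Sketch only: assumes the blurry text is a resolution problem.
pdf=Historia_de_CA_vol1_Cap1_0.pdf
base=${pdf%.pdf}   # output prefix: the file name without ".pdf"
dpi=300            # assumed resolution, commonly used for OCR input

# Run only when ImageMagick 7 and the PDF are actually present.
if command -v magick >/dev/null 2>&1 && [ -f "$pdf" ]; then
    # -density must precede the input file so it applies to rasterization.
    magick -density "$dpi" "$pdf" -auto-level -quality 90 "$base-%03d.jpg"
fi
```

Placing -density before the input matters because it sets the resolution at which the PDF is rendered. Given after the input, it would only change the resolution metadata of pages that have already been rasterized.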