19

I want to convert a PDF (one page) into a PNG file. I installed pdf2image and got this error:

Poppler is not installed in Windows.

According to the question "Poppler in path for pdf2image", Poppler must be installed and added to PATH.

I cannot do either of those (I don't have the necessary permissions on the system I am working with).

I had a look at OpenCV and PIL, and neither seems to support this conversion: PIL (see https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html?highlight=pdf#pdf) cannot read PDFs; it can only save images as PDFs. The same goes for OpenCV.

Any suggestions on how to convert the PDF to PNG? I can install any Python library, but I cannot touch the Windows installation.

3
  • 1
    I HAVE to do it in python because I can only connect to the APIs from a Jupyter Hub environment, and it has to be done on the fly. Commented Oct 20, 2021 at 15:05
  • 1
    Lucky you, thank the admins for protecting your code from infection by poppler's "viral" copyleft (GPL) license Commented May 20, 2023 at 11:37
  • 1
    PDF is an extremely complex thing to parse, as it's more of a set of instructions for creating an image than it is an image itself. That's why you can't open one in OpenCV or PIL. You'll definitely need to rely on a library specifically for PDF. Commented Apr 9 at 18:58

4 Answers

14

PyMuPDF supports PDF-to-image rasterization without requiring any external dependencies.

Sample code for a basic PDF-to-PNG conversion:

import fitz  # PyMuPDF, imported as fitz for backward compatibility reasons
file_path = "my_file.pdf"
doc = fitz.open(file_path)  # open document
for i, page in enumerate(doc):
    pix = page.get_pixmap()  # render page to an image
    pix.save(f"page_{i}.png")
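
Since the question mentions that the PDF comes from an API and must be converted on the fly, the same calls can also work on bytes in memory instead of files. The following is a minimal sketch, assuming PyMuPDF is installed and the PDF content is already available as a `bytes` object. The names `first_page_png_bytes` and `is_png` are hypothetical helpers for illustration, not part of PyMuPDF.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def is_png(data: bytes) -> bool:
    """Return True if data starts with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def first_page_png_bytes(pdf_bytes: bytes) -> bytes:
    """Render the first page of an in-memory PDF and return PNG bytes."""
    import fitz  # PyMuPDF; imported here so is_png() works without it

    doc = fitz.open(stream=pdf_bytes, filetype="pdf")  # open from memory
    try:
        pix = doc[0].get_pixmap()  # render the first page to a pixmap
        return pix.tobytes("png")  # encode the pixmap as PNG bytes
    finally:
        doc.close()
```

In a notebook, something like `png = first_page_png_bytes(pdf_bytes)` followed by `is_png(png)` gives a quick sanity check before displaying or uploading the result; `pdf_bytes` stands for whatever bytes your API call returned.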

6 Comments

Hi @Seon but you are importing a my_file.png, I understand that it could be a pdf right?
That was indeed a typo, fixed it!
doc is indexable, so you can just use a for loop: for i in range(10), and set page=doc[i].
Thanks for your competent comments, @Seon - just an addition: the new PyMuPDF version 1.22.0 also supports saving to JPEG directly, without having to use Pillow: pix.save("file.jpg", jpg_quality=n). As can be seen, the JPEG quality can be chosen with an additional parameter.
Note it is licensed under AGPL, which still requires you to disclose source, like GPL-licensed poppler called by pdf2image (and network use is deemed to be distribution).
11

Here is a snippet that generates PNG images of arbitrary resolution (dpi):

# note: pymupdf can be imported as fitz 
# for backward compatibility (use `import pymupdf` in new code)
import fitz 
file_path = "my_file.pdf"
dpi = 300  # choose desired dpi here
zoom = dpi / 72  # zoom factor, standard: 72 dpi
magnify = fitz.Matrix(zoom, zoom)  # scales by the same factor in x and y direction
doc = fitz.open(file_path)  # open document
for page in doc:
    pix = page.get_pixmap(matrix=magnify)  # render page to an image
    pix.save(f"page-{page.number}.png")

This generates PNG files named page-0.png, page-1.png, and so on. Choosing dpi < 72 produces thumbnail-sized page images.
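
The PyMuPDF documentation states that, since version 1.19.2, `get_pixmap` accepts a `dpi` parameter in place of `matrix`. A possible variant of the snippet above using that parameter could look like the sketch below. It assumes PyMuPDF 1.19.2 or newer is installed; `page_png_name` and `pdf_to_pngs` are hypothetical helper names chosen for this example.

```python
from pathlib import Path


def page_png_name(pdf_path: str, page_number: int) -> str:
    """Build an output file name such as 'my_file-page-0.png'."""
    return f"{Path(pdf_path).stem}-page-{page_number}.png"


def pdf_to_pngs(pdf_path: str, dpi: int = 300) -> list:
    """Render every page of pdf_path to a PNG file at the given dpi."""
    import fitz  # PyMuPDF; imported here so page_png_name() works without it

    doc = fitz.open(pdf_path)
    written = []
    try:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # no Matrix needed
            name = page_png_name(pdf_path, page.number)
            pix.save(name)  # output format is taken from the extension
            written.append(name)
    finally:
        doc.close()
    return written
```

Calling `pdf_to_pngs("my_file.pdf", dpi=300)` would write the PNG files to the current working directory and return their names.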

5 Comments

second row should be fname =, not file_path =
From their rtd (pymupdf.readthedocs.io/en/latest/recipes-images.html): "Since version 1.19.2 there is a more direct way to set the resolution: Parameter "dpi" (dots per inch) can be used in place of "matrix". To create a 300 dpi image of a page specify pix = page.get_pixmap(dpi=300). Apart from notation brevity, this approach has the additional advantage that the dpi value is saved with the image file – which does not happen automatically when using the Matrix notation."
Note the fitz Github repo has been archived by the owner on Aug 3, 2022. It is now read-only. The only version on PyPI is a 5-year-old version tagged "pre-release":)
@mirekphd The fitz module is provided by the pymupdf package. The fitz GitHub repo you are talking about is unrelated.
Indeed, you are right. "Old versions of PyMuPDF had their Python import name as fitz . Newer versions use pymupdf instead, and offer fitz as a fallback so that old code will still work."
4

Try the pypdfium2 Python package, which ships with binaries of PDFium (maintained by Google). According to its README, it is one of the rare Python libraries that can render PDFs without being covered by a copyleft license (such as the GPL).

It can render PDF pages to PIL images or NumPy arrays; Pillow and NumPy are not installed with it, so you have to install whichever one you need separately.

Below is an example. For the force_bitmap_format parameter, you can alternatively pass, for example, pdfium.raw.FPDFBitmap_Gray if you only need a grayscale image, or pdfium.raw.FPDFBitmap_BGRA if you also need the alpha channel.

Here are the docs of PdfPage.render().

import pypdfium2 as pdfium

pdf = pdfium.PdfDocument("example.pdf")
page_ix = 0
page = pdf.get_page(page_ix)
bitmap = page.render(
    scale=1,
    rotation=0,
    rev_byteorder=True,  # To get RGB instead of BGR
    force_bitmap_format=pdfium.raw.FPDFBitmap_BGR,
)

page_img_pil = bitmap.to_pil()
page_img_pil.save(f"pdf_page_{page_ix}.png")
print(f"PIL image size: {page_img_pil.size}, PIL image mode: {page_img_pil.mode}")

page_img_np = bitmap.to_numpy()
print(f"NumPy image size: {page_img_np.shape}, NumPy image data type: {page_img_np.dtype}")
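
To render every page at a chosen resolution rather than a single page at `scale=1`, the scale can be derived from a dpi value, since one PDF canvas unit is usually 1/72 inch. The sketch below assumes a pypdfium2 version whose API matches the example above (`PdfDocument`, `get_page`, `render`, `to_pil`) and that Pillow is installed; `scale_for_dpi` and `render_all_pages` are hypothetical helper names.

```python
def scale_for_dpi(dpi: float) -> float:
    """Convert a dpi value to a render scale (1 PDF unit = 1/72 inch)."""
    return dpi / 72


def render_all_pages(pdf_path: str, dpi: float = 150) -> list:
    """Render each page of pdf_path to 'pdf_page_<n>.png' at the given dpi."""
    import pypdfium2 as pdfium  # imported here so scale_for_dpi() works alone

    pdf = pdfium.PdfDocument(pdf_path)
    written = []
    for page_ix in range(len(pdf)):
        page = pdf.get_page(page_ix)
        bitmap = page.render(scale=scale_for_dpi(dpi))
        name = f"pdf_page_{page_ix}.png"
        bitmap.to_pil().save(name)  # requires Pillow
        written.append(name)
    return written
```

For example, `render_all_pages("example.pdf", dpi=144)` would render each page at twice the default pixel size.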


1
import fitz

input_pdf = r"Samples\104295.pdf"

output_jpg = r"Output\104295.jpg"

# Render the first page of the PDF and save it as a JPEG
# (saving JPEG directly needs PyMuPDF 1.22.0 or newer)
def split_and_convert(pdf_path, output_path):
    doc = fitz.open(pdf_path)
    page = doc.load_page(0)
    pix = page.get_pixmap()
    pix.save(output_path, "jpeg")
    doc.close()

split_and_convert(input_pdf, output_jpg)

1 Comment

Please add details explaining what your answer does and how it solves the problem, in addition to your code.
