Converting PDF to PNG with Python (without pdf2image)

Question

I want to convert a PDF (one page) into a PNG file. I installed pdf2image and got this error:

Poppler is not installed on Windows.

According to the question "Poppler in path for pdf2image", Poppler has to be installed and its directory added to PATH.
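pdf2image does not rasterize PDFs itself: it shells out to Poppler's command-line tools (pdfinfo to count pages, pdftoppm to render them), which is why it fails when Poppler is missing. As a minimal diagnostic sketch, the stdlib `shutil.which` can report whether those tools are reachable on PATH; the helper name `find_poppler_tools` is hypothetical.

```python
import shutil

# Poppler command-line tools that pdf2image calls under the hood.
POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def find_poppler_tools(tools=POPPLER_TOOLS):
    """Map each tool name to its full path, or to None if it is not found on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_poppler_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

On Windows, `shutil.which` also tries the extensions listed in PATHEXT, so `pdftoppm.exe` is found as `pdftoppm`.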

I cannot do either of those (I don't have the necessary permissions on the system I am working with).

I had a look at OpenCV and PIL, and neither seems to offer this conversion: PIL (see https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html?highlight=pdf#pdf) cannot read PDFs; it can only save images as PDFs. OpenCV cannot read PDFs either.

Any suggestions on how to do the PDF-to-PNG conversion? I can install any Python library, but I cannot touch the Windows installation.

I HAVE to do it in Python because I can only connect to the APIs from a JupyterHub environment, and it has to be done on the fly. – JFerro, Commented Oct 20, 2021 at 15:05
Lucky you: thank the admins for protecting your code from infection by Poppler's "viral" copyleft (GPL) license. – mirekphd, Commented May 20, 2023 at 11:37
PDF is an extremely complex format to parse, because a PDF is more a set of instructions for creating an image than an image itself. That's why you can't open one in OpenCV or PIL. You'll definitely need to rely on a library made specifically for PDF. – Mark Ransom, Commented Apr 9 at 18:58
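To make the "set of instructions" point concrete, here is a stdlib-only sketch that writes a minimal one-page PDF by hand. The page content is a short string of drawing operators (set fill colour, add rectangle, fill), not pixels, so something has to execute those operators to get an image. The function name `build_minimal_pdf` and the page geometry are hypothetical choices for illustration; real PDFs usually compress their streams and contain far more object types.

```python
def build_minimal_pdf(width=200, height=100):
    """Return the bytes of a one-page PDF that draws a filled red rectangle."""
    # Page content = drawing operators, not pixels:
    # "1 0 0 rg" sets the fill colour to red, "re" appends a rectangle path, "f" fills it.
    content = f"1 0 0 rg 20 20 {width - 40} {height - 40} re f".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         f"/Resources << >> /Contents 4 0 R >>").encode("ascii"),
        f"<< /Length {len(content)} >>\nstream\n".encode("ascii") + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded for the xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list (exactly 20 bytes)
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")  # 20-byte in-use entry
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_start}\n%%EOF\n").encode("ascii")
    return bytes(out)
```

Writing the returned bytes to a file opened with `open("my_file.pdf", "wb")` gives a small test input for the rendering libraries discussed in the answers.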

Andrey · Accepted Answer · 2023-05-31 07:26:52Z

14

PyMuPDF supports PDF-to-image rasterization without requiring any external dependencies.

Sample code for a basic PDF-to-PNG conversion:

import fitz  # PyMuPDF, imported as fitz for backward compatibility reasons
file_path = "my_file.pdf"
doc = fitz.open(file_path)  # open document
for i, page in enumerate(doc):
    pix = page.get_pixmap()  # render page to an image
    pix.save(f"page_{i}.png")

answered Oct 20, 2021 at 10:23 by Seon, edited May 31, 2023 at 7:26 by Andrey


6 Comments

JFerro Over a year ago

Hi @Seon, you are opening my_file.png; I assume that should be a PDF, right?

Seon Over a year ago

That was indeed a typo, fixed it!

Seon Over a year ago

doc is indexable, so you can also loop over page numbers: for i in range(len(doc)), then set page = doc[i].

Jorj McKie Over a year ago

Thanks for your competent comments, @Seon. Just an addition: the new PyMuPDF version 1.22.0 also supports saving to JPEG directly, without having to use Pillow: pix.save("file.jpg", jpg_quality=n). As you can see, the JPEG quality can be chosen with an additional parameter.

mirekphd Over a year ago

Note that PyMuPDF is licensed under the AGPL, which, like the GPL covering the Poppler tools that pdf2image calls, still requires you to disclose source (and under the AGPL, network use is treated like distribution).


mirekphd · Accepted Answer · 2025-04-27 07:53:47Z

11

Here is a snippet that generates PNG images of arbitrary resolution (dpi):

# note: PyMuPDF can be imported as fitz
# for backward compatibility (use `import pymupdf` in new code)
import fitz
file_path = "my_file.pdf"
dpi = 300  # choose desired dpi here
zoom = dpi / 72  # zoom factor; PDF coordinates use 72 points per inch
magnify = fitz.Matrix(zoom, zoom)  # scales by `zoom` in the x and y directions
doc = fitz.open(file_path)  # open document
for page in doc:
    pix = page.get_pixmap(matrix=magnify)  # render page to an image
    pix.save(f"page-{page.number}.png")

This generates PNG files named page-0.png, page-1.png, ... Choosing dpi < 72 creates thumbnail-sized page images.
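Because the zoom factor is dpi / 72, the output size can be estimated before rendering. The sketch below does that arithmetic in plain Python; the helper names are hypothetical, and PyMuPDF's own rounding of the page rectangle may differ by a pixel, so treat the results as estimates. The memory figure assumes an RGB pixmap without alpha (3 bytes per pixel).

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def rendered_size(width_pt, height_pt, dpi):
    """Approximate pixel size of a page rendered at `dpi` (zoom = dpi / 72)."""
    zoom = dpi / POINTS_PER_INCH  # the same factor passed to fitz.Matrix(zoom, zoom)
    return round(width_pt * zoom), round(height_pt * zoom)


def rgb_buffer_bytes(width_pt, height_pt, dpi):
    """Rough size of an uncompressed RGB pixmap (3 bytes per pixel, no alpha)."""
    width_px, height_px = rendered_size(width_pt, height_pt, dpi)
    return width_px * height_px * 3


if __name__ == "__main__":
    # A US Letter page is 612 x 792 points (8.5 x 11 inches).
    for dpi in (36, 72, 300):
        print(dpi, rendered_size(612, 792, dpi), rgb_buffer_bytes(612, 792, dpi))
```

For a US Letter page this gives 306 x 396 pixels at 36 dpi (a thumbnail) and 2550 x 3300 pixels at 300 dpi, roughly 25 MB of raw RGB data before PNG compression.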

answered Oct 20, 2021 at 22:18 by Jorj McKie, edited Apr 27 at 7:53 by mirekphd

5 Comments

Chadee Fouad Over a year ago

second row should be fname =, not file_path =

Joschua Over a year ago

From the PyMuPDF documentation (pymupdf.readthedocs.io/en/latest/recipes-images.html): "Since version 1.19.2 there is a more direct way to set the resolution: Parameter "dpi" (dots per inch) can be used in place of "matrix". To create a 300 dpi image of a page specify pix = page.get_pixmap(dpi=300). Apart from notation brevity, this approach has the additional advantage that the dpi value is saved with the image file – which does not happen automatically when using the Matrix notation."
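In a PNG file, resolution is stored in the optional pHYs chunk as pixels per metre (per the PNG specification), so one way to check whether a saved page image carries its dpi is to read that chunk. This is a minimal stdlib sketch, assuming a well-formed PNG; `png_dpi` is a hypothetical helper name, and it skips CRC validation.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dpi(data):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if absent or unitless."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"pHYs":
            x_ppu, y_ppu, unit = struct.unpack(">IIB", body)
            if unit != 1:  # 1 = pixels per metre; 0 = aspect ratio only
                return None
            return round(x_ppu * 0.0254), round(y_ppu * 0.0254)  # metres -> inches
        if chunk_type == b"IDAT":  # pHYs must appear before the image data
            return None
        pos += 12 + length  # skip length + type + data + CRC
    return None


if __name__ == "__main__":
    with open("page-0.png", "rb") as f:  # hypothetical file produced by the snippet above
        print(png_dpi(f.read()))
```

A 300 dpi image is stored as 11811 pixels per metre, which this helper converts back to (300, 300).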

mirekphd Over a year ago

Note the fitz GitHub repo has been archived by the owner on Aug 3, 2022. It is now read-only. The only version on PyPI is a 5-year-old version tagged "pre-release" :)

mara004 Apr 26 at 23:41

@mirekphd The fitz module is provided by the pymupdf package. The fitz GitHub repo you are talking about is unrelated.

mirekphd Apr 27 at 7:01

Indeed, you are right. "Old versions of PyMuPDF had their Python import name as fitz . Newer versions use pymupdf instead, and offer fitz as a fallback so that old code will still work."

tuomastik · Accepted Answer · 2025-04-09 19:29:10Z

Try the pypdfium2 Python package, which ships with binaries of PDFium (the PDF library maintained by Google). According to its README, it is one of the few Python libraries capable of PDF rendering that are not covered by copyleft licenses (such as the GPL).

It can render PDF pages to PIL images or NumPy arrays; Pillow and NumPy have to be installed separately.

Below is an example. For the force_bitmap_format parameter, you can alternatively pass, for example, pdfium.raw.FPDFBitmap_Gray if you only need a grayscale image, or pdfium.raw.FPDFBitmap_BGRA if you also need the alpha channel.

See the documentation of PdfPage.render() for the full list of parameters.

import pypdfium2 as pdfium

pdf = pdfium.PdfDocument("example.pdf")
page_ix = 0
page = pdf.get_page(page_ix)
bitmap = page.render(
    scale=1,
    rotation=0,
    rev_byteorder=True,  # To get RGB instead of BGR
    force_bitmap_format=pdfium.raw.FPDFBitmap_BGR,
)

page_img_pil = bitmap.to_pil()
page_img_pil.save(f"pdf_page_{page_ix}.png")
print(f"PIL image size: {page_img_pil.size}, PIL image mode: {page_img_pil.mode}")

page_img_np = bitmap.to_numpy()
print(f"NumPy image size: {page_img_np.shape}, NumPy image data type: {page_img_np.dtype}")
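PDFium's native 3-byte format is BGR, and rev_byteorder=True asks it to emit RGB instead, so no post-processing is needed. To illustrate what that flag saves you, here is a pure-Python sketch of the equivalent byte swap on a packed buffer. `bgr_to_rgb` is a hypothetical helper, and it assumes tightly packed pixels; real bitmaps may pad each row (stride), which this sketch ignores.

```python
def bgr_to_rgb(buf):
    """Return a copy of a packed 3-bytes-per-pixel buffer with B and R swapped."""
    if len(buf) % 3:
        raise ValueError("buffer length must be a multiple of 3 (one B, G, R triple per pixel)")
    out = bytearray(buf)
    out[0::3] = buf[2::3]  # in BGR the red byte is third; move it to the front
    out[2::3] = buf[0::3]  # move the blue byte to the back; green stays in the middle
    return bytes(out)
```

The swap is its own inverse, so the same function also converts RGB back to BGR.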

Yuvraj Sharma · Accepted Answer · 2023-06-21 07:56:59Z

1

import fitz

input_pdf = r"Samples\104295.pdf"

output_jpg = r"Output\104295.jpg"

# Render the first page of the PDF to an image and save it as a JPEG
def split_and_convert(pdf_path, output_path):
    doc = fitz.open(pdf_path)
    page = doc.load_page(0)
    pix = page.get_pixmap()
    pix.save(output_path, "jpeg")
    doc.close()

split_and_convert(input_pdf, output_jpg)

answered Jun 21, 2023 at 7:56 by Yuvraj Sharma

1 Comment

coradek Over a year ago

Please add details explaining what your answer does and how it solves the problem, in addition to your code.
