How to convert pdf from url to image using pdf2image in python?

Question

I am able to convert a PDF file on my drive to images using pdf2image's convert_from_path, but when I try the same for the PDF at 'https://example.com/abc.pdf', I end up with multiple errors.

Code:

from urllib.request import urlopen

import pdf2image

url = 'https://example.com/abc.pdf'
scrape = urlopen(url)  # for external files
pil_images = pdf2image.convert_from_bytes(
    scrape.read(), dpi=200, output_folder=None, first_page=None, last_page=None,
    thread_count=1, userpw=None, use_cropbox=False, strict=False,
    poppler_path=r"C:\poppler-0.68.0_x86\poppler-0.68.0\bin",
)

Error:

   Unable to get page count. Syntax Error: Document stream is empty

I also followed the linked question below, but had no luck:

Python3: Download PDF to memory and convert first page to image

Screenshot for Authentication:

Avinash Dalvi · Accepted Answer · 2019-10-29 10:43:53Z

First, download the PDF from the URL as described in this blog post: https://dzone.com/articles/simple-examples-of-downloading-files-using-python

Then use the function below to convert the PDF to an image (or any other format Ghostscript supports). If the PDF has multiple pages, it can be converted to a series of images.

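import ghostscript

def pdf2jpeg(pdf_input_path, jpeg_output_path):
    """Render a PDF to JPEG with Ghostscript."""
    args = ["pdf2jpeg",  # argv[0]; the actual value doesn't matter
            "-dNOPAUSE",  # don't pause for input between pages
            "-sDEVICE=jpeg",  # output device: JPEG
            "-r144",  # resolution in DPI
            "-sOutputFile=" + jpeg_output_path,
            pdf_input_path]
    ghostscript.Ghostscript(*args)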

Reference: Converting a PDF to a series of images with Python
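For a multi-page PDF, Ghostscript can write one file per page when the output file name contains a page-number placeholder such as %03d. The following is a minimal sketch of that idea, not the answer's original code: the helper names build_gs_args and pdf_to_jpegs and the default file pattern are made up for illustration, and the -dBATCH flag is an addition that asks Ghostscript to exit after the last page.

```python
from typing import List


def build_gs_args(pdf_input_path: str, output_pattern: str, dpi: int = 144) -> List[str]:
    """Build Ghostscript arguments that render each PDF page to its own JPEG."""
    return [
        "pdf2jpeg",        # argv[0]; Ghostscript ignores its value
        "-dNOPAUSE",       # do not prompt between pages
        "-dBATCH",         # exit after the last page (assumption: not in the original answer)
        "-sDEVICE=jpeg",   # output device: JPEG
        f"-r{dpi}",        # resolution in DPI
        f"-sOutputFile={output_pattern}",  # "%03d" is replaced by the page number
        pdf_input_path,
    ]


def pdf_to_jpegs(pdf_input_path: str, output_pattern: str = "page-%03d.jpg") -> None:
    """Render every page of the PDF to page-001.jpg, page-002.jpg, and so on."""
    # Imported locally so build_gs_args() can be used without the binding installed.
    import ghostscript

    ghostscript.Ghostscript(*build_gs_args(pdf_input_path, output_pattern))
```

Calling pdf_to_jpegs("abc.pdf") would then be expected to leave one JPEG per page in the current directory, assuming the ghostscript Python package and the Ghostscript library are both installed.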

For authentication, try this:

import os
from urllib.parse import urlparse  # Python 3; in Python 2 this was `from urlparse import urlparse`

import requests

username = 'foo'
password = 'sekret'

url = 'http://example.com/blueberry/download/somefile.jpg'
filename = os.path.basename(urlparse(url).path)

# HTTP Basic authentication via the auth tuple
r = requests.get(url, auth=(username, password))

if r.status_code == 200:
    with open(filename, 'wb') as out:
        for bits in r.iter_content(chunk_size=8192):
            out.write(bits)

Reference: Download a file providing username and password using Python
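To stay with pdf2image as in the question, the download and the conversion can also be done entirely in memory. The sketch below is one possible way to combine them, under some assumptions: the function names looks_like_pdf and pdf_url_to_images are hypothetical, HTTP Basic credentials are assumed, and the header check is only a rough guard. An error such as "Document stream is empty" may mean that the bytes handed to pdf2image were empty or were an error page rather than a PDF, so the sketch checks the HTTP status and the PDF header before converting.

```python
from typing import List, Optional, Tuple

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload carries the PDF header near its start."""
    # PDF files begin with "%PDF-"; search the first kilobyte rather than only
    # offset 0, since some files have a few junk bytes before the header.
    return PDF_MAGIC in data[:1024]


def pdf_url_to_images(url: str,
                      auth: Optional[Tuple[str, str]] = None,
                      dpi: int = 200) -> List:
    """Download a PDF into memory and convert its pages to PIL images."""
    # Third-party imports are local so looks_like_pdf() works without them.
    import requests
    from pdf2image import convert_from_bytes

    response = requests.get(url, auth=auth, timeout=30)
    response.raise_for_status()  # raise on 401/403/404 instead of converting an error page
    if not looks_like_pdf(response.content):
        raise ValueError("Response body does not look like a PDF: %r" % response.content[:20])
    return convert_from_bytes(response.content, dpi=dpi)
```

A call might look like pdf_url_to_images('https://example.com/abc.pdf', auth=('foo', 'sekret')). On Windows, the poppler_path argument from the question can be passed to convert_from_bytes as well.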

edited Oct 29, 2019 at 10:43

answered Oct 29, 2019 at 8:59


7 Comments

Manideep Yechuri Over a year ago

How do I authenticate the URL for the wget.download(url, path) method?

Avinash Dalvi Over a year ago

Does your PDF download URL require a username and password?

Manideep Yechuri Over a year ago

Yes. Currently it returns 401 from code, and in the browser we can download the file only after entering credentials.

Avinash Dalvi Over a year ago

Which type of authentication are you using? Can you upload a screenshot if possible?

Manideep Yechuri Over a year ago

It is working with the code below instead. Thanks for the suggestions. r = requests.get(url, auth=HttpNtlmAuth('domain\\username', password), stream=True)
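The NTLM approach from the last comment could be fleshed out roughly as follows. This is only a sketch: it assumes the third-party requests_ntlm package (which provides HttpNtlmAuth) is installed, and the helper names write_chunks and download_with_ntlm are invented for illustration.

```python
from pathlib import Path
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], destination: Path) -> int:
    """Write byte chunks to destination and return the number of bytes written."""
    total = 0
    with open(destination, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out.write(chunk)
                total += len(chunk)
    return total


def download_with_ntlm(url: str, username: str, password: str, destination: Path) -> int:
    """Stream a file protected by NTLM authentication to disk."""
    # Third-party imports are local so write_chunks() works without them.
    import requests
    from requests_ntlm import HttpNtlmAuth

    # username is expected in the form 'domain\\username', as in the comment above.
    with requests.get(url, auth=HttpNtlmAuth(username, password),
                      stream=True, timeout=30) as response:
        response.raise_for_status()  # a 401 here means the credentials were rejected
        return write_chunks(response.iter_content(chunk_size=8192), destination)
```

The saved file can then be passed to pdf2image.convert_from_path, which the question reports as already working for local files.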
