I can convert a PDF file on my local drive to images using pdf2image's convert_from_path, but when I try the same for a remote PDF such as 'https://example.com/abc.pdf' with convert_from_bytes, I end up with multiple errors.

Code:

from urllib.request import urlopen

import pdf2image

url = 'https://example.com/abc.pdf'
scrape = urlopen(url)  # for external files
pil_images = pdf2image.convert_from_bytes(
    scrape.read(), dpi=200,
    output_folder=None, first_page=None, last_page=None,
    thread_count=1, userpw=None, use_cropbox=False, strict=False,
    poppler_path=r"C:\poppler-0.68.0_x86\poppler-0.68.0\bin",
)

Error:

   Unable to get page count. Syntax Error: Document stream is empty

I also followed the link below, but had no luck:

Python3: Download PDF to memory and convert first page to image

Screenshot of the authentication prompt: [image not included]

1 Answer

First, download the PDF from the URL as described in this blog post: https://dzone.com/articles/simple-examples-of-downloading-files-using-python

Then use the function below to convert the PDF to images (or another output format). If the PDF has multiple pages, the pages are written as a series of files.

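import ghostscript

def pdf2jpeg(pdf_input_path, jpeg_output_path):
    # For multi-page PDFs, put a page counter such as %03d in jpeg_output_path
    # (e.g. "page-%03d.jpg") so each page gets its own file.
    args = ["pdf2jpeg", # actual value doesn't matter
            "-dNOPAUSE",
            "-sDEVICE=jpeg",   # JPEG output device
            "-r144",           # resolution in DPI
            "-sOutputFile=" + jpeg_output_path,
            pdf_input_path]
    ghostscript.Ghostscript(*args)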

Reference: Converting a PDF to a series of images with Python

For authentication, try this:

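import os
import requests

from urllib.parse import urlparse  # Python 3; in Python 2 this was `from urlparse import urlparse`

username = 'foo'
password = 'sekret'

url = 'http://example.com/blueberry/download/somefile.jpg'
filename = os.path.basename(urlparse(url).path)

# HTTP Basic authentication; stream=True avoids loading the whole file into memory
r = requests.get(url, auth=(username, password), stream=True)

if r.status_code == 200:
    with open(filename, 'wb') as out:
        # iter_content() defaults to 1-byte chunks, so pass a larger chunk size
        for bits in r.iter_content(chunk_size=8192):
            out.write(bits)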

Reference: Download a file providing username and password using Python
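The two steps above (authenticated download, then conversion) can also be done without writing the PDF to disk. The following is only a sketch under assumptions: the helper names looks_like_pdf and fetch_pdf_as_images are made up for illustration, the server is assumed to accept HTTP Basic authentication, and requests and pdf2image (with poppler) are assumed to be installed. The header check is there because an error like "Document stream is empty" can plausibly mean that the bytes handed to the converter were not a PDF at all, for example an empty body or an HTML login page.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload carries the PDF magic header near its start."""
    # PDF files begin with "%PDF-". Some producers prepend a few stray bytes,
    # so search the first 1024 bytes instead of only offset 0.
    return b"%PDF-" in data[:1024]


def fetch_pdf_as_images(url, username, password, dpi=200, poppler_path=None):
    """Hypothetical helper: download a PDF with Basic auth and convert it in memory."""
    # Third-party imports are kept local so looks_like_pdf() is usable without them.
    import requests
    from pdf2image import convert_from_bytes

    response = requests.get(url, auth=(username, password), timeout=30)
    response.raise_for_status()  # a 401 or other HTTP error raises here
    if not looks_like_pdf(response.content):
        # Show the first bytes so an HTML error page is easy to recognise.
        raise ValueError("Response is not a PDF: %r" % response.content[:40])
    return convert_from_bytes(response.content, dpi=dpi, poppler_path=poppler_path)
```

Checking the payload before conversion turns a confusing poppler message into an explicit error about what the server actually returned.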

7 Comments

How do I authenticate the URL for the wget.download(url, path) method?
Does your PDF download URL require a username and password?
Yes. Currently the code gets a 401, and in the browser we can download only after entering credentials.
Which type of authentication are you using? Can you upload a screenshot if possible?
It works with the code below instead. Thanks for the suggestions. r = requests.get(url, auth=HttpNtlmAuth('domain\\username', password), stream=True)
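For readers who land on the same 401, the NTLM call from the last comment can be expanded into a small sketch. It assumes the requests_ntlm package is installed alongside requests; the helper names ntlm_user and fetch_pdf_bytes_ntlm are hypothetical and not part of either library.

```python
def ntlm_user(domain: str, username: str) -> str:
    """Build the 'domain\\username' string that HttpNtlmAuth expects."""
    return f"{domain}\\{username}"


def fetch_pdf_bytes_ntlm(url, domain, username, password):
    """Hypothetical helper: download a file behind NTLM auth and return its bytes."""
    # Third-party imports are kept local so ntlm_user() is usable without them.
    import requests
    from requests_ntlm import HttpNtlmAuth

    auth = HttpNtlmAuth(ntlm_user(domain, username), password)
    with requests.get(url, auth=auth, stream=True, timeout=30) as r:
        r.raise_for_status()  # fail loudly on 401 instead of passing bad bytes on
        # Read the streamed body in 8 KiB chunks and join them into one bytes object.
        return b"".join(r.iter_content(chunk_size=8192))
```

The returned bytes can then be passed to pdf2image.convert_from_bytes, as in the question.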