
Python requests giving error: IndexError: list index out of range

import os
import csv
import requests

write_path = '/Users/specter/Desktop/pdfs/u'  # ASSUMING THAT FOLDER EXISTS!

with open('final.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile)
    for link in spamreader:
        print('-'*72)
        pdf_file = link[0].split('/')[-1]
    with open(os.path.join(write_path, pdf_file), 'wb') as pdf:
        try:
            # Try to request PDF from URL
            print('TRYING {}...'.format(link[0]))
            a = requests.get(link[0], stream=True)
            for block in a.iter_content(512):
                if not block:
                    break

                pdf.write(block)
            print('OK.')
        except requests.exceptions.RequestException as e:  # This will catch ONLY Requests exceptions
            print('REQUESTS ERROR:')
            print(e)  # This should tell you more details about the error

I get the following error while trying to download 1000+ PDF files using the requests package in Python:

Traceback (most recent call last):
  File "update.py", line 11, in <module>
    pdf_file = link[0].split('/')[-1] 
IndexError: list index out of range


  • Please copy and paste the error message as text into the question. Posting images makes it unnecessarily hard to help you. That said, you try to access the first element of link. It seems that your link variable is empty. Have you checked its contents? Commented May 17, 2017 at 7:08
  • You could format it a little more nicely (like you did with your code), but this is better. So, have you checked the contents of link? Maybe with a simple print(link) before the link[0].split line? Commented May 17, 2017 at 7:17
  • @ChristianKönig I tried, but I get the same error. I am a new programmer, so please keep this in mind :) Thanks. Commented May 17, 2017 at 7:24
  • link is an empty sequence. Commented May 17, 2017 at 7:26

1 Answer


There are probably some empty lines in your csv file. For a blank line, csv.reader yields an empty list [], so link[0] raises IndexError: list index out of range. Change the code to:

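# ... imports and write_path as in the question ...

with open('final.csv', 'r') as csvfile:
    spamreader = csv.reader(csvfile)
    for link in spamreader:
        if not link:  # blank line: csv.reader yields [], so skip it
            continue
        print('-'*72)
        pdf_file = link[0].split('/')[-1]
        # ... rest of the code as before ...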
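To see why the check is needed, this snippet feeds csv.reader a small in-memory sample (a made-up stand-in for final.csv, not the asker's actual data). Each blank line comes back as an empty list, and indexing that with [0] is exactly what produces the error in the question. pdf_names is a hypothetical helper used only for illustration.

```python
import csv
import io

# Hypothetical stand-in for final.csv: one URL per line, with blank lines.
SAMPLE = "http://example.com/a.pdf\n\nhttp://example.com/b.pdf\n\n"

def pdf_names(text):
    """Return the file-name part of each URL in text, skipping blank rows."""
    names = []
    for link in csv.reader(io.StringIO(text)):
        # A blank line comes back as [], so link[0] would raise
        # "IndexError: list index out of range" without this check.
        if not link:
            continue
        names.append(link[0].split('/')[-1])
    return names

print(list(csv.reader(io.StringIO(SAMPLE))))
# [['http://example.com/a.pdf'], [], ['http://example.com/b.pdf'], []]
print(pdf_names(SAMPLE))
# ['a.pdf', 'b.pdf']
```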

On a further note: your code seems to be strangely indented. As it stands, the for loop only keeps overwriting pdf_file and link, so the second with statement runs once after the loop ends and downloads only the last PDF in final.csv. Are you sure you do not want to indent your second with statement, together with the rest of the code, one more level so that it is executed within the for loop?
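Putting both suggestions together, the whole script could look roughly like the following sketch. It assumes, as in the question, that final.csv holds one URL per row in the first column. The helper names iter_pdf_links and download_all are hypothetical, and the timeout, the raise_for_status() check, and opening the output file only after the request succeeds are additions that are not in the original code.

```python
import csv
import os

def iter_pdf_links(csv_path):
    """Yield (url, filename) for every non-blank row of csv_path."""
    with open(csv_path, newline='') as csvfile:
        for row in csv.reader(csvfile):
            # Skip blank lines ([]) and rows whose first cell is only whitespace.
            if not row or not row[0].strip():
                continue
            url = row[0].strip()
            yield url, url.split('/')[-1]

def download_all(csv_path, write_path):
    """Download every URL listed in csv_path into write_path (assumed to exist)."""
    # Imported here so iter_pdf_links can be used even without requests installed.
    import requests

    # The download now happens INSIDE the loop, once per row.
    for url, pdf_file in iter_pdf_links(csv_path):
        print('-' * 72)
        print('TRYING {}...'.format(url))
        try:
            # timeout=30 is an assumed value; adjust as needed.
            with requests.get(url, stream=True, timeout=30) as resp:
                resp.raise_for_status()  # treat 4xx/5xx responses as errors
                # Open the output file only once the request has succeeded.
                with open(os.path.join(write_path, pdf_file), 'wb') as pdf:
                    for block in resp.iter_content(512):
                        if block:
                            pdf.write(block)
            print('OK.')
        except requests.exceptions.RequestException as e:
            print('REQUESTS ERROR:')
            print(e)
```

It would be called as download_all('final.csv', '/Users/specter/Desktop/pdfs/u'). Separating the CSV parsing from the downloading keeps the blank-line handling easy to check on its own, without touching the network.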
