The script reads a single URL from a text file, imports information from that web page, and stores it in a CSV file. The script works fine for a single URL. Problem: I have added several URLs to my text file, one per line, and now I want the script to read the first URL, do the desired operation, then go back to the text file to read the second URL, and so on. Once I added the for loop to do this, I started getting the error below:

Traceback (most recent call last):
  File "C:\Users\T947610\Desktop\hahah.py", line 22, in <module>
    table = soup.findAll("table", {"class":"display"})[0] #Facing error in this statement
IndexError: list index out of range

import requests
from bs4 import BeautifulSoup

f = open("URL.txt", 'r')
for line in f.readlines():
    print (line)
    page = requests.get(line)
    print(page.status_code)
    print(page.content)
    soup = BeautifulSoup(page.text, 'html.parser')
    print("soup command worked")
    table = soup.findAll("table", {"class":"display"})[0] #Facing error in this statement
    rows = table.findAll("tr")

2 Answers

findAll itself doesn't raise an exception when it finds nothing; it returns an empty list, and indexing that empty list with [0] is what raises the IndexError. I had the same issue and work around it with try/except, as below. You will probably need to handle the empty values differently than I have here:

import requests
from bs4 import BeautifulSoup

with open("URL.txt", 'r') as f:
    for line in f:
        print(line)
        page = requests.get(line)
        print(page.status_code)
        print(page.content)
        soup = BeautifulSoup(page.text, 'html.parser')
        print("soup command worked")
        try:
            # [0] raises IndexError when no <table class="display"> is found
            table = soup.findAll("table", {"class": "display"})[0]
            rows = table.findAll("tr")
        except IndexError:
            # No matching table on this page; handle the empty case as needed
            table = None
            rows = None
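
Setting table and rows to None still leaves the rest of the loop to deal with them. One option, sketched below under stated assumptions, is to skip any URL whose page has no matching table and write the other pages' rows to a CSV file, as the question intends. The `scrape` argument is a hypothetical stand-in for the requests/BeautifulSoup code above, assumed to return a list of rows or `None` when no table is found, so the loop logic can be exercised without network access.

```python
import csv
import io
from typing import Callable, Iterable, List, Optional, TextIO

Row = List[str]


def process_urls(
    lines: Iterable[str],
    scrape: Callable[[str], Optional[List[Row]]],
    out: TextIO,
) -> List[str]:
    """Scrape each URL in turn, write its rows to `out` as CSV,
    and return the URLs that were skipped because no table was found."""
    writer = csv.writer(out)
    skipped: List[str] = []
    for line in lines:
        url = line.strip()  # drop the trailing newline read from the file
        if not url:
            continue  # ignore blank lines
        rows = scrape(url)  # hypothetical: list of rows, or None if no table
        if rows is None:
            # Nothing matched on this page: record it and move on to the next URL
            skipped.append(url)
            continue
        writer.writerows(rows)
    return skipped


# Usage with a fake scraper in place of requests + BeautifulSoup (made-up data)
def fake_scrape(url: str) -> Optional[List[Row]]:
    return [["a", "1"], ["b", "2"]] if "good" in url else None


buffer = io.StringIO()
missed = process_urls(
    ["http://good.example\n", "\n", "http://bad.example\n"], fake_scrape, buffer
)
print(missed)
print(buffer.getvalue())
```

When writing to a real file instead of io.StringIO, open it with open("out.csv", "w", newline="") as the csv module documentation recommends; "out.csv" is a placeholder name.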

If the single-URL input was working, the new input lines from the .txt file are probably the problem. Try applying .strip() to each line: every line read from the file keeps its trailing newline character, and may also have stray whitespace at the start or end.

page = requests.get(line.strip())
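
A small sketch of that clean-up, using io.StringIO as a stand-in for URL.txt so it runs without the file. `read_urls` is a hypothetical helper that strips each line and also skips blank ones (for example an empty line at the end of the file):

```python
import io
from typing import List, TextIO


def read_urls(f: TextIO) -> List[str]:
    """Return the non-blank lines of `f` with surrounding whitespace removed."""
    urls = []
    for line in f:
        url = line.strip()  # removes the trailing "\n" and any stray spaces/tabs
        if url:
            urls.append(url)
    return urls


sample = io.StringIO("https://example.com/a\nhttps://example.com/b \n\n")
print(sample.readlines())  # the raw lines still carry their "\n"
sample.seek(0)
print(read_urls(sample))
```

With the real file this would be used as: with open("URL.txt") as f: urls = read_urls(f), followed by a loop over urls.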

Also, if soup.findAll() finds nothing, it returns an empty list rather than None, and indexing that empty list with [0] raises IndexError: list index out of range. Try printing the soup and checking its content.
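
Because findAll returns a ResultSet, which is a list subclass, you can check for emptiness before indexing instead of catching IndexError. (BeautifulSoup's find() method, by contrast, returns the first match or None.) A minimal sketch of such a guard, shown with plain lists standing in for the result of soup.findAll(...); the helper name `first_or_none` is hypothetical:

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_or_none(results: Sequence[T]) -> Optional[T]:
    """Return the first element of `results`, or None if it is empty."""
    return results[0] if results else None


# Plain lists standing in for soup.findAll("table", {"class": "display"})
print(first_or_none(["<table 1>", "<table 2>"]))
print(first_or_none([]))
```

In the scraping loop this would read table = first_or_none(soup.findAll("table", {"class": "display"})), followed by an if table is None: check that skips that URL.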
