Web scraping python: IndexError: list index out of range

Question

The script reads a single URL from a text file, imports information from that web page, and stores it in a CSV file. The script works fine for a single URL. Problem: I have added several URLs to my text file, one per line, and now I want my script to read the first URL, do the desired operation, then go back to the text file to read the second URL and repeat. Once I added the for loop to get this done, I started facing the error below:

Traceback (most recent call last):
  File "C:\Users\T947610\Desktop\hahah.py", line 22, in <module>
    table = soup.findAll("table", {"class":"display"})[0] #Facing error in this statement
IndexError: list index out of range

f = open("URL.txt", 'r')
for line in f.readlines():
    print (line)
    page = requests.get(line)
    print(page.status_code)
    print(page.content)
    soup = BeautifulSoup(page.text, 'html.parser')
    print("soup command worked")
    table = soup.findAll("table", {"class":"display"})[0] #Facing error in this statement
    rows = table.findAll("tr")

TylerH · Accepted Answer · 2022-02-10 21:18:45Z

1

findAll itself does not raise an exception when it can't find the data: it returns an empty list, and indexing that empty list with [0] is what raises the IndexError. I have this same issue and I work around it with try/except, although you'll probably need to deal with empty values differently than I've shown here. For example:

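import requests
from bs4 import BeautifulSoup

# The with block closes URL.txt automatically when the loop finishes
with open("URL.txt", "r") as f:
    for line in f:
        print(line)
        page = requests.get(line)
        print(page.status_code)
        print(page.content)
        soup = BeautifulSoup(page.text, "html.parser")
        print("soup command worked")
        try:
            # [0] raises IndexError when the page has no matching table
            table = soup.findAll("table", {"class": "display"})[0]
            rows = table.findAll("tr")
        except IndexError:
            # No table on this page: record empty values and move on
            table = None
            rows = None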
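As an alternative to try/except, the result of find_all can be checked before indexing. The following is only a sketch: the helper name extract_rows and the sample page bodies are hypothetical and stand in for the text returned by requests.get.

```python
from bs4 import BeautifulSoup


def extract_rows(html):
    """Return the <tr> elements of the first table.display, or [] if none."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table", {"class": "display"})
    if not tables:  # empty result: this page has no matching table
        return []
    return tables[0].find_all("tr")


# Hypothetical page bodies standing in for requests.get(url).text
pages = {
    "page-with-table": '<table class="display"><tr><td>a</td></tr></table>',
    "page-without-table": "<html><body>Access denied</body></html>",
}

row_counts = {name: len(extract_rows(html)) for name, html in pages.items()}
print(row_counts)
```

Returning an empty list keeps the calling loop simple, because iterating over zero rows does nothing and needs no special case.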

edited Feb 10, 2022 at 21:18

TylerH


answered Nov 25, 2019 at 0:54

oppressionslayer


Sheng Zhuang · Accepted Answer · 2019-11-25 01:15:36Z

0

If the single URL input was working, the new input lines from the .txt file may be the problem. Try applying .strip() to each line: lines returned by readlines() keep their trailing newline and may carry other leading or trailing whitespace.

page = requests.get(line.strip())
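A small sketch of the stripping step on its own, assuming one URL per line. The helper name read_urls and the example.com addresses are made up for illustration, and io.StringIO stands in for the opened URL.txt so the snippet runs without a file.

```python
import io


def read_urls(handle):
    """Yield non-empty URLs from a file-like object, one per line."""
    for line in handle:
        url = line.strip()  # drop the trailing "\n" and any stray spaces
        if url:  # skip blank lines, which would otherwise be requested as ""
            yield url


# io.StringIO stands in for open("URL.txt") in this sketch
sample = io.StringIO("https://example.com/a\n\n  https://example.com/b \n")
urls = list(read_urls(sample))
print(urls)
```

Skipping blank lines also covers the common case of an empty last line in the text file.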

Also, if soup.findAll() finds nothing, it returns an empty list, and indexing an empty list with [0] raises IndexError. Try printing the soup and checking the content.
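To see the behaviour directly, this sketch runs both lookups against a made-up snippet of HTML that contains no table. The markup is an assumption for illustration, not content from the pages in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical page body with no <table class="display">
soup = BeautifulSoup("<p>No table here</p>", "html.parser")

matches = soup.find_all("table", {"class": "display"})
first = soup.find("table", {"class": "display"})

print(len(matches))   # find_all gives an empty, list-like result
print(first is None)  # find gives None when nothing matches

try:
    matches[0]
except IndexError as exc:
    error_name = type(exc).__name__  # the same error as in the question
    print(error_name)
```

If a page that should contain the table produces this result, printing page.status_code and the start of page.text usually shows whether the request returned a different page than expected.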

answered Nov 25, 2019 at 1:15

Sheng Zhuang

