
This is my first time web scraping. I've followed a tutorial but I'm trying to scrape a different page and I'm getting the following:

gamesplayed = data[1].getText()

IndexError: list index out of range

This is the code so far:

from bs4 import BeautifulSoup
import urllib.request
import csv

urlpage = 'https://www.espn.com/soccer/standings/_/league/FIFA.WORLD/fifa-world-cup'
page = urllib.request.urlopen(urlpage)
soup = BeautifulSoup(page, 'html.parser')
#print(soup)

table = soup.find('table', attrs={'class': 'Table2__table__wrapper'})
results = table.find_all('tr')
#print('Number of results:', len(results))


rows = []
rows.append(['Group A', 'Games Played', 'Wins', 'Draws', 'Losses', 'Goals For', 'Goals Against', 'Goal Difference', 'Points'])
print(rows)

# loop over results
for result in results:
    # find all columns per result
    data = result.find_all('td')
    # check that columns have data
    if len(data) == 0:
        continue

    # write columns to variables
    groupa = data[0].getText()
    gamesplayed = data[1].getText()
    wins = data[2].getText()
    draws = data[3].getText()
    losses = data[4].getText()
    goalsfor = data[5].getText()
    goalsagainst = data[6].getText()
    goaldifference = data[7].getText()
    point = data[8].getText()
    When you check data in your debugger, what does it say it contains? Commented Jul 25, 2019 at 20:25
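One quick way to see what the loop is indexing into, even without a debugger, is to print each row's cell list before touching `data[1]`. A minimal sketch with hypothetical stand-in rows (plain lists standing in for `result.find_all('td')`):

```python
# Hypothetical stand-in rows: what result.find_all('td') might return
# for different <tr> elements in a standings table.
sample_rows = [
    [],                     # a header <tr> containing only <th> cells
    ["Group A"],            # a one-cell group-label row
    ["Team A", "3", "2"],   # a data row (truncated for illustration)
]

for data in sample_rows:
    print(len(data), data)

# Any row with fewer than 9 cells is exactly where data[1] (or data[8])
# raises IndexError: list index out of range.
```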

2 Answers


Please have a look at what follows the

if len(data) == 0:
    continue

block in the code below.

from bs4 import BeautifulSoup
import urllib.request
import csv

urlpage = 'https://www.espn.com/soccer/standings/_/league/FIFA.WORLD/fifa-world-cup'
page = urllib.request.urlopen(urlpage)
soup = BeautifulSoup(page, 'html.parser')
#print(soup)

table = soup.find('table', attrs={'class': 'Table2__table__wrapper'})
results = table.find_all('tr')
#print('Number of results:', len(results))


rows = []
rows.append(['Group A', 'Games Played', 'Wins', 'Draws', 'Losses', 'Goals For', 'Goals Against', 'Goal Difference', 'Points'])
print(rows)

# loop over results
for result in results:
    # find all columns per result
    data = result.find_all('td')
    # check that columns have data
    if len(data) == 0:
        continue
    print(len(data))
    # Here's where you didn't see that what you scraped was a list of lists
    print(data)
    #[['Group A', 'Games Played', 'Wins', 'Draws', 'Losses', 'Goals For', 'Goals Against', 'Goal Difference', 'Points']]
    data = data[0]
    # write columns to variables
    groupa = data[0].getText()
    gamesplayed = data[1].getText()
    wins = data[2].getText()
    draws = data[3].getText()
    losses = data[4].getText()
    goalsfor = data[5].getText()
    goalsagainst = data[6].getText()
    goaldifference = data[7].getText()
    point = data[8].getText()



The error message is pretty descriptive: you are trying to access an index in a list that does not exist.

If data has to contain at least 9 elements (you are accessing indexes 0 through 8), then you should probably change

if len(data) == 0:
    continue

to

if len(data) < 9:
    continue

so you can safely skip data in such a case.
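The guard above can be exercised end to end against a small hard-coded HTML snippet instead of the live ESPN page, so it runs offline. The table markup and column layout here are assumptions for illustration, not the real ESPN structure:

```python
from bs4 import BeautifulSoup

# Hypothetical standings markup: a <th>-only header row, a short spacer
# row, and one full 9-cell data row.
html = """
<table>
  <tr><th>Team</th><th>GP</th><th>W</th><th>D</th><th>L</th>
      <th>GF</th><th>GA</th><th>GD</th><th>P</th></tr>
  <tr><td>Group A</td></tr>
  <tr><td>Team A</td><td>3</td><td>2</td><td>1</td><td>0</td>
      <td>5</td><td>2</td><td>+3</td><td>7</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
rows = []
for result in soup.find_all('tr'):
    data = result.find_all('td')
    if len(data) < 9:   # skips the header row (no <td>) and the spacer row
        continue
    rows.append([cell.getText() for cell in data])

print(rows)
# [['Team A', '3', '2', '1', '0', '5', '2', '+3', '7']]
```

Only the 9-cell data row survives the guard; the rows that previously triggered the IndexError are skipped instead of crashing the loop.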

