
This is my first time web scraping. I've followed a tutorial but I'm trying to scrape a different page and I'm getting the following:

gamesplayed = data[1].getText()

IndexError: list index out of range

This is the code so far:

from bs4 import BeautifulSoup
import urllib.request
import csv

urlpage = 'https://www.espn.com/soccer/standings/_/league/FIFA.WORLD/fifa-world-cup'
page = urllib.request.urlopen(urlpage)
soup = BeautifulSoup(page, 'html.parser')
#print(soup)

table = soup.find('table', attrs={'class': 'Table2__table__wrapper'})
results = table.find_all('tr')
#print('Number of results:', len(results))


rows = []
rows.append(['Group A', 'Games Played', 'Wins', 'Draws', 'Losses', 'Goals For', 'Goals Against', 'Goal Difference', 'Points'])
print(rows)

# loop over results
for result in results:
    # find all columns per result
    data = result.find_all('td')
    # check that columns have data
    if len(data) == 0:
        continue

    # write columns to variables
    groupa = data[0].getText()
    gamesplayed = data[1].getText()
    wins = data[2].getText()
    draws = data[3].getText()
    losses = data[4].getText()
    goalsfor = data[5].getText()
    goalsagainst = data[6].getText()
    goaldifference = data[7].getText()
    point = data[8].getText()
    When you check data in your debugger, what does it say it contains? Commented Jul 25, 2019 at 20:25
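One quick way to see what the loop is indexing into, even without a debugger, is to print each row's cell list before touching `data[1]`. A minimal sketch with hypothetical stand-in rows (plain lists standing in for `result.find_all('td')`):

```python
# Hypothetical stand-in rows: what result.find_all('td') might return
# for different <tr> elements in a standings table.
sample_rows = [
    [],                     # a header <tr> containing only <th> cells
    ["Group A"],            # a one-cell group-label row
    ["Team A", "3", "2"],   # a data row (truncated for illustration)
]

for data in sample_rows:
    print(len(data), data)

# Any row with fewer than 9 cells is exactly where data[1] (or data[8])
# raises IndexError: list index out of range.
```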

2 Answers


Please have a look at what follows the

if len(data) == 0:
    continue

block in the code below.

from bs4 import BeautifulSoup
import urllib.request
import csv

urlpage = 'https://www.espn.com/soccer/standings/_/league/FIFA.WORLD/fifa-world-cup'
page = urllib.request.urlopen(urlpage)
soup = BeautifulSoup(page, 'html.parser')
#print(soup)

table = soup.find('table', attrs={'class': 'Table2__table__wrapper'})
results = table.find_all('tr')
#print('Number of results:', len(results))


rows = []
rows.append(['Group A', 'Games Played', 'Wins', 'Draws', 'Losses', 'Goals For', 'Goals Against', 'Goal Difference', 'Points'])
print(rows)

# loop over results
for result in results:
    # find all columns per result
    data = result.find_all('td')
    # check that columns have data
    if len(data) == 0:
        continue
    print(len(data))
    # Here's where you didn't see that what you scraped was a list of lists
    print(data)
    #[['Group A', 'Games Played', 'Wins', 'Draws', 'Losses', 'Goals For', 'Goals Against', 'Goal Difference', 'Points']]
    data = data[0]
    # write columns to variables
    groupa = data[0].getText()
    gamesplayed = data[1].getText()
    wins = data[2].getText()
    draws = data[3].getText()
    losses = data[4].getText()
    goalsfor = data[5].getText()
    goalsagainst = data[6].getText()
    goaldifference = data[7].getText()
    point = data[8].getText()



The error message is pretty descriptive: you are trying to access an index in a list that does not exist.

If data has to contain at least 9 elements (you are accessing indexes 0 through 8), then you should probably change

if len(data) == 0:
    continue

to

if len(data) < 9:
    continue

so you can safely skip data in such a case.
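The guard above can be exercised end to end against a small hard-coded HTML snippet instead of the live ESPN page, so it runs offline. The table markup and column layout here are assumptions for illustration, not the real ESPN structure:

```python
from bs4 import BeautifulSoup

# Hypothetical standings markup: a <th>-only header row, a short spacer
# row, and one full 9-cell data row.
html = """
<table>
  <tr><th>Team</th><th>GP</th><th>W</th><th>D</th><th>L</th>
      <th>GF</th><th>GA</th><th>GD</th><th>P</th></tr>
  <tr><td>Group A</td></tr>
  <tr><td>Team A</td><td>3</td><td>2</td><td>1</td><td>0</td>
      <td>5</td><td>2</td><td>+3</td><td>7</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
rows = []
for result in soup.find_all('tr'):
    data = result.find_all('td')
    if len(data) < 9:   # skips the header row (no <td>) and the spacer row
        continue
    rows.append([cell.getText() for cell in data])

print(rows)
# [['Team A', '3', '2', '1', '0', '5', '2', '+3', '7']]
```

Only the 9-cell data row survives the guard; the rows that previously triggered the IndexError are skipped instead of crashing the loop.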

