
I am trying to scrape data from https://www.transfermarkt.co.uk/premier-league/startseite/wettbewerb/GB1

I have used this code to do so:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent':
           'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

page = 'https://www.transfermarkt.co.uk/premier-league/startseite/wettbewerb/GB1'
pageTree = requests.get(page, headers=headers)
pageTree_text = pageTree.text

pageSoup = BeautifulSoup(pageTree_text, 'html.parser')

Next, I want to find all the links connected to each team name, so I use this code:

linkLocation = pageSoup.find_all("a", {"class": "vereinprofil_tooltip tooltipstered"})
linkLocation[0].text

Output:

IndexError                                Traceback (most recent call last)
      1 linkLocation = pageSoup.find_all("a", {"class": "vereinprofil_tooltip tooltipstered"})
----> 2 linkLocation[0].text

IndexError: list index out of range

Why doesn't the list contain any of the links?

Thanks in advance!

1 Answer


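The "tooltipstered" class is added by JavaScript after the page loads, so it is not present in the plain HTML document returned by the server, which is all that requests downloads. You can confirm this by viewing the page source (for example, "View Page Source") rather than the browser's element inspector, which shows the DOM after scripts have run.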
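A minimal offline sketch of why the list comes back empty. The HTML below is a hypothetical stand-in for the server's raw markup (the club name and href are invented for illustration), assuming the anchor carries only `vereinprofil_tooltip` before the tooltipster plugin runs. Passing the full string `"vereinprofil_tooltip tooltipstered"` asks BeautifulSoup for that exact class attribute value, so nothing matches, while matching a single class value checks each class on the tag individually.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw server HTML: before JavaScript runs,
# the anchor is assumed to carry only the "vereinprofil_tooltip" class.
raw_html = """
<table>
  <tr><td><a class="vereinprofil_tooltip" href="/example-club/startseite/verein/1">Example FC</a></td></tr>
</table>
"""

soup = BeautifulSoup(raw_html, "html.parser")

# The JavaScript-added class never appears in the raw text...
print("tooltipstered" in raw_html)  # False

# ...so matching the exact two-class string finds nothing,
# which is why linkLocation[0] raised IndexError.
exact = soup.find_all("a", {"class": "vereinprofil_tooltip tooltipstered"})
print(len(exact))  # 0

# Matching one class value checks each class on the tag separately.
single = soup.find_all("a", class_="vereinprofil_tooltip")
print([(a.text, a["href"]) for a in single])
```

Whether the live page's anchors really carry `vereinprofil_tooltip` in the server HTML is an assumption to verify in the page source; if they do, the single-class match may be worth trying before reaching for a browser.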

As the script tag below shows, "tooltipster" is a jQuery plugin, so you will need a tool that executes JavaScript, such as Selenium, to scrape the rendered page.

<script type="text/javascript" src="https://tmssl.akamaized.net//assets/e17e6900/js/jquery.tooltipster.js?lm=1574952016"></script>
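A minimal sketch of the Selenium route, under stated assumptions: Selenium 4 with Chrome and its driver available locally, and that waiting for an `a.tooltipstered` element is a reasonable sign the plugin has run (neither is verified against the live site). The rendered `page_source` is then parsed with BeautifulSoup. The helper names `fetch_rendered_html` and `extract_team_links` are hypothetical.

```python
from bs4 import BeautifulSoup

URL = "https://www.transfermarkt.co.uk/premier-league/startseite/wettbewerb/GB1"
# CSS selector requiring both classes, in any order (assumed to match the rendered DOM).
TEAM_LINK_SELECTOR = "a.vereinprofil_tooltip.tooltipstered"


def fetch_rendered_html(url, timeout=10):
    """Load the page in a real browser so JavaScript runs, then return the DOM as HTML.

    Assumes Selenium 4 and a locally available Chrome/chromedriver.
    """
    # Imported here so extract_team_links can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until the tooltip plugin has decorated at least one anchor.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "a.tooltipstered"))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_team_links(html):
    """Return (text, href) pairs for anchors carrying both tooltip classes."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a.get("href")) for a in soup.select(TEAM_LINK_SELECTOR)]
```

Usage would be `extract_team_links(fetch_rendered_html(URL))`. The CSS selector is used instead of the question's `{"class": "..."}` dictionary because a selector matches both classes regardless of the order in which the plugin writes them into the attribute.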

2 Comments

Hi, so I can't scrape the data from this page using Python and BeautifulSoup?
You can, using Python, but not with BeautifulSoup alone. You could try the approach in this SO answer: stackoverflow.com/questions/49939123/…
