I am trying to extract, from a list of links, only those links that contain a specific word. Below is the code I use to get the URLs:

import urllib.request

from bs4 import BeautifulSoup as bs

url = 'https://fbref.com/en/squads/b8fd03ef/Manchester-City-Stats'
html_page = urllib.request.urlopen(url)
soup = bs(html_page, "html.parser")

# Collect the href of every <a> tag (None when a tag has no href)
links = []
player_link = []
for link in soup.find_all('a'):
    links.append(link.get('href'))

The code above stores all the links in the variable links. I want to create a new list containing only the links that include the word summary. Part of the expected output, which should be stored in a new list player_list, is shown below:

player_list = ['/en/players/3bb7b8b4/matchlogs/2021-2022/summary/Ederson-Match-Logs',
               '/en/players/3eb22ec9/matchlogs/2021-2022/summary/Bernardo-Silva-Match-Logs',
               '/en/players/bd6351cd/matchlogs/2021-2022/summary/Joao-Cancelo-Match-Logs',
               '/en/players/31c69ef1/matchlogs/2021-2022/summary/Ruben-Dias-Match-Logs',
               '/en/players/6434f10d/matchlogs/2021-2022/summary/Rodri-Match-Logs',
               '/en/players/119b9a8e/matchlogs/2021-2022/summary/Aymeric-Laporte-Match-Logs']

I tried the approaches from some previous posts, but they did not work. What can I try next?

2 Answers

1

You could check for a condition (whether the link is non-empty and has summary in it):

out = [x for x in links if x and 'summary' in x]

Output:

['/en/players/3bb7b8b4/matchlogs/2021-2022/summary/Ederson-Match-Logs',
 '/en/players/3eb22ec9/matchlogs/2021-2022/summary/Bernardo-Silva-Match-Logs',
 '/en/players/bd6351cd/matchlogs/2021-2022/summary/Joao-Cancelo-Match-Logs',
 '/en/players/31c69ef1/matchlogs/2021-2022/summary/Ruben-Dias-Match-Logs',
 '/en/players/6434f10d/matchlogs/2021-2022/summary/Rodri-Match-Logs',
...
 '/en/players/02aed921/matchlogs/2021-2022/summary/Cieran-Slicker-Match-Logs',
 '/en/players/c19a2df1/matchlogs/2021-2022/summary/Josh-Wilson-Esbrand-Match-Logs']
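
To try the filter without hitting the site, here is a minimal sketch on a small hand-written list. The sample links and the stricter regex variant are assumptions for illustration, not part of the scraped data: the regex assumes that summary always appears as its own path segment under /matchlogs/, as in the output above.

```python
import re

# Hypothetical sample standing in for the scraped `links` list;
# link.get('href') returns None for <a> tags without an href.
links = [
    None,
    '/en/squads/b8fd03ef/Manchester-City-Stats',
    '/en/players/3bb7b8b4/matchlogs/2021-2022/summary/Ederson-Match-Logs',
    '/en/players/3bb7b8b4/Ederson',
    '/en/players/6434f10d/matchlogs/2021-2022/summary/Rodri-Match-Logs',
]

# Substring test from the answer: `x and ...` skips None and empty strings.
out = [x for x in links if x and 'summary' in x]

# Stricter variant (assumed URL layout): require "summary" to be a
# whole path segment directly under /matchlogs/<season>/.
pattern = re.compile(r'/matchlogs/[^/]+/summary/')
strict = [x for x in links if x and pattern.search(x)]

print(out)
print(strict)
```

The `x and` part matters: without it, `'summary' in None` raises a TypeError for any <a> tag that has no href.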
1

An alternative to filtering the list at the end is to select your targets more specifically from the start. The following list comprehension selects only the <a> elements whose href contains summary and prepends the base URL to each one:

['https://fbref.com'+e['href'] for e in soup.select('a[href*="summary"]')] 

Example

import urllib.request

from bs4 import BeautifulSoup as bs

url = 'https://fbref.com/en/squads/b8fd03ef/Manchester-City-Stats'
html_page = urllib.request.urlopen(url)
soup = bs(html_page, "html.parser")

# Select only <a> tags whose href contains "summary" and prepend the base URL
summaryUrls = ['https://fbref.com' + e['href'] for e in soup.select('a[href*="summary"]')]
print(summaryUrls)

Output:

['https://fbref.com/en/players/3bb7b8b4/matchlogs/2021-2022/summary/Ederson-Match-Logs',
 'https://fbref.com/en/players/3eb22ec9/matchlogs/2021-2022/summary/Bernardo-Silva-Match-Logs',
 'https://fbref.com/en/players/bd6351cd/matchlogs/2021-2022/summary/Joao-Cancelo-Match-Logs',
 'https://fbref.com/en/players/31c69ef1/matchlogs/2021-2022/summary/Ruben-Dias-Match-Logs',
 'https://fbref.com/en/players/6434f10d/matchlogs/2021-2022/summary/Rodri-Match-Logs',
 'https://fbref.com/en/players/119b9a8e/matchlogs/2021-2022/summary/Aymeric-Laporte-Match-Logs',
 'https://fbref.com/en/players/ed1e53f3/matchlogs/2021-2022/summary/Phil-Foden-Match-Logs',
 'https://fbref.com/en/players/86dd77d1/matchlogs/2021-2022/summary/Kyle-Walker-Match-Logs',
 'https://fbref.com/en/players/b400bde0/matchlogs/2021-2022/summary/Raheem-Sterling-Match-Logs',
 'https://fbref.com/en/players/e46012d4/matchlogs/2021-2022/summary/Kevin-De-Bruyne-Match-Logs',
 'https://fbref.com/en/players/b0b4fd3e/matchlogs/2021-2022/summary/Jack-Grealish-Match-Logs',
 'https://fbref.com/en/players/819b3158/matchlogs/2021-2022/summary/Ilkay-Gundogan-Match-Logs',
 'https://fbref.com/en/players/b66315ae/matchlogs/2021-2022/summary/Gabriel-Jesus-Match-Logs',
 'https://fbref.com/en/players/892d5bb1/matchlogs/2021-2022/summary/Riyad-Mahrez-Match-Logs',
 'https://fbref.com/en/players/5eecec3d/matchlogs/2021-2022/summary/John-Stones-Match-Logs',...]
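
If you prefer not to concatenate strings, one possible variation is urllib.parse.urljoin from the standard library, which resolves a relative href against a base URL and leaves an already absolute href unchanged. This is only a sketch: the hrefs list is a hypothetical stand-in for the values that soup.select('a[href*="summary"]') would yield, and the second entry is made absolute purely to show that case.

```python
from urllib.parse import urljoin

base_url = 'https://fbref.com'  # site root, as used in the answer

# Hypothetical hrefs standing in for e['href'] from the selected <a> tags
hrefs = [
    '/en/players/3bb7b8b4/matchlogs/2021-2022/summary/Ederson-Match-Logs',
    'https://fbref.com/en/players/6434f10d/matchlogs/2021-2022/summary/Rodri-Match-Logs',
]

# urljoin resolves relative paths against base_url and returns
# absolute URLs as they are, so no prefix gets duplicated.
summary_urls = [urljoin(base_url, h) for h in hrefs]
print(summary_urls)
```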
