Filtering the list of html links using a specific key word using Python

Question

I am trying to extract only the links that contain a specific word from a list of links. Below is the code I use to get the URLs:

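import urllib.request  # importing `urllib` alone does not reliably load the `request` submodule

from bs4 import BeautifulSoup as bs

url = 'https://fbref.com/en/squads/b8fd03ef/Manchester-City-Stats'
html_page = urllib.request.urlopen(url)
soup = bs(html_page, "html.parser")
links = []
player_list = []
for link in soup.find_all('a'):
    # link.get('href') returns None for <a> tags without an href attribute
    links.append(link.get('href'))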

The code above stores all the links in the variable links. I want to create a new list containing only the links that include the word summary. The expected output (only part of it), which should be stored in a new list player_list, is shown below:

player_list = ['/en/players/3bb7b8b4/matchlogs/2021-2022/summary/Ederson-Match-Logs',
    '/en/players/3eb22ec9/matchlogs/2021-2022/summary/Bernardo-Silva-Match-Logs',
    '/en/players/bd6351cd/matchlogs/2021-2022/summary/Joao-Cancelo-Match-Logs',
    '/en/players/31c69ef1/matchlogs/2021-2022/summary/Ruben-Dias-Match-Logs',
    '/en/players/6434f10d/matchlogs/2021-2022/summary/Rodri-Match-Logs',
    '/en/players/119b9a8e/matchlogs/2021-2022/summary/Aymeric-Laporte-Match-Logs']

I tried exploring some of the previous posts, but it did not work out. What can I try next?

user7864386 · Accepted Answer · 2022-03-28 04:49:35Z

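You could filter with a condition that checks whether each link is non-empty and contains summary. The non-empty check matters because link.get('href') returns None for <a> tags that have no href: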

out = [x for x in links if x and 'summary' in x]

Output:

['/en/players/3bb7b8b4/matchlogs/2021-2022/summary/Ederson-Match-Logs',
 '/en/players/3eb22ec9/matchlogs/2021-2022/summary/Bernardo-Silva-Match-Logs',
 '/en/players/bd6351cd/matchlogs/2021-2022/summary/Joao-Cancelo-Match-Logs',
 '/en/players/31c69ef1/matchlogs/2021-2022/summary/Ruben-Dias-Match-Logs',
 '/en/players/6434f10d/matchlogs/2021-2022/summary/Rodri-Match-Logs',
...
 '/en/players/02aed921/matchlogs/2021-2022/summary/Cieran-Slicker-Match-Logs',
 '/en/players/c19a2df1/matchlogs/2021-2022/summary/Josh-Wilson-Esbrand-Match-Logs']
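To see the effect of the `x and` guard without hitting the site, here is a minimal sketch on a small hand-written list. The sample entries (including the `None` and the empty string) are assumptions made up for illustration, not actual scrape results, and the stricter `'/summary/'` variant is only a suggestion in case a bare substring match turns out to be too loose.

```python
# Hypothetical sample of what `links` might hold after scraping.
links = [
    '/en/players/3bb7b8b4/matchlogs/2021-2022/summary/Ederson-Match-Logs',
    None,  # an <a> tag without an href attribute
    '',    # an empty href
    '/en/squads/b8fd03ef/Manchester-City-Stats',
    '/en/players/6434f10d/matchlogs/2021-2022/summary/Rodri-Match-Logs',
]

# `x and ...` short-circuits on None and '', so `'summary' in x`
# is only evaluated for non-empty strings.
player_list = [x for x in links if x and 'summary' in x]

# Stricter variant (an assumption about the intent): match the path segment only.
strict_list = [x for x in links if x and '/summary/' in x]

print(player_list)
```

Without the `x and` part, the comprehension would raise a TypeError as soon as it reached a `None` entry.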

HedgeHog · Accepted Answer · 2022-03-28 06:40:34Z

Instead of filtering the list at the end, you could select your targets more specifically from the start. The following list comprehension selects only the <a> elements whose href contains summary and prepends the base URL to each:

['https://fbref.com'+e['href'] for e in soup.select('a[href*="summary"]')]

Example

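import urllib.request  # importing `urllib` alone does not reliably load the `request` submodule

from bs4 import BeautifulSoup as bs

url = 'https://fbref.com/en/squads/b8fd03ef/Manchester-City-Stats'
html_page = urllib.request.urlopen(url)
soup = bs(html_page, "html.parser")

# a[href*="summary"] matches only <a> tags whose href contains "summary"
summaryUrls = ['https://fbref.com'+e['href'] for e in soup.select('a[href*="summary"]')]
print(summaryUrls)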

Output

['https://fbref.com/en/players/3bb7b8b4/matchlogs/2021-2022/summary/Ederson-Match-Logs',
 'https://fbref.com/en/players/3eb22ec9/matchlogs/2021-2022/summary/Bernardo-Silva-Match-Logs',
 'https://fbref.com/en/players/bd6351cd/matchlogs/2021-2022/summary/Joao-Cancelo-Match-Logs',
 'https://fbref.com/en/players/31c69ef1/matchlogs/2021-2022/summary/Ruben-Dias-Match-Logs',
 'https://fbref.com/en/players/6434f10d/matchlogs/2021-2022/summary/Rodri-Match-Logs',
 'https://fbref.com/en/players/119b9a8e/matchlogs/2021-2022/summary/Aymeric-Laporte-Match-Logs',
 'https://fbref.com/en/players/ed1e53f3/matchlogs/2021-2022/summary/Phil-Foden-Match-Logs',
 'https://fbref.com/en/players/86dd77d1/matchlogs/2021-2022/summary/Kyle-Walker-Match-Logs',
 'https://fbref.com/en/players/b400bde0/matchlogs/2021-2022/summary/Raheem-Sterling-Match-Logs',
 'https://fbref.com/en/players/e46012d4/matchlogs/2021-2022/summary/Kevin-De-Bruyne-Match-Logs',
 'https://fbref.com/en/players/b0b4fd3e/matchlogs/2021-2022/summary/Jack-Grealish-Match-Logs',
 'https://fbref.com/en/players/819b3158/matchlogs/2021-2022/summary/Ilkay-Gundogan-Match-Logs',
 'https://fbref.com/en/players/b66315ae/matchlogs/2021-2022/summary/Gabriel-Jesus-Match-Logs',
 'https://fbref.com/en/players/892d5bb1/matchlogs/2021-2022/summary/Riyad-Mahrez-Match-Logs',
 'https://fbref.com/en/players/5eecec3d/matchlogs/2021-2022/summary/John-Stones-Match-Logs',...]
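Plain string concatenation works here because every matched href starts with `/`. If you would rather not rely on that, a possible variation is to resolve each href with `urllib.parse.urljoin` from the standard library. The sketch below runs on a hand-written list instead of the live page; the second entry and the name `summary_urls` are hypothetical and only show how an already-absolute link is handled.

```python
from urllib.parse import urljoin

base_url = 'https://fbref.com/en/squads/b8fd03ef/Manchester-City-Stats'

# Hand-written sample hrefs; the second one is hypothetical.
hrefs = [
    '/en/players/3bb7b8b4/matchlogs/2021-2022/summary/Ederson-Match-Logs',
    'https://example.com/summary/already-absolute',
]

# urljoin resolves site-relative paths against the base URL and
# leaves absolute URLs unchanged.
summary_urls = [urljoin(base_url, h) for h in hrefs]

for u in summary_urls:
    print(u)
```

With the real page you would pass `e['href']` for each element returned by `soup.select('a[href*="summary"]')` in place of the sample list.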
