This code is giving me duplicate URLs; how do I filter them out?
import re
from bs4 import BeautifulSoup  # soup below is a parsed BeautifulSoup document

sg = []
for url in soup.find_all('a', attrs={'href': re.compile("^https://www.somewebsite")}):
print(url['href'])
sg.append(url['href'])
print(sg)
Instead of a list, use a set: a set stores each value only once, so duplicates are discarded automatically. (Note that sets do not preserve insertion order.)
sg = set()
for url in soup.find_all('a', attrs={'href': re.compile("^https://www.somewebsite")}):
print(url['href'])
sg.add(url['href'])
print(sg)
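If you also want to keep the URLs in the order they first appear on the page, one common alternative (a sketch, not part of the answer above; the example URLs are placeholders) is to collect them in a list and deduplicate with dict.fromkeys, since dict keys preserve insertion order in Python 3.7+:

```python
# `urls` stands in for the list built from soup.find_all(...) above.
urls = [
    "https://www.somewebsite/a",
    "https://www.somewebsite/b",
    "https://www.somewebsite/a",
]

# dict.fromkeys keeps only the first occurrence of each key,
# in insertion order; converting back to a list gives an
# order-preserving deduplicated result.
unique_urls = list(dict.fromkeys(urls))
print(unique_urls)  # ['https://www.somewebsite/a', 'https://www.somewebsite/b']
```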