I am trying to fetch the links of all news articles related to Apple from this page: https://finance.yahoo.com/quote/AAPL/news?p=AAPL. However, the page also contains many advertisement links and links to other parts of the website. How can I fetch only the links to news articles? Here is the code I have written so far:
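import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(executable_path='C:\\Users\\Home\\OneDrive\\Desktop\\AJ\\chromedriver_win32\\chromedriver.exe')
driver.get("https://finance.yahoo.com/quote/AAPL/news?p=AAPL")

# Collect the href of every anchor on the page (this also picks up ads and navigation links)
links = []
for a in driver.find_elements(By.XPATH, './/a'):
    links.append(a.get_attribute('href'))

def get_info(url):
    # Send the request
    response = requests.get(url)
    # Parse the HTML with the built-in parser
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract the article body, headline and publication date
    news = soup.find('div', attrs={'class': 'caas-body'}).text
    headline = soup.find('h1').text
    date = soup.find('time').text
    return news, headline, date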
Can anyone guide me on how to do this, or point me to a resource that can help? Thanks!
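Use a more selective XPath instead of './/a', so that only anchors whose href starts with "/news" or "/m" are matched: .//a[starts-with(@href,"/news") or starts-with(@href,"/m")]. The starts-with() function tests the beginning of the attribute value, so links to ads and to other sections of the site should be skipped. It is worth learning XPath syntax for this kind of filtering.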
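If you would rather filter after collecting the links, here is a rough sketch of the same idea in plain Python. It assumes the hrefs come back as absolute URLs (which is what get_attribute('href') normally gives you) and that article paths begin with /news or /m, as in the XPath above. The names is_article_link and filter_article_links and the sample URLs are made up for illustration, not taken from the real page.

```python
from urllib.parse import urlparse

# Path prefixes taken from the XPath above; adjust them if the page layout differs.
ARTICLE_PREFIXES = ("/news", "/m")


def is_article_link(href):
    """Return True if href looks like an article link (hypothetical helper)."""
    if not href:  # get_attribute('href') can return None for anchors without href
        return False
    parsed = urlparse(href)
    # Accept relative links, or absolute links on finance.yahoo.com only
    if parsed.netloc and parsed.netloc != "finance.yahoo.com":
        return False
    return parsed.path.startswith(ARTICLE_PREFIXES)


def filter_article_links(hrefs):
    """Keep article links, dropping duplicates while preserving order."""
    seen = set()
    result = []
    for href in hrefs:
        if is_article_link(href) and href not in seen:
            seen.add(href)
            result.append(href)
    return result


# Invented sample of hrefs, standing in for the list collected by Selenium
sample = [
    "https://finance.yahoo.com/news/apple-example-123.html",
    "https://finance.yahoo.com/m/abc/example.html",
    "https://finance.yahoo.com/quote/AAPL/options",
    "https://ads.example.com/news/promo",
    None,
    "https://finance.yahoo.com/news/apple-example-123.html",
]
print(filter_article_links(sample))
```

With the question's code you would call filter_article_links(links) and pass each result to get_info. Note that, like the XPath, the "/m" prefix is loose and would also match any other path starting with "/m", so you may want to tighten it after inspecting the real links.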