I am trying to fetch the links to all news articles about Apple from this page: https://finance.yahoo.com/quote/AAPL/news?p=AAPL. The page also contains many advertisement links and links to other parts of the site. How can I fetch only the links to news articles? Here is the code I have written so far:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(executable_path='C:\\Users\\Home\\OneDrive\\Desktop\\AJ\\chromedriver_win32\\chromedriver.exe')
driver.get("https://finance.yahoo.com/quote/AAPL/news?p=AAPL")
# collect the href of every anchor on the page (ads and navigation included)
links = []
for a in driver.find_elements(By.XPATH, './/a'):
    links.append(a.get_attribute('href'))

def get_info(url):
    # send request
    response = requests.get(url)
    # parse the HTML with an explicit parser
    soup = BeautifulSoup(response.text, 'html.parser')
    # get the information we need
    news = soup.find('div', attrs={'class': 'caas-body'}).text
    headline = soup.find('h1').text
    date = soup.find('time').text
    return news, headline, date

Can anyone explain how to do this, or point me to a resource that would help? Thanks!

This may be useful: .//a[starts-with(@href,"/news") or starts-with(@href,"/m")]. It is also worth learning XPath syntax. Commented Sep 18, 2021 at 7:31
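
The same prefix test can also be applied in Python after the hrefs have been collected. The following is only a minimal sketch and not part of the original comment: the name `keep_article_links` and the sample URLs are hypothetical, and it assumes the collected hrefs may be absolute URLs, so the path is extracted with `urllib.parse` before the check.

```python
from urllib.parse import urlparse

# Path prefixes taken from the XPath suggested in the comment.
ARTICLE_PREFIXES = ("/news", "/m")


def keep_article_links(hrefs):
    """Return the hrefs whose URL path starts with one of ARTICLE_PREFIXES."""
    kept = []
    for href in hrefs:
        if not href:  # anchors without an href attribute yield None
            continue
        if urlparse(href).path.startswith(ARTICLE_PREFIXES):
            kept.append(href)
    return kept


# Hypothetical sample data, only to show the call.
sample = [
    "https://finance.yahoo.com/news/example-story-123.html",
    "https://finance.yahoo.com/m/abc/example.html",
    "https://finance.yahoo.com/quote/AAPL/",
    None,
]
print(keep_article_links(sample))
```

Like the XPath, the `/m` prefix is loose: it also matches any other path that begins with `/m`, so the result may still need a manual check.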

1 Answer

Try this XPath to get all the news links from that page:

//li[contains(@class,'js-stream-content')]/div[@data-test-locator='mega']//h3/a

import time
from selenium.webdriver.common.by import By

# driver is the webdriver.Chrome instance created in the question
driver.implicitly_wait(10)
driver.maximize_window()

driver.get("https://finance.yahoo.com/quote/AAPL/news?p=AAPL")
time.sleep(10)  # give the news stream time to load
links = driver.find_elements(By.XPATH, "//li[contains(@class,'js-stream-content')]/div[@data-test-locator='mega']//h3/a")
for link in links:
    print(link.get_attribute("href"))
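
The collected hrefs may contain `None` (for anchors without an `href`) or repeated URLs, so it can help to clean the list before passing each URL to `get_info` from the question. This is a minimal sketch under that assumption; `unique_hrefs` is a hypothetical helper and the sample URLs are placeholders, not real article links.

```python
def unique_hrefs(hrefs):
    """Drop missing hrefs and duplicates while preserving the original order."""
    seen = set()
    result = []
    for href in hrefs:
        if href and href not in seen:
            seen.add(href)
            result.append(href)
    return result


# Hypothetical values standing in for link.get_attribute("href") results.
collected = [
    "https://finance.yahoo.com/news/a.html",
    None,
    "https://finance.yahoo.com/news/b.html",
    "https://finance.yahoo.com/news/a.html",
]
for url in unique_hrefs(collected):
    print(url)
```

With a live driver, the input list would come from `[link.get_attribute("href") for link in links]`.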