I am new to web scraping and need to scrape some data from https://www.promedmail.org/ for my research.
My script does the following:
1. Open the site
2. Click on the search tab
3. Type in a keyword (ebola)
4. Click search to populate the search results
5. Click on the first link to show a preview in the right panel
However, at step 5 I cannot click the link, even though I successfully located the <a> element by its article ID. The error message is:
selenium.common.exceptions.ElementNotInteractableException: Message: Element <a id="id6519943" class="lcl" href="javascript:;"> could not be scrolled into view
After some research, I concluded that I need to scroll to the link because it is not visible. I tried five different solutions suggested on Stack Overflow, but none of them worked, and I am stuck. They are listed, commented out, in the code below.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


class WebScraper:
    """Custom web scraper"""

    def __init__(self, url, keyword):
        self.url = url
        self.keyword = keyword
        self.search_results = []
        self.article_ids = []

    def get_all_data(self):
        """Get beautiful soup objects for all articles"""
        driver = webdriver.Firefox()
        driver.get(self.url)

        # Open the search tab, enter the keyword, and submit the search
        driver.find_element_by_id('search_tab').click()
        driver.find_element_by_id('searchterm').send_keys(self.keyword)
        driver.find_element_by_css_selector('#searchby_other > input[type=submit]').click()

        # Collect the id attribute of every result link
        element_article_id = driver.find_element_by_css_selector('#search_results > ul')
        source_article_id = element_article_id.get_attribute('outerHTML')
        soup_article_id = BeautifulSoup(source_article_id, 'html.parser')
        tag_a = soup_article_id.select('ul > li > a[id]')
        for tag in tag_a:
            self.article_ids.append(tag.get('id'))

        element_link = driver.find_element_by_id(self.article_ids[0])

        # Attempted fixes, none of which worked:
        # driver.execute_script("arguments[0].scrollIntoView();", element_link)
        # driver.execute_script("window.scrollBy(0, -150);")
        # element_link.location_once_scrolled_into_view
        # ActionChains(driver).move_to_element(driver.find_element_by_id(self.article_ids[0])).perform()
        # WebDriverWait(driver, 1000000).until(EC.element_to_be_clickable((By.ID, self.article_ids[0]))).click()

        # This call raises ElementNotInteractableException
        element_link.click()


if __name__ == "__main__":
    url = 'https://www.promedmail.org/'
    keyword = 'ebola'
    webscraper = WebScraper(url, keyword)
    webscraper.get_all_data()
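To check my assumption that the link is simply not visible, a small diagnostic along these lines might help. It is only a sketch: `describe_element` is a hypothetical helper name of my own, and it relies on the standard WebElement methods `is_displayed()` and `is_enabled()` and the `location` and `size` properties. I have not confirmed what it reports for this particular page.

```python
def describe_element(element):
    """Return a summary of an element's visibility-related state.

    `element` is expected to be a Selenium WebElement (or anything that
    exposes the same methods and properties).
    """
    return {
        'displayed': element.is_displayed(),  # False if hidden by CSS
        'enabled': element.is_enabled(),      # False if disabled
        'location': element.location,         # dict with 'x' and 'y'
        'size': element.size,                 # dict with 'height' and 'width'
    }
```

Calling `describe_element(element_link)` just before `element_link.click()` would show whether Selenium considers the link displayed and whether it has a non-zero size.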
When the link is clicked, a preview appears in the right panel. I plan to scrape that article and then move on to the next link.
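Roughly, the loop I have in mind looks like the sketch below. It assumes the click problem has been solved. `scrape_previews` and `preview_selector` are hypothetical names, and I have not confirmed the actual CSS selector of the preview panel. It uses the same Selenium 3-style `find_element_by_*` calls as my script.

```python
def scrape_previews(driver, article_ids, preview_selector):
    """Click each result link in turn and collect the preview panel's HTML.

    Assumptions (not confirmed against the site): `preview_selector` is a
    CSS selector matching the right-hand preview panel, and each click
    replaces the panel's content with the selected article.
    """
    previews = {}
    for article_id in article_ids:
        link = driver.find_element_by_id(article_id)
        link.click()  # the call that currently fails in my script
        panel = driver.find_element_by_css_selector(preview_selector)
        previews[article_id] = panel.get_attribute('outerHTML')
    return previews
```

The returned dictionary maps each article ID to the raw HTML of its preview, which could then be passed to BeautifulSoup for parsing.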