I am new to Selenium and I want to scrape data from https://www.nasdaq.com/market-activity/stocks/aapl. I am particularly interested in the data in the Summary Data section.

As an example, I want to scrape the following data:

  1. Exchange: NASDAQ-GS
  2. Sector: Technology
  3. Industry: Computer Manufacturing

Here is the part of the HTML for the table that I want to extract:

<table class="summary-data__table" role="table">
  <thead class="visually-hidden" role="rowgroup">
    <tr role="row">
      <th role="columnheader" scope="col">Label</th>
      <th role="columnheader" scope="col">Value</th>
    </tr>
  </thead>
  <tbody class="summary-data__table-body" role="rowgroup"><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Exchange</td><td role="cell" class="summary-data__cell">NASDAQ-GS</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Sector</td><td role="cell" class="summary-data__cell">Technology</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Industry</td><td role="cell" class="summary-data__cell">Computer Manufacturing</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">1 Year Target</td><td role="cell" class="summary-data__cell">$275.00</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Today's High/Low</td><td role="cell" class="summary-data__cell">$271.00/$267.30</td>
    </tr><tr class="summary-data__row" role="row" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Share Volume</td><td role="cell" class="summary-data__cell">26,547,493</td>
    </tr></tbody>
</table>

This is the Python code that I have so far:

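import time

from selenium import webdriver

driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://www.nasdaq.com/market-activity/stocks/aapl')
time.sleep(20)  # crude fixed wait for the page to finish rendering

# Returns only the first element with this class, i.e. the table itself
elements = driver.find_element_by_class_name("summary-data__table")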

I am stuck as I can't iterate through the table using the code above.

  • Welcome to Stack Overflow. The issue is that your selector matches only a single element. If you want to gather every row in the summary data table, you can do something like this: driver.find_elements_by_css_selector(".summary-data__table .summary-data__row") Commented Dec 9, 2019 at 0:24

3 Answers

Your code uses find_element_by_class_name, which returns only the first matching element and accepts a single class name. Use find_elements_by_css_selector instead: it returns every matching element and lets you write a more specific CSS query. You can read more here if you are interested.

Change your code to this: elements = driver.find_elements_by_css_selector(".summary-data__table .summary-data__row")

This selects all rows within the summary data table.

From there, you can loop through the rows and run a sub-query on each one to get its key and value.
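As a rough sketch of that loop, each row can be split into its label cell and value cell using the class names from the question's HTML. This is written against `find_element(by, value)` and `find_elements(by, value)`, which both Selenium 3 and Selenium 4 accept. `row_to_pair` and `scrape_summary` are hypothetical helper names, and `driver` is assumed to be a WebDriver that has already loaded the page:

```python
# "css selector" is the string behind By.CSS_SELECTOR; using the literal
# keeps these helpers importable without Selenium installed.
CSS = "css selector"


def row_to_pair(row):
    """Return (label, value) for one <tr> element of the summary table."""
    label = row.find_element(CSS, "td.summary-data__cellheading")
    value = row.find_element(CSS, "td.summary-data__cell")
    return label.text.strip(), value.text.strip()


def scrape_summary(driver):
    """Collect every summary row of an already-loaded page into a dict."""
    rows = driver.find_elements(CSS, ".summary-data__table .summary-data__row")
    return dict(row_to_pair(row) for row in rows)
```

Calling `scrape_summary(driver)` after the page has loaded should then give a dict keyed by label (Exchange, Sector, Industry and so on). Keep in mind that `.text` only returns text Selenium considers visible, so rows that have not rendered yet may come back empty.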


To scrape the NASDAQ-GS, Technology and Computer Manufacturing fields, scroll the Summary Data section into view with scrollIntoView(), then use WebDriverWait with visibility_of_element_located() to wait for each cell. You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
    driver.get("https://www.nasdaq.com/market-activity/stocks/aapl")
    driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.summary-data__header>h2.module-header"))))
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr td:nth-child(2)"))).text)
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr:nth-child(2) td:nth-child(2)"))).text)
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr:nth-child(3) td:nth-child(2)"))).text)
    driver.quit()
    
  • Using XPATH:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Chrome()  # assumes chromedriver is on your PATH
    driver.get("https://www.nasdaq.com/market-activity/stocks/aapl")
    driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.summary-data__header>h2.module-header"))))
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']/tr[1]/td[2]"))).get_attribute("innerHTML"))
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']/tr[2]/td[2]"))).get_attribute("innerHTML"))
    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']/tr[3]/td[2]"))).get_attribute("innerHTML"))
    driver.quit()
    
  • Console Output:

    NASDAQ-GS
    Technology
    Computer Manufacturing
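
The print lines in the CSS_SELECTOR variant differ only in the row index, so the selector could be generated instead of repeated. A minimal sketch, assuming the same table markup as in the question; `value_cell_selector` and `read_values` are hypothetical helper names, and `driver` is assumed to have already loaded the page and scrolled the section into view:

```python
def value_cell_selector(row_number):
    """CSS selector for the value cell in the given 1-based table row."""
    return ("tbody.summary-data__table-body>tr:nth-child({}) "
            "td:nth-child(2)".format(row_number))


def read_values(driver, row_numbers, timeout=10):
    """Wait for each requested value cell to be visible and return its text."""
    # Imported here so the selector helper above works without Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    return [
        wait.until(EC.visibility_of_element_located(
            (By.CSS_SELECTOR, value_cell_selector(n)))).text
        for n in row_numbers
    ]
```

With this in place, `read_values(driver, [1, 2, 3])` would stand in for the three separate print calls.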
    

import requests


r = requests.get(
    'https://api.nasdaq.com/api/quote/AAPL/summary?assetclass=stocks').json()

for key, value in r['data']['summaryData'].items():
    print("{:<20} {}".format(key, value['value']))

Output:

Exchange             NASDAQ-GS
Sector               Technology
Industry             Computer Manufacturing
OneYrTarget          $275.00
TodayHighLow         $271.00/$267.30
ShareVolume          26,547,493
AverageVolume        24,634,815
PreviousClose        $265.58
FiftTwoWeekHighLow   $268.25/$142.00
MarketCap            1,202,836,268,150
PERatio              22.84
ForwardPE1Yr         20.15
EarningsPerShare     $11.85
AnnualizedDividend   $3.08
ExDividendDate       Nov 7, 2019
DividendPaymentDate  Nov 14, 2019
Yield                1.17669%
Beta                 1.02
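
If only a few of these fields are needed, the parsed JSON can be filtered by key. A small sketch, assuming the payload has the `data` → `summaryData` → `value` layout used above; `pick_fields` is a hypothetical helper, and `sample` is a hand-written stand-in for the real response, not actual API output:

```python
def pick_fields(payload, wanted):
    """Return {name: value} for the requested summaryData keys that exist."""
    summary = payload["data"]["summaryData"]
    return {name: summary[name]["value"] for name in wanted if name in summary}


# Hand-written sample mirroring the structure accessed above (an assumption).
sample = {"data": {"summaryData": {
    "Exchange": {"value": "NASDAQ-GS"},
    "Sector": {"value": "Technology"},
    "Industry": {"value": "Computer Manufacturing"},
    "Beta": {"value": "1.02"},
}}}

for name, value in pick_fields(sample, ["Exchange", "Sector", "Industry"]).items():
    print("{:<20} {}".format(name, value))
```

With the real response, `pick_fields(r, ["Exchange", "Sector", "Industry"])` would be called on the `r` from the snippet above.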

5 Comments

Your response doesn't answer the question. Although an API is the better way to retrieve information 9 times out of 10, the OP might be required to use Selenium by their project requirements. Also, this JSON response doesn't have the fields that are in the summary data the OP is looking for.
@Lewis Maybe you need to run the code to see the output?
Your post was edited after my comment. Even so, your answer is efficient but doesn't answer the question.
@Lewis My post was edited after your comment only to show the location of the data. Review the edit history and you'll see it's the same link and the same details; I only accessed the dict. Anyway, that is for the OP to judge.
@αԋɱҽԃαмєяιcαη How did you find the API link so quickly? I spent many hours looking for it but couldn't find any! I searched on Google and skimmed through the website many times. Do you have any trick for finding the API quickly?
