
I want to extract the text "3351500920037" from the following HTML:

<div class="specs">
    <h3 class="h4">Productinformatie</h3>
    <dl class="specs__list">

        <dt class="specs__title">
        Gewicht

      </dt>
        <dd class="specs__value">

            0,3 kg

        </dd>

        <dt class="specs__title">
        EAN

      </dt>
        <dd class="specs__value">

            3351500920037

        </dd>

    </dl>
</div>

I use

ref_code = driver.find_element_by_xpath('//*[contains(text(),"EAN")]/following-sibling::dd').text

When I print ref_code, it seems to take only the first line of the text, so the output appears empty.

What I have:

print(ref_code)

I would like to have:

print(ref_code)
3351500920037

How can I get the whole text, including the lines that follow?

  • Please add the result of printing ref_code and your expected output, and add the HTML as text. Commented Sep 28, 2019 at 7:40
  • Nobody will write out the HTML structure from an image for you. Please add the HTML as code in the post so that people can reproduce the problem. Commented Sep 28, 2019 at 9:29
  • Sorry about that. I just edited my question without images. Thanks Commented Sep 28, 2019 at 10:56

2 Answers

Here is code that gets the EAN numbers for all products on the first page of search results. You could extend it to go through all result pages first and collect every product link:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome('/usr/local/bin/chromedriver')
wait = WebDriverWait(driver, 20)

query = "Azzaro Chrome 100 ml"
driver.get("https://www.bol.com")

# type the query, then press Enter ('\ue007' is the WebDriver key code for Enter)
driver.find_element(By.ID, "searchfor").send_keys(query, u'\ue007')

# wait presence and get all product A elements
products = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.product-item--row a.product-title")))
# get HREF attribute from products
product_links = [product.get_attribute("href") for product in products]

# iterate through and open all product links, and get ref_code
for link in product_links:
    driver.get(link)
    ref_code = driver.find_element(By.CSS_SELECTOR, "a[data-ean]").get_attribute("data-ean")
    print(ref_code)

5 Comments

Thanks, but I need it in Python only
Unless I'm mistaken, I have the same issue as with my initial code. I got the following error: AttributeError: 'visibility_of_element_located' object has no attribute 'text'
All you need to do is open the browser and navigate to the URL, then copy my code without any modifications on your side. If you still get an error, update your question with your full code.
Navigating to the URL works fine. I then used exactly your code and got the following error: selenium.common.exceptions.TimeoutException: Message:

The element is not visible on the page, which is why visibility_of_element_located() times out.

To extract the text 3351500920037, use WebDriverWait with presence_of_element_located() and read the element's text with get_attribute('textContent'), which returns the text even when the element is hidden:

print(WebDriverWait(driver,20).until(EC.presence_of_element_located((By.XPATH, "//*[contains(.,'EAN')]/following-sibling::dd[1]"))).get_attribute('textContent'))

This is the full code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.bol.com/")
query='Azzaro Chrome 100 ml'
searchelement=WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.ID,"searchfor")))
searchelement.send_keys(query)
searchelement.submit()
WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.CSS_SELECTOR,".product-title.px_list_page_product_click"))).click()
print(WebDriverWait(driver,20).until(EC.presence_of_element_located((By.XPATH, "//*[contains(.,'EAN')]/following-sibling::dd[1]"))).get_attribute('textContent'))
driver.quit()
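
One detail worth noting: textContent returns the raw text of the node, so for the <dd> in the question it includes the newlines and indentation around the number. Below is a minimal sketch of cleaning that up, with an offline check against a shortened copy of the HTML from the question using Python's standard-library html.parser instead of a browser. The names normalize_text and SpecsParser are illustrative helpers, not part of Selenium.

```python
from html.parser import HTMLParser


def normalize_text(raw):
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(raw.split())


class SpecsParser(HTMLParser):
    """Collect <dt>/<dd> pairs from a definition list into a dict."""

    def __init__(self):
        super().__init__()
        self.specs = {}
        self._tag = None     # "dt" or "dd" while inside one, else None
        self._title = None   # cleaned text of the most recent <dt>
        self._buffer = []    # text chunks seen inside the current tag

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = normalize_text("".join(self._buffer))
        if tag == "dt":
            self._title = text
        elif self._title is not None:
            self.specs[self._title] = text
        self._tag = None


# Shortened copy of the HTML from the question (same whitespace pattern).
HTML = """
<dl class="specs__list">
    <dt class="specs__title">
    Gewicht
  </dt>
    <dd class="specs__value">
        0,3 kg
    </dd>
    <dt class="specs__title">
    EAN
  </dt>
    <dd class="specs__value">

        3351500920037

    </dd>
</dl>
"""

parser = SpecsParser()
parser.feed(HTML)
print(parser.specs["EAN"])  # prints 3351500920037
```

With Selenium, the same helper could wrap the attribute value, for example normalize_text(element.get_attribute('textContent')), and driver.page_source could be fed to the parser to read every spec on the product page in one pass.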
