0

How do I find elements by class name without repeating the output? I have two classes to scrape, hdrlnk and result-price. I wrote the code like this:

x = driver.find_elements_by_class_name(['hdrlnk','result-price'])

and it gives me an error. Here is another version that I tried:

x = driver.find_elements_by_class_name('hdrlnk')
y = driver.find_elements_by_class_name('result-price')
for xs in x:
    for ys in y:
        print(xs.text + ys.text)   

But I got the result like this

sony 5 disc cd changer$40
sony 5 disc cd changer$70
sony 5 disc cd changer$70
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$10

The part of the HTML structure that I am trying to scrape

<p class="result-info">
    <span class="icon icon-star" role="button" title="save this post in your favorites list">
        <span class="screen-reader-text">favorite this post</span>
    </span>
    <time class="result-date" datetime="2019-11-07 18:20" title="Thu 07 Nov 06:20:56 PM">Nov  7</time>
    <a href="https://vancouver.craigslist.org/rch/ele/d/chandeliers/7015824686.html" data-id="7015824686" class="result-title hdrlnk">CHANDELIERS</a>
    <span class="result-meta">
        <span class="result-price">$800</span>
        <span class="result-hood"> (Richmond)</span>
        <span class="result-tags">
            <span class="pictag">pic</span>
        </span>
        <span class="banish icon icon-trash" role="button">
            <span class="screen-reader-text">hide this posting</span>
        </span>
        <span class="unbanish icon icon-trash red" role="button" aria-hidden="true"></span>
        <a href="#" class="restore-link">
            <span class="restore-narrow-text">restore</span>
            <span class="restore-wide-text">restore this posting</span>
        </a>
    </span>
</p>

The first element is repeated but I got the correct value for the second one. How do I correct this error?

6
  • Can you add the HTML code you are trying to scrape? Commented Nov 8, 2019 at 3:13
  • Ok sir wait a bit Commented Nov 8, 2019 at 3:14
  • Does the HTML have many of these <p class="result-info"> elements? And each of them contains a hdrlnk and a result-price? Commented Nov 8, 2019 at 5:11
  • yes sir. youre right @Code-Apprentice Commented Nov 8, 2019 at 5:13
  • @Vince See two options in my answer below. Commented Nov 8, 2019 at 5:18

4 Answers

5

.find_elements_by_class_name() only takes a single class name. What I would suggest is using a CSS selector to do this job, e.g. .result-info .result-price (a descendant selector: every .result-price inside a .result-info). The code would look like

prices = driver.find_elements_by_css_selector('.result-info .result-price')

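This returns all the price elements. If you also want the labels, you will have to write a little more code. The XPath below starts from each heading and only looks inside its sibling result-meta span, so each heading is matched with its own price rather than with every price further down the page.

for heading in driver.find_elements_by_css_selector('.hdrlnk'):
    print(heading.text)
    for price in heading.find_elements_by_xpath('./following-sibling::span[@class="result-meta"]/span[@class="result-price"]'):
        print('  ' + price.text)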

See the docs for all the options to find elements.
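Since the question first tried passing a list of class names, one option is to turn that list into a CSS selector group. The sketch below only builds the selector string; `classes_to_css` is a hypothetical helper, not part of Selenium. A group such as `.hdrlnk, .result-price` matches elements that have either class, so titles and prices would come back interleaved in document order and would still need pairing up.

```python
def classes_to_css(class_names):
    """Build a CSS selector group matching elements that have ANY of the classes.

    Hypothetical helper for illustration; it is not a Selenium API.
    """
    return ", ".join("." + name for name in class_names)


# The two class names from the question.
selector = classes_to_css(["hdrlnk", "result-price"])
print(selector)  # .hdrlnk, .result-price
```

The resulting string could then be passed to a CSS-selector lookup in place of the list.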

CSS selector references:
W3C reference
Selenium Tips: CSS Selectors
Taming Advanced CSS Selectors


3 Comments

I second this approach.
x here is a little challenging to iterate over since you need to take two elements at a time. I'm not saying it can't be done, but it requires some tricks from the itertools recipes.
@Code-Apprentice Yeah I posted the answer and then went back and reread the question and realized from what OP was printing he might want something other than what I thought based on his original locators. I've since updated my answer to also include this second approach.
3

I don't think you need a nested loop. Iterate by index instead, using len() to get the length of the list:

x = driver.find_elements_by_class_name('hdrlnk')
#y = driver.find_elements_by_class_name('result-price')
y = driver.find_elements_by_xpath('//p[@class="result-info"]/span[@class="result-meta"]//span[@class="result-price"]')

print(len(x))
print(len(y))

for i in range(len(x)) :
    print(x[i].text + y[i].text)

UPDATE

As I understand it, you want to pair each member of x with the corresponding member of y, like this:

x[0] with y[0]
x[1] with y[1]
etc.

This assumes that x and y contain the same number of elements. That is why I only need x to drive the loop (you could equally use y instead).
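The difference between the nested loop from the question and pairing by index can be sketched with plain strings. The values below are made-up stand-ins for the `.text` of the scraped elements, not real scraped output.

```python
# Sample values standing in for the .text of the scraped elements (assumed data).
titles = ["CHANDELIERS", "sony 5 disc cd changer", "Brand New Lightning Cable"]
prices = ["$800", "$40", "$100"]

# Nested loops visit every (title, price) combination: 3 * 3 = 9 lines.
nested = [t + p for t in titles for p in prices]

# One shared index visits each position exactly once: 3 lines.
paired = [titles[i] + prices[i] for i in range(len(titles))]

print(len(nested), len(paired))
print(paired)
```

The nested version is why the same title appears next to several prices; the indexed version prints one line per listing.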

If you want to include both of them in the loop, you can use zip. See the other answers in this thread.

For XPath, see: Locator Strategies

Copying the XPath from inspect element gives you an absolute path. I don't recommend it, because it breaks easily when the page structure changes.

See also this thread: Absolute vs Relative Xpath

6 Comments

It works for me sir. Thanks a lot. Could you explain in more depth why you only use x in the len loop? And why don't the answers below work for me?
Oh sir, it has a drawback. It kind of works, but the price is repeated twice before it is updated. It looks like this: ipad (6th gen) 32gb wifi+cellular BNIB , $425 drag 2 platinum with smok tfv12 tank , $425 Brand New Lightning Cable , $100. The drag 2 platinum was getting the price of the ipad, which is $425, instead of $100.
@Vince With the above code, I assume you have the same number of x and y. So if you are seeing a price repeated twice, you possibly have more y than x. I've updated the code, please try again and let me know how it goes. I changed the locator to an XPath, and the code now prints the length of both lists first.
Now this works for me. But I have a question: why did you use XPath instead of the class name? And in your loop statement, why did you only use the x variable and not y? I'm a little confused by it.
How did you come up with that XPath, sir? I tried copying the XPath from inspect element and it gave me an XPath like this: //*[@id="sortable-results"]/ul/li[1]/p/span[2]/span[1]
2

It looks like you have elements with classes hdrlnk and result-price that come in pairs. So you need to iterate the lists in parallel with zip():

xs = driver.find_elements_by_class_name('hdrlnk')
ys = driver.find_elements_by_class_name('result-price')
for x, y in zip(xs, ys):
    print(x.text, y.text)

This assumes that the two lists contain the same number of elements in the correct order so that they match up correctly with zip(). It is probably safer to parse them directly from the HTML by iterating over the parent <p> elements:

ps = driver.find_elements_by_class_name('result-info')
for p in ps:
    x = p.find_element_by_class_name('hdrlnk')
    y = p.find_element_by_class_name('result-price')
    print(x.text, y.text)
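To see why the per-parent approach is safer, here is a minimal sketch that runs without a browser. It uses the standard library's ElementTree on made-up markup modelled on the question's HTML, in which the second listing deliberately has no price; the listing titles and prices are assumptions for illustration, and ElementTree stands in for Selenium here.

```python
import xml.etree.ElementTree as ET

# Assumed sample markup modelled on the question's HTML; the second listing has no price.
HTML = """
<ul>
  <li><p class="result-info">
    <a class="result-title hdrlnk">CHANDELIERS</a>
    <span class="result-meta"><span class="result-price">$800</span></span>
  </p></li>
  <li><p class="result-info">
    <a class="result-title hdrlnk">free couch</a>
    <span class="result-meta"></span>
  </p></li>
  <li><p class="result-info">
    <a class="result-title hdrlnk">sony 5 disc cd changer</a>
    <span class="result-meta"><span class="result-price">$40</span></span>
  </p></li>
</ul>
"""
root = ET.fromstring(HTML)

# Flat lists: 3 titles but only 2 prices, so zip() pairs "free couch" with "$40".
titles = [a.text for a in root.iter("a")]
prices = [s.text for s in root.iter("span") if s.get("class") == "result-price"]
flat = list(zip(titles, prices))

# Per-parent lookup: each price is searched for only inside its own <p>.
scoped = []
for p in root.iter("p"):
    title = p.find("a")
    price = p.find(".//span[@class='result-price']")
    scoped.append((title.text, price.text if price is not None else None))

print(flat)
print(scoped)
```

With Selenium, a listing without a price would make find_element_by_class_name raise NoSuchElementException, so the plural find_elements_by_class_name, which returns an empty list in that case, is one way to handle missing prices inside the loop.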

5 Comments

I tried the first one and got an error of 'list' object has no attribute 'text'; the error is in the print(x.text, y.text). I tried to modify it to print(xs.text, ys.text) and got an error of "message": "Instance of 'tuple' has no 'text' member"
The second one got also an error of 'tuple' object has no attribute 'text' And in my terminal I got output like this [15768:7180:1108/132750.647:ERROR:page_load_metrics_update_dispatcher.cc(166)] Invalid first_paint 2.392 s for first_image_paint 2.388 s. What should I do sir?
@Vince You appear to be doing something slightly different than what I have here. I don't see how the code I gave can give any of those errors.
@Vince If you need more help, post a new question with your current code and its errors.
@Vince I found a mistake in my second example. I don't think the change affects the errors you are seeing, but it does fix a logic error.
1

If your use case is to use find_elements_by_class_name(), a better approach would be to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:

  • Using CLASS_NAME:

    items = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "hdrlnk")))
    prices = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "result-price")))
    for i,j in zip(items, prices):
        print(i.text + j.text)
    

However, a canonical approach would be to use either of the following:

  • CSS_SELECTOR:

    items = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.result-info a.hdrlnk")))
    prices = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.result-info span.result-meta>span.result-price")))
    for i,j in zip(items, prices):
        print(i.text + j.text)
    
  • XPATH:

    items = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@class='result-info']//a[contains(@class, 'hdrlnk')]")))
    prices = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@class='result-info']//span[@class='result-meta']/span[@class='result-price']")))
    for i,j in zip(items, prices):
        print(i.text + j.text)
    
  • Note : You have to add the following imports :

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
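For readers unfamiliar with explicit waits, the idea is to re-check a condition until it yields a truthy result or a timeout expires. The sketch below is a simplified stand-alone illustration of that polling pattern, not Selenium's implementation; `wait_until` and `elements_ready` are hypothetical names, and the "page" is simulated by a counter.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out.

    Simplified illustration of explicit waiting; not Selenium's implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Stand-in for a page whose elements only show up on the third check (assumed behaviour).
attempts = {"count": 0}


def elements_ready():
    attempts["count"] += 1
    return ["hdrlnk element"] if attempts["count"] >= 3 else []


found = wait_until(elements_ready, timeout=2.0, poll=0.01)
print(found, attempts["count"])
```

An empty list is falsy, so the loop keeps polling until the lookup returns at least one element, which mirrors why the waits above hand back a usable list of elements.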
    

2 Comments

Got some error in importing sir. Like unable to import selenium.webdriver.support
@Vince import should work irrespective of the underlying code block. Restart your IDE. If the error still persists you may have to reinstall selenium.
