
I have written a VBA macro that opens a web link with the Selenium Chrome web driver to scrape data, and I have run into several problems I need your advice on.

Code example and outcome 1: On Error Resume Next active

Sub test_supplements_store()
    Dim driver As New ChromeDriver
    Dim post As Object

    i = 1

    driver.Get "https://www.thesupplementstore.co.uk/brands/optimum_nutrition?page=4"
    On Error Resume Next
    For Each post In driver.FindElementsByClass("desc")
        Cells(i, 1) = post.FindElementByTag("a").Attribute("title")
        Cells(i, 2) = Trim(Split(post.FindElementByClass("size").Text, ":")(1))
        Cells(i, 3) = post.FindElementByXPath(".//span[@class='now']//span[@class='pricetype-purchase-unit multi-price']//span[@class='blu-price blu-price-initialised']").Text
        Cells(i, 4) = post.FindElementByTag("a").Attribute("href")
        i = i + 1
    Next post
End Sub

(screenshot: output of example 1 in the worksheet)

Code example and outcome 2: On Error Resume Next commented out

Sub test_supplements_store()
    Dim driver As New ChromeDriver
    Dim post As Object

    i = 1

    driver.Get "https://www.thesupplementstore.co.uk/brands/optimum_nutrition?page=4"
    'On Error Resume Next
    For Each post In driver.FindElementsByClass("desc")
        Cells(i, 1) = post.FindElementByTag("a").Attribute("title")
        Cells(i, 2) = Trim(Split(post.FindElementByClass("size").Text, ":")(1))
        Cells(i, 3) = post.FindElementByXPath(".//span[@class='now']//span[@class='pricetype-purchase-unit multi-price']//span[@class='blu-price blu-price-initialised']").Text
        Cells(i, 4) = post.FindElementByTag("a").Attribute("href")
        i = i + 1
    Next post
End Sub

(screenshot: example 2 halts with a run-time error)

Code example and outcome 3: On Error Resume Next active, URL without the page parameter

Sub test_supplements_store()
    Dim driver As New ChromeDriver
    Dim post As Object

    i = 1

    driver.Get "https://www.thesupplementstore.co.uk/brands/optimum_nutrition"
    On Error Resume Next
    For Each post In driver.FindElementsByClass("desc")
        Cells(i, 1) = post.FindElementByTag("a").Attribute("title")
        Cells(i, 2) = Trim(Split(post.FindElementByClass("size").Text, ":")(1))
        Cells(i, 3) = post.FindElementByXPath(".//span[@class='now']//span[@class='pricetype-purchase-unit multi-price']//span[@class='blu-price blu-price-initialised']").Text
        Cells(i, 4) = post.FindElementByTag("a").Attribute("href")
        i = i + 1
    Next post
End Sub

(screenshot: output of example 3 in the worksheet)

The first example returns all 74 items from the site, apart from the prices, but it takes a very long time, about two minutes.

The second example returns only the title, into the first cell of the sheet, and then pops up the error shown above.

The third example returns only 21 items and misses the price of the items that do not have a 'now' label. That script runs very quickly, in under 10 seconds.

Please advise on how to return all 74 items together with title, size, price and href.

  • What is the exact error you are getting? StaleElement? Commented Aug 7, 2017 at 7:43
  • I'm not sure what you mean by that, as the error snapshot is attached to the second example. The first and third examples do not return any errors. Commented Aug 7, 2017 at 7:45
  • OK, thanks. I haven't worked with VB, but this is the approach I have used to overcome staleness in Java: stackoverflow.com/questions/45434381/… Commented Aug 7, 2017 at 8:01

1 Answer


The page you are dealing with uses lazy loading: all the items are not loaded at once; the rest load as you scroll to the bottom of the page. I used a small JavaScript snippet within the code to keep scrolling until the page stops growing, and it solves the issue. I hope this is the result you were looking for.

Sub test_supplements_store()
    Dim driver As New ChromeDriver
    Dim post As Object
    Dim i As Long
    Dim EndofPage As Boolean
    Dim PrevPageHeight As Variant, CurrentPageHeight As Variant

    driver.Get "https://www.thesupplementstore.co.uk/brands/optimum_nutrition"
    On Error Resume Next

    'Keep scrolling to the bottom until the page height stops changing,
    'so all the lazy-loaded items are rendered before scraping
    Do While EndofPage = False
        PrevPageHeight = CurrentPageHeight
        CurrentPageHeight = driver.ExecuteScript("window.scrollTo(0, document.body.scrollHeight);var CurrentPageHeight=document.body.scrollHeight;return CurrentPageHeight;")
        driver.Wait 3000
        If PrevPageHeight = CurrentPageHeight Then
            EndofPage = True
        End If
    Loop

    'Each product sits in an li whose class contains 'prod'
    For Each post In driver.FindElementsByXPath("//li[contains(@class,'prod')]")
        i = i + 1: Cells(i, 1) = post.FindElementByXPath(".//a").Attribute("title")
        Cells(i, 2) = Split(post.FindElementByXPath(".//p[@class='size']").Text, ": ")(1)
        'Discounted items expose the price under span.now, single-priced items under span.dynamictype-single
        Cells(i, 3) = post.FindElementByXPath(".//p[@class='price']//span[@class='now']//span|.//p[@class='price']//span[@class='dynamictype-single']").Text
        Cells(i, 4) = post.FindElementByXPath(".//a").Attribute("href")
    Next post
End Sub
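
As a side note on the On Error Resume Next discussion: instead of a blanket error handler, one option is to look each optional field up with the plural FindElementsByXPath and only read .Text when something matched, so products without a 'now' price simply leave the cell blank. This is only a rough sketch, not tested against this page; the helper name FirstTextByXPath is my own, and it assumes the plural finder returns an empty collection with its default arguments when nothing matches.

'Hypothetical helper (sketch only): returns the text of the first element
'matching an XPath inside a parent element, or "" when nothing matches.
Function FirstTextByXPath(ByVal parent As Object, ByVal xpath As String) As String
    Dim matches As Object, el As Object
    Set matches = parent.FindElementsByXPath(xpath)   'plural finder: no error on zero matches
    If matches.Count > 0 Then
        For Each el In matches
            FirstTextByXPath = el.Text
            Exit For                                  'only the first match is needed
        Next el
    End If
End Function

With that in place, the price line of the loop above could become Cells(i, 3) = FirstTextByXPath(post, ".//p[@class='price']//span[@class='now']//span|.//p[@class='price']//span[@class='dynamictype-single']"), and the same pattern works for any other field that is missing on some products.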

Comments

  • You have another requirement that I didn't notice. Using XPath will solve the price issue.
  • Unfortunately your code returns prices only for the first 21 items of the page. Also, I'm not sure how to return both the regular and the new price of an item (a sketch of one option follows these comments).
  • I didn't tune up your price part. I tried to get all 74 items.
  • Just a moment. Get back to you.
  • Just follow the link. Very descriptive for a beginner like me, and I think you will also find it useful. Btw, don't get lost in some of the complicated stuff, because you may never need to use it. zvon.org/comp/r/tut-XPath_1.html
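
On the regular-versus-new price question raised above: the answer's XPath already targets p[@class='price'], so one simple option is to write out the text of that whole paragraph, which presumably contains both figures when an item is on offer. This is a sketch only; splitting the two prices into separate cells would need the actual markup of the old price, which is not shown in the question.

    'Inside the For Each post loop of the answer, instead of the union XPath:
    'the full price paragraph keeps both the regular and the discounted price.
    Cells(i, 3) = post.FindElementByXPath(".//p[@class='price']").Text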
