10

I'm writing my first data scraper using Excel and VBA. I'm stuck trying to go to the next page of a website. The source code looks as follows:

<li><a href="#" onclick="changePage(2); return false;">Page 2 of 24</a></li>

This is the VBA code I have, but it does not seem to work:

For Each l In ie.Document.getElementsByTagName("a")
    If l.href = "#" And l.onclick = "changePage(2); return false;" Then
        l.Item(2).Click
        Exit For
    End If
Next l

When I run the code I don't get any errors, but it doesn't seem to go to page 2. Keep in mind that there are more pages after page 2. My idea is to replace "2" with a variable later and increase that variable by one. But I need to get it to work first.

Thanks to whoever can help.

5
  • Just checking but have you tried Navigate or Navigate2 instead of the Click method? I can't wait to try this tomorrow! Commented Feb 19, 2016 at 3:31
  • No, I'm not familiar with Navigate. Do you have an example? I've programmed in VBA before, but this is my first time trying to invoke web clicks/events through VBA. Commented Feb 19, 2016 at 22:26
  • Here's documentation on the Navigate method and I'll try to find some sample code later: msdn.microsoft.com/en-us/library/aa752093.aspx Commented Feb 20, 2016 at 13:14
  • Rick, thanks. I'll research the Navigate method. Assuming I can use it, do you think my conditional statement is correct given the HTML code? If you can provide examples, that would help a lot. Commented Feb 20, 2016 at 14:28
  • It looks like it could work. I'll need a bit more of your sample code, but I'll see if I can put it together. Based on some other code, see this sample that uses XMLHTTP instead of the Browser control: github.com/rickhenderson/Web-Scraping-With-VBA/blob/master/… as well as this other Stack Overflow question: stackoverflow.com/questions/26128056/… Commented Feb 21, 2016 at 3:04

3 Answers

2

[Edit: I now have a solution and the code has been replaced. -RDH]

First I want to mention that if the data retrieved in this manner is used for commercial purposes or anything other than personal use then it violates 2 sections of the Kelley Blue Book (kbb.com) Terms of Service.

FYI: Sites that collect, update, and maintain data like BlueBook or the MLS take their data very seriously, and they don't like people scraping it. I was speaking to an old classmate of mine who has her degree in Computer Science and is now a real estate agent. I mentioned to her how cool it is to be able to scrape housing data off of MLS, and she nearly flipped out on me. Just saying: people were paid to create that data and people make their living using that data. 'Nuff said.

Because I am in Canada, I get a different version of the bluebook.com site (I get redirected to kbb.com). I was able to get the problem code running by creating a web page on my own server that had the same format you were looking for.

+++ The real problem +++

The problem is that an href written as "#" is actually reported as the full URL with the # attached to the end, and when you check the onclick event it actually contains the full function declaration. An exact string comparison therefore never matches, so you have to search for partial strings only.

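' It is a good idea to declare the proper data types (this requires a
' reference to the Microsoft HTML Object Library), because IHTMLElement
' exposes the Click method but IHTMLAnchorElement does not
Dim l As IHTMLElement
Dim htmlanchors As IHTMLElementCollection
' ...

Set htmlanchors = ie.Document.getElementsByTagName("a")

' Look through all the anchor tags on the page
For Each l In htmlanchors
    ' Check that the href contains a # and the onclick event has specific code.
    ' InStr returns a position, so compare each result with 0: a bare
    ' "InStr(...) And InStr(...)" is a bitwise And of two numbers and can
    ' evaluate to 0 even when both substrings are found.
    If InStr(l.href, "#") > 0 And InStr(l.onclick, "changePage(3); return false;") > 0 Then
        ' Click the current anchor link
        l.Click
        Exit For
    End If
Next l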
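
To go beyond a single hard-coded page, as the question intends, the page number can be passed in as a variable. The following is only a minimal sketch under several assumptions: the helper names (IsPageLink, GoToPage, WaitForPage, ScrapeAllPages) and the totalPages argument are hypothetical, the handler is assumed to be written inline as in the question's HTML, and the sketch searches each anchor's outerHTML instead of l.onclick so that anchors without a handler need no special treatment. It also assumes the page change can be detected with the same Busy/readyState wait used elsewhere in this thread, which may not hold if the site loads results without a full page load.

```vba
Option Explicit

' True when the anchor markup contains a call to changePage(pageNumber).
' The closing parenthesis keeps page 2 from matching changePage(24).
Function IsPageLink(ByVal anchorHtml As String, ByVal pageNumber As Long) As Boolean
    ' InStr returns a position, so compare it with 0 explicitly.
    IsPageLink = InStr(1, anchorHtml, "changePage(" & pageNumber & ")", vbTextCompare) > 0
End Function

' Waits until the browser reports that it has finished loading.
Sub WaitForPage(ByVal ie As Object)
    Do While ie.Busy Or ie.readyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub

' Clicks the link for pageNumber. Returns False if no such link exists.
Function GoToPage(ByVal ie As Object, ByVal pageNumber As Long) As Boolean
    Dim a As Object
    For Each a In ie.Document.getElementsByTagName("a")
        If IsPageLink(a.outerHTML, pageNumber) Then
            a.Click
            GoToPage = True
            Exit Function
        End If
    Next a
End Function

' Visits pages 2..totalPages; page 1 is assumed to be loaded already.
Sub ScrapeAllPages(ByVal ie As Object, ByVal totalPages As Long)
    Dim pageNumber As Long
    For pageNumber = 2 To totalPages
        If Not GoToPage(ie, pageNumber) Then Exit For
        WaitForPage ie
        ' ... read the results for the current page here ...
    Next pageNumber
End Sub
```

Called as ScrapeAllPages ie, 7 after the first page of results has loaded, the loop stops early if a page link cannot be found, so an overestimated page count does no harm.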

7 Comments

I will however attempt to get this to work and retrieve info from multiple pages.
Rick - my code works on my version of Excel and with the US version of Bluebook. The only thing I can't seem to do is get it to click on page 2 of 7, page 3 of 7, etc.
In other words, my problem was not with the search. My problem was with the page change.
Sorry for the long-winded answers, but I at least needed something to work with since we are seeing different sites. Since the code l.onclick is not capitalized, there is a chance it's not recognized by VBA, so either it's the wrong method or l is the wrong object type. Have you tried clicking Debug > Compile to see if it throws any other errors?
Rick - with a very minor change your code worked. I needed to add "ie" in Set htmlanchors = ie.Document.getElementsByTagName("a"). Now I just need to look for the last page of every instance and loop through them. I marked the answer as useful. I wasn't sure if that was the same as marking it correct. Thanks a lot for all of your help.
0

Have you tried

.FireEvent ("onclick")

or

.FireEvent ("onmouseover")
.FireEvent ("onmousedown")
.FireEvent ("onmouseup")

in place of .click? Sometimes the JavaScript actions don't respond to .click.
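
As a rough illustration of that suggestion, the sketch below fires the mouse events in sequence on one element. FireMouseSequence is a hypothetical helper name, not part of any library, and the element is assumed to be a late-bound object taken from ie.Document. FireEvent is a legacy Internet Explorer method, so it may not be available in newer document modes; treat this as something to try, not a guaranteed fix.

```vba
' Sketch only: fires the legacy IE mouse events on an element, in order.
Sub FireMouseSequence(ByVal elem As Object)
    Dim eventName As Variant
    For Each eventName In Array("onmouseover", "onmousedown", "onmouseup", "onclick")
        ' CStr makes sure a plain string is passed to the late-bound call
        elem.FireEvent CStr(eventName)
    Next eventName
End Sub
```

Inside the loop from the question, this would be called as FireMouseSequence l in place of the .Click line.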

1 Comment

I haven't tried that method but I will as soon as I get home.
0

Rick – below is my entire code. I’m basically trying to scrape www.thebluebook.com.

Sub ScrapeData()

Dim ie As InternetExplorer
Dim ele As Object
Dim RowCount As Long
Dim myWebsite As String, mySearch1 As String, mySearch2 As String, mySearch3 As String
Dim Document As HTMLDocument

myWebsite = Range("Website").Value
mySearch1 = Range("search1").Value
mySearch2 = Range("search2").Value
mySearch3 = Range("search3").Value

Set mySheet = Sheets("Sheet1")
Range("A6").Value = "Company"
Range("B6").Value = "Address"
Range("C6").Value = "Contact"

RowCount = 7
Set ie = New InternetExplorer
ie.Visible = True
With ie
    .Visible = True
    .navigate (myWebsite)

    Do While .Busy Or .readyState <> 4
        DoEvents
    Loop

    ie.Document.getElementById("search").Value = mySearch1
    ie.Document.getElementById("selRegion").Value = mySearch2
    ie.Document.getElementsByClassName("searchBtn")(0).Click

    Do While .Busy Or .readyState <> 4
        DoEvents
    Loop

    For Each ele In .Document.all
        Select Case ele.className
            Case "result_title"
                RowCount = RowCount + 1
            Case "cname"
                mySheet.Range("A" & RowCount) = ele.innerText
            Case "addy_wrapper"
                mySheet.Range("B" & RowCount) = ele.innerText
        End Select
    Next ele
End With

'THIS IS THE CODE THAT IS NOT WORKING
For Each l In ie.Document.getElementsByTagName("a")
    If l.href = "#" And l.onclick = "changePage(3); return false;" Then
        l.Item(3).Click
        Exit For
    End If
Next l

Set ie = Nothing
End Sub

8 Comments

I'll try and check it out tomorrow but my 5yo is sick so I won't get much computer time. Looks like good code and you're just missing something small. Honestly I don't do Web scraping but I feel like after getting the page text the rest is just string manipulation but I'm sure that's over simplifying it.
Rick - as I said before, any help is appreciated. But definitely take care of that 5yo first. My stuff can wait until he's better. Thanks.
I'm about to spend an hour on it now, but I'll probably waste my day on it tomorrow. I hate it when I can't solve a VBA problem, though this is most likely a DOM problem. Do you have to log into the site to retrieve the data? What values can I use for search1, search2, and search3?
No login necessary. I've only been using search1 (companies) and search2 (region). For testing purposes I've been setting Search1= "Building Maintenance Contractors", Search2="New Jersey-North". The results should give you 7 pages. I've been unable to go beyond page 1.
I believe I'm getting a different site because I'm in Canada. I get redirected to kbb.com and the HTML elements are quite different. I was about to post a partial solution when I saw your last post. For instance when I just use Search1 in the main search box at the bottom of the screen, I only get 2 Ford trucks.