0

I have seen this question posted before, but my situation is somewhat different so I was hoping I could get some help from the community and maybe a fresh perspective. I have a macro written in vba that's supposed to pull return data from this company's online db, things like returns for MSCI World Index, S&P 500, etc. The code I have works on other pages, but I think this one is different. I talked to the webmaster and he told me that the code was not designed to be scraped, but that is not restricted by their usage policy. It would be a huge time saver for me if I could, indeed, get the data by scraping so I'm trying really hard to figure out a way to do it. I've tagged this under java-script as well, as I think the code would be very similar and I want to accept as many solutions as possible to solve this problem.

The situation is this: I have the following code that throws me an "Object variable not set" error when it comes to the actual scraping of the data (the line that begins 'set els = htmlDoc...." I've tried many combinations of the getElement(s) function thinking that may have been the problem, but I've drawn a blank. Anybody know any other ways to set the object variable in this environment? Or just any other creative ways to pull the data.

I can't give out the login info, but I think by just navigating to the 'caRetPage' site, you can see the html code that I'm trying to scrape/parse.

Sub caScrape()

Dim ie As Object        'ie: internet explorer
Dim htmlDoc As MSHTML.HTMLDocument
Dim els As Object   'to store html objects
Dim rtn As String   'to store values to be scraped from page
Dim loginButton As Object
caLoginPage = "https://members.cambridgeassociates.com/Login/Forms/login-form.asp"
caRetPage = "https://members.cambridgeassociates.com/markets/marketindexsnapshot/DailyMarketReturnsUS.asp"
caUser = "xxxxx"
caPass = "xxxxx"
Set ie = CreateObject("internetexplorer.application")
ie.Visible = True
ie.navigate caLoginPage
While ie.Busy
    DoEvents
Wend
Do Until ie.readyState = 4
    DoEvents
Loop
Set htmlDoc = ie.document
'Log in to site
Set loginButton = htmlDoc.getElementsByTagName("button").Item(0)
With htmlDoc
    .all("Username").Value = caUser
    .all("Password").Value = caPass
    loginButton.Click
End With
While ie.Busy
    DoEvents
Wend
Set acceptButton = htmlDoc.getElementsByName("Submit").Item(0)
acceptButton.Click
While ie.Busy
    DoEvents
Wend


'Here is the page with the return data on it                                                                                   
ie.navigate caRetPage
While ie.Busy
    DoEvents
Wend
Do Until ie.readyState = 4
    DoEvents
Loop
Set htmlDoc = ie.document

'This next line is where the error gets thrown
Set els = htmlDoc.getElementById("tblData")(0).getElementByTagName("tr")(5).getElementByTagName("td")(1)
    'Also tried the following and plenty of variations of getElement command
'Set els = htmlDoc.getElementsByTagName("body")(0).getElementsByTagName("table")(2).getElementsByTagName("tbody")(0).getElementByTagName("tr")(5).getElementByTagName("td")(1)

rtn = els.innerText
Debug.Print(rtn)


End Sub

Any help would be greatly appreciated.

5
  • When I get an error in a long expression, I break it up into smaller pieces. You'll discover which segment of the expression is breaking that way. Commented Feb 1, 2017 at 16:33
  • 1
    getElementById("tblData") will always return a single element (assuming a match was made), not a collection/list, so you don't need the (0) following that. Commented Feb 1, 2017 at 17:09
  • @SMeaden Thanks for showing up... You helped with this the last time. I don't think it's a matter of it being a long expression. It's a matter of the site won't let me set the variables. Have you ever heard of this? This code (or a very similar version) runs perfectly on other pages; could this be a firewall issue, if so, any thoughts on circumventing? Commented Feb 1, 2017 at 17:17
  • Thanks @TimWilliams; I've also tried the code two lines down where I go through and narrow down the search by tag names. Still I have errors. Also, if I try to set els = htmlDoc.getElementByID("tblData") and leave the rest of the line blank, it throws error. Any ideas? Commented Feb 1, 2017 at 17:23
  • getElementByTagName is not a valid method: it's getElementsByTagName (though you have it correct in the commented-out lines). It's impossible to really help here, without the HTML source or an accessible URL> Commented Feb 1, 2017 at 18:36

2 Answers 2

0

I believe I have a working example with the URL provided in the code posted above. From what I can tell there are frames on the website, so you need to handle these slightly differently.

In addition, waiting for the page to load needed a different method. For that I reused some code I got some another StackOverflow answer. Basically it waits until it doesn't find anymore tags on a page as it is loading.

Here's the code:

#If VBA7 Then
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr)
#Else
    Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Sub getData()
    Dim element     As Object
    Dim MyURL       As String
    MyURL = "https://members.cambridgeassociates.com/markets/marketindexsnapshot/DailyMarketReturnsUS.asp"

    'Late binding open IE
    Dim MyBrowser   As Object: Set MyBrowser = CreateObject("InternetExplorer.Application")
    MyBrowser.Visible = True
    MyBrowser.navigate MyURL

    waitforload MyBrowser

    Set element = MyBrowser.document.getelementsByTagName("Frameset")(0).Children(1).contentdocument
    Set element = element.getelementByID("tblData")
    Set element = element.getelementsByTagName("tr")(5)
    Set element = element.getelementsByTagName("td")(1)
    Debug.Print element.innertext
End Sub

Private Sub waitforload(ByRef ie As Object)
    Dim i        As Byte
    Dim tagnames As Long

    While ie.Busy
        Sleep 250
        DoEvents
    Wend

    While ie.ReadyState <> 4
        Sleep 250
        DoEvents
    Wend

    Do
        tagnames = ie.document.getelementsByTagName("*").Length
        For i = 1 To 5
            Sleep 75
            If tagnames = ie.document.getelementsByTagName("*").Length Then Exit Sub
        Next
    Loop
End Sub

This should return: 0.10

Sign up to request clarification or add additional context in comments.

2 Comments

Whoa!!! That is awesome! Thanks a million! Sorry I was out sick for a couple of days and wasn't responding. This is great, thanks Ryan!
How did you figure that out?
0

Further to Tim's eagle eye, here is some documentation. The error (as Tim spotted) is that getElementById does not return a list of HTML elements but a single element.

Here is a documentation link

https://msdn.microsoft.com/en-us/library/ms536437(v=vs.85).aspx

2 Comments

Ok... I've got that the getElementByID command only returns a single element, but I've tried other methods too (see commented code two lines down from 'ID' command) I'm still unable to set the variable using this method as well. Any thoughts on what could be the error in the alternative code? I'm still thinking firewall. Do you know anything about that?
Difficult for us to debug if we do not have username and password. Have you tried querySelector? See this stackoverflow.com/questions/41947882/… Can you somehow show the HTML page source?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.