2

I need to extract COCOA London Dec20 and Mar21 close price from following website https://www.mrci.com/ohlc/ohlc-all.php Screenshot for the website

I wrote the following code for this but it's throwing errors, please help:

Sub extract()

Dim appIE As Object

Set appIE = CreateObject("internetexplorer.application")

With appIE

    .Navigate "https://www.mrci.com/ohlc/ohlc-all.php"

    .Visible = False

End With

Do While appIE.Busy

    DoEvents

Loop

Set allRowOfData = appIE.document.getElementsByClassName("strat").getElementsByTagName("tbody")(183)

Dim myValue As String: myValue = allRowOfData.Cells(5).innerHTML

appIE.Quit

Set appIE = Nothing

lastrow = Sheets("Sheet2").Cells(Rows.Count, "A").End(xlUp).Row + 1

Range("A" & lastrow).Value = myValue

End Sub
  • The code is trowing following error:

Run-Time Error - 438 Object doesn't support the property or method.

3
  • On which line is the error? Commented Nov 29, 2019 at 18:11
  • Thank you so much for the support. However, my system is throwing "user-defined type not defined" error when using HTMLDocument. Do I need to change some settings? Commented Nov 30, 2019 at 9:48
  • You need to add the project reference as detailed at bottom of my answer Commented Nov 30, 2019 at 9:49

2 Answers 2

2

Your error is because you are calling a document/node method on a collection.

Set allRowOfData = appIE.document.getElementsByClassName("strat").getElementsByTagName("tbody")(183)

getElementsByClassName("strat") returns a collection which you need to index into to then use the method getElementsByTagName.

E.g.

Set allRowOfData = appIE.document.getElementsByClassName("strat")(0).getElementsByTagName("tbody")(183)

Summary approach:

As the data for different futures is arranged in a long series of table row (tr) nodes, one needs to determine the right block of trs by the leading future header (th), check the subsequent sibling trs for the right mmmyy and then extract the right column (td) value from the row. One needs to stop at the start of the next future block, or the end of the sibling trs; whichever comes first.


tl;dr;

The HTML is not ideal for identifying quickly the right trs; and the more indices you use, the more fragile the program. The following does have assumptions but is more robust.

I use an XMLHTTP request as the expense of using a browser is not necessary. You have a variable for the future of interest and all header ths are collected, by className using a css class selector, into a nodeList which is looped until the target future is found. This places you at the start of the first row of interest. The various mmmyy are then in subsequent rows. I determine the appropriate column index by using a helper function to examine the table headers and compare against a defined target header name. The function returns the appropriate index where header found or -1 if not found.

Now, I have a dictionary which holds the mmmyy periods of interest. I loop all the tds until I hit the next section (i.e. the next tr which has a header (th) as its FirstChild). I check each FirstChild in a row and if the mmmyy value found is in the dictionary I update the dictionary with the appropriate column value.

In the loop I am working at a lower level than HTMLDocument so, in order to leverage querySelectorAll, I dump the current nextNode.NextSibling.OuterHTML into a surrogate HTMLDocument variable; I then have access to querySelectorAll again, and can select the appropriate td by index. I need to wrap the html transferred in <TABLE><TD></TABLE> tags for the HTML parser to not complain and to get the right # td elements. Pretty sure there is scope for tightening this up which I may do if I have time to revisit this.

At the end I write out the dictionary values to the sheet.


VBA:

Option Explicit

Public Sub GetCocoaClosePrices()
    Dim html As MSHTML.HTMLDocument, targetPeriods As Object, targetFuture As String, targetColumnName As String

    targetFuture = "London Cocoa(LCE)"
    targetColumnName = "Close"
    Set targetPeriods = CreateObject("Scripting.Dictionary")
    Set html = New MSHTML.HTMLDocument

    targetPeriods.Add "Dec20", "Not found"
    targetPeriods.Add "Mar21", "Not found"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.mrci.com/ohlc/ohlc-all.php", False
        .Send
        html.body.innerHTML = .responseText
    End With

    Dim tableHeaders As Object, targetColumnNumber As Long

    Set tableHeaders = html.querySelectorAll("tr ~ tr ~ tr .colhead")
    targetColumnNumber = GetTargetColumnNumber(tableHeaders, targetColumnName)

    If targetColumnNumber = -1 Then Exit Sub

    Set targetPeriods = GetUpdatedDictionary(targetPeriods, html, targetFuture, targetColumnNumber)

    With ThisWorkbook.Worksheets(1)
        .Cells(1, 1).Resize(1, targetPeriods.Count) = targetPeriods.keys
        .Cells(2, 1).Resize(1, targetPeriods.Count) = targetPeriods.items
    End With
End Sub

Public Function GetUpdatedDictionary(ByRef targetPeriods As Object, ByVal html As HTMLDocument, ByVal targetFuture As String, ByVal targetColumnNumber As Long) As Object
    Dim html2 As MSHTML.HTMLDocument, firstChild As Object, i As Long
    Dim nextNode As Object, headerNodes As Object

    Set headerNodes = html.querySelectorAll(".note1")
    Set html2 = New MSHTML.HTMLDocument

    For i = 0 To headerNodes.Length - 1
        If headerNodes.Item(i).innerText = targetFuture Then 'find the right target future header
            Set nextNode = headerNodes.Item(i).ParentNode 'move up to the parent tr node

            Do 'walk the adjacent tr nodes
                Set nextNode = nextNode.NextSibling
                Set firstChild = nextNode.firstChild

                If nextNode Is Nothing Then
                    Set GetUpdatedDictionary = targetPeriods
                    Exit Function 'exit if no next section
                End If

                html2.body.innerHTML = "<TABLE><TD>" & nextNode.outerHTML & "</TABLE>"

                If targetPeriods.Exists(firstChild.innerText) Then
                    targetPeriods(firstChild.innerText) = html2.querySelectorAll("td").Item(targetColumnNumber).innerText
                End If
            Loop While firstChild.tagName <> "TH" 'stop at next section i present

        End If
    Next
    Set GetUpdatedDictionary = targetPeriods
End Function

Public Function GetTargetColumnNumber(ByVal nodeList As Object, ByVal targetColumnName As String) As Long
    Dim i As Long

    For i = 0 To nodeList.Length - 1
        If nodeList.Item(i).innerText = targetColumnName Then
            GetTargetColumnNumber = i + 1 'to account for th
            Exit Function
        End If
    Next
    GetTargetColumnNumber = -1
End Function

Reading:

  1. css selectors
  2. document.querySelectorAll
  3. Node.nextSibling
  4. Node.parentNode

References (VBE>Tools>References):

  1. Microsoft HTML Object Library
Sign up to request clarification or add additional context in comments.

4 Comments

It works great! I was thinking about answering it yesterday, but it took me more than 5 minutes and I have abandoned it! :)
@Vityata I’d be interested in seeing another answer. I’m afraid I am getting stuck in my ways and missing out on other ways of solving these things. Also, would appreciate feedback on how to improve my answers. I was helping an OP in chat the other day and realised I tend to give less lengthy explanations in answers now-a-days.
@Vityata And I forgot to say thank you for the edits! Appreciated.
You are welcome:) I have written my version, it bases on the code of the OP. Pretty much this is what I was planning to write yesterday.
1

As mentioned in the @QHarr's answer, the problem of the code is that a document/mode method is called upon a collection. Anyway, following the approach of the OP, and working as much as possible with their code, this is the logic I have created:

  • Open the URL and go to the website
  • Get all rows with allRowsOfData = appIE.document.getElementsByClassName("strat")
  • Start looping through the .Children of allROwsOfData
  • Start an internal loop through the .Children of the .Children, aka "grandchildren" :)
  • For every "grandchild", take a look whether it is the one with the "London Cocoa(LCE)" in the innerText.
  • If it is, make found = True, and in the next few rows we would find the data for Dec20 and Mar21 and we would write it down.
  • Another check is needed, to make sure that we do not overwrite the data with the next table - If InStr(1, child2.outerhtml, "th class=") And Not CBool(InStr(1, child2.outerhtml, target)). Thus, if the child2.outerhtml contains "th class=" and it is not the one from the London Cocoa(LCE), then found = False.
  • If the found is True, start looping through the dictionary with the periods and check the 5th column for its value. This is not very flexible, as far as if the design changes, it will be wrong.
  • At the end we write from the dictionary to the Excel worksheet

The code:

Sub TestMe()

    Dim appIE As Object
    Set appIE = CreateObject("InternetExplorer.Application")

    With appIE
        .Navigate "https://www.mrci.com/ohlc/ohlc-all.php"
        .Visible = True
    End With

    Do While appIE.Busy: DoEvents: Loop

    Dim allRowsOfData As Variant
    allRowsOfData = appIE.document.getElementsByClassName("strat")

    Dim found As Boolean: found = False
    Dim target As String: target = "London Cocoa(LCE)"

    Dim targetPeriods As Object
    Set targetPeriods = CreateObject("Scripting.Dictionary")
    targetPeriods.Add "Dec20", "Not found"
    targetPeriods.Add "Mar21", "Not found"

    Dim child As Variant
    Dim child2 As Variant
    Dim myKey As Variant

    For Each child In allRowsOfData.Children
        For Each child2 In child.Children

            If InStr(1, child2.innerText, target) Then found = True
            If InStr(1, child2.outerhtml, "th class=") And _
                            Not CBool(InStr(1, child2.outerhtml, target)) Then
                found = False
            End If

            If found Then
                For Each myKey In targetPeriods.keys
                    If Left$(child2.innerText, Len(myKey)) = myKey Then
                        targetPeriods(myKey) = child2.Children(5).innerText
                        Debug.Print child2.Children(5).innerText
                    End If
                Next
            End If
        Next
    Next

    Dim i As Long: i = 1
    For Each myKey In targetPeriods
        With Worksheets(1)
            .Cells(i, 1) = myKey
            .Cells(i, 2) = targetPeriods(myKey)
            i = i + 1
        End With
    Next

    appIE.Quit

End Sub

Bonus - the whole code, split into subs and functions, at my website - https://www.vitoshacademy.com/vba-extracting-financial-data-from-a-website-in-table-format/

2 Comments

you get a boost for using For Each with a collection. Nice. Would get slight faster comparison for typed functions e.g. Left$.
@QHarr - thanks. I somehow have not used the $ functions for the couple of years, but indeed they are faster.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.