Using XMLHTTP object to parse some websites in VBA

Question

I am trying to pick up the "Key people" field from a Wikipedia page, https://en.wikipedia.org/wiki/Abbott_Laboratories, and copy that value into my Excel spreadsheet.

I managed to do it using XMLHTTP, a method I like for its speed. The working code is shown below.

However, the code is not flexible enough, because the structure of the wiki page can change. For example, it doesn't work on this page: https://en.wikipedia.org/wiki/3M

There, the tr/td structure is not exactly the same ("Key people" is no longer the 8th TR on the 3M page).

How can I improve my code?

Public Sub parsehtml()
    Dim http As Object, html As New HTMLDocument, topics As Object, titleElem As Object, detailsElem As Object, topic As HTMLHtmlElement
    Dim i As Integer

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://en.wikipedia.org/wiki/Abbott_Laboratories", False
    http.send

    html.body.innerHTML = http.responseText

    Set topic = html.getElementsByTagName("tr")(8)
    Set titleElem = topic.getElementsByTagName("td")(0)

    ThisWorkbook.Sheets(1).Cells(1, 1).Value = titleElem.innerText
End Sub

Ahmed AU · Accepted Answer · 2019-06-11 05:27:14Z

2

If the row of the table is not fixed for "Key people", then why not loop over the table rows looking for "Key people"?

I tested it with the following modifications and found it working correctly.

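In the declaration section, declare topic as an HTMLTable (replacing its original HTMLHtmlElement declaration) and add a row variable:

Dim topic As HTMLTable, Rw As HTMLTableRow

Then replace the row lookup with: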

html.body.innerHTML = http.responseText
Set topic = html.getElementsByClassName("infobox vcard")(0)

For Each Rw In topic.Rows
    If Rw.Cells(0).innerText = "Key people" Then
        ThisWorkbook.Sheets(1).Cells(1, 1).Value = Rw.Cells(1).innerText
        Exit For
    End If
Next
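
For reference, the loop above could be wrapped into a reusable function along the following lines. This is only a sketch and not part of the original answer: the name GetInfoboxValue and its parameters are hypothetical, it assumes the project references "Microsoft HTML Object Library", and it assumes the first element carrying the classes "infobox vcard" is the infobox table. The extra checks (cell count, trimmed case-insensitive comparison) are additions meant to tolerate rows without a label/value pair.

```vba
Option Explicit

' Hypothetical helper, not from the original answer.
' Requires a reference to "Microsoft HTML Object Library" (Tools > References).
Public Function GetInfoboxValue(ByVal url As String, ByVal label As String) As String
    Dim http As Object
    Dim html As HTMLDocument
    Dim matches As Object
    Dim tbl As HTMLTable
    Dim rw As HTMLTableRow

    ' Synchronous GET, as in the question's code.
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False
    http.send

    Set html = New HTMLDocument
    html.body.innerHTML = http.responseText

    ' Assumption: the infobox is the first element with both class names.
    Set matches = html.getElementsByClassName("infobox vcard")
    If matches.Length = 0 Then Exit Function   ' returns "" when no infobox is found
    Set tbl = matches(0)

    For Each rw In tbl.Rows
        ' Skip rows that do not have a label cell and a value cell.
        If rw.Cells.Length >= 2 Then
            If StrComp(Trim$(rw.Cells(0).innerText), label, vbTextCompare) = 0 Then
                GetInfoboxValue = rw.Cells(1).innerText
                Exit Function
            End If
        End If
    Next rw
End Function
```

It might then be called as ThisWorkbook.Sheets(1).Cells(1, 1).Value = GetInfoboxValue("https://en.wikipedia.org/wiki/3M", "Key people"). An empty string means the label was not found.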

edited Jun 11, 2019 at 5:27

answered Jun 10, 2019 at 6:44


2 Comments

will199 Over a year ago

This works, fantastic, thanks!!! To answer your question, I simply did not know we could test the "innerText"; I am very new to this. Thanks again, Ahmed.

Ryszard Jędraszyk Over a year ago

@will199 If it answers your question, please mark it as an answer.

QHarr · Accepted Answer · 2019-06-10 08:45:40Z

1

There is a better, faster way, at least for the given URLs: match on the class name of the element and index into the returned nodeList. There are fewer returned items to deal with, the path to the element is shorter, and matching on class name is faster than matching on element type.

Option Explicit
Public Sub GetKeyPeople()
    Dim html As HTMLDocument, body As String, urls(), i As Long, keyPeople
    Set html = New HTMLDocument
    urls = Array("https://en.wikipedia.org/wiki/Abbott_Laboratories", "https://en.wikipedia.org/wiki/3M")
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(urls) To UBound(urls)
            .Open "GET", urls(i), False
            .send
            html.body.innerHTML = .responseText
            keyPeople = html.querySelectorAll(".agent").item(1).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(i + 1, 1).Value = keyPeople
        Next
    End With
End Sub
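
Because this approach depends on "Key people" being the second ".agent" match, a page with fewer matches would raise an error at .item(1). One way to guard against that, sketched here as an assumption rather than as part of the original answer, is to check the node list length before indexing. The helper name AgentTextAt and its parameters are hypothetical, and a reference to "Microsoft HTML Object Library" is assumed.

```vba
' Hypothetical helper, not from the original answer.
' Returns the innerText of the idx-th ".agent" match (zero-based),
' or an empty string when the page has fewer matches than expected.
Public Function AgentTextAt(ByVal html As HTMLDocument, ByVal idx As Long) As String
    Dim nodes As Object

    Set nodes = html.querySelectorAll(".agent")

    ' Only index into the list when idx is within bounds.
    If idx >= 0 And idx < nodes.Length Then
        AgentTextAt = nodes.item(idx).innerText
    End If
End Function
```

Inside the loop above, keyPeople = AgentTextAt(html, 1) would then leave the cell blank instead of stopping the run when a page does not match the expected layout.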

answered Jun 10, 2019 at 8:45


2 Comments

Ahmed AU Over a year ago

+1, this is really the best direct way. Thanks for the enlightenment; I should be open to all the options instead of keeping my mind boggled up with old, conventional, crude ways.

QHarr Over a year ago

@AhmedAU In the absence of more test cases I would say yours may well prove more robust over time, +, hence my caveat.
