2

I am trying to pick up "key people" field from a Wikipedia page: https://en.wikipedia.org/wiki/Abbott_Laboratories and to copy that value in my Excel spread sheet.

I managed to do it using xml http which is a method I like for its speed, you can see the code below that is working.

The code is however not flexible enough as the structure of the wiki page can change, for example it doesn't work on this page: https://en.wikipedia.org/wiki/3M

as the tr td structure is not exactly the same (key people is no longer 8th TR for the 3M page)

How can I improve my code?

Public Sub parsehtml()

Dim http As Object, html As New HTMLDocument, topics As Object, titleElem As Object, detailsElem As Object, topic As HTMLHtmlElement
Dim i As Integer

Set http = CreateObject("MSXML2.XMLHTTP")



http.Open "GET", "https://en.wikipedia.org/wiki/Abbott_Laboratories", False

http.send

html.body.innerHTML = http.responseText

Set topic = html.getElementsByTagName("tr")(8)

Set titleElem = topic.getElementsByTagName("td")(0)

ThisWorkbook.Sheets(1).Cells(1, 1).Value = titleElem.innerText

End Sub

2 Answers 2

2

If row of the table is not fixed for "Key people", then why don't loop the table for "Key people"

I tested with followings modification, it is found working correctly.

In declaration section

Dim topics As HTMLTable, Rw As HTMLTableRow

and then finally

html.body.innerHTML = http.responseText
Set topic = html.getElementsByClassName("infobox vcard")(0)

    For Each Rw In topic.Rows
        If Rw.Cells(0).innerText = "Key people" Then
        ThisWorkbook.Sheets(1).Cells(1, 1).Value = Rw.Cells(1).innerText
        Exit For
        End If
    Next
Sign up to request clarification or add additional context in comments.

2 Comments

this works, fantastic thanks!!! to answer to your question I simply did not we could test the "innerText", very new to this. thanks again Ahmed
@will199 If it answers your question, please mark it as an answer.
1

There is a better faster way. At least for given urls. Match on class name of element and index into returned nodeList. Less returned items to deal with, the path to the element is shorter, and matching with class name is faster than matching on element type.

Option Explicit
Public Sub GetKeyPeople()
    Dim html As HTMLDocument, body As String, urls(), i As Long, keyPeople
    Set html = New HTMLDocument
    urls = Array("https://en.wikipedia.org/wiki/Abbott_Laboratories", "https://en.wikipedia.org/wiki/3M")
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(urls) To UBound(urls)
            .Open "GET", urls(i), False
            .send
            html.body.innerHTML = .responseText
            keyPeople = html.querySelectorAll(".agent").item(1).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(i + 1, 1).Value = keyPeople
        Next
    End With
End Sub

2 Comments

+1, This is really the best direct way. thanks for the enlightenment, i should be open to all the options instead of keeping my mind boggled up with old conventional crude ways..
@AhmedAU In the absence of more test cases I would say yours may well prove more robust over time, +, hence my caveat.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.