I am attempting to scrape values from a collection of webpages, using XPath to pull what I want out of the XML. I copy the full XPath of the element from Chrome, but when I use it in the code it doesn't seem to select the node I am looking for. When I run the same XPath in the console, it does not return the node either. Some other elements' XPaths work in the console but not in VBA. Am I missing something? My simple test XML works fine, and my attempts to use a namespace in the XPath were not successful. The code is below, with an example of one of the webpages and one of the elements of interest:

Sub test()

testXML = "<test example='hello'>hello</test>"

Dim oXMLHTTP As Object
Dim sPageHTML  As String
Dim sURL As String
Dim XmlResponse As String
Dim strXML As String
Dim xNode As MSXML2.IXMLDOMNode
Dim xmlElement As MSXML2.IXMLDOMElement
Dim XDoc As MSXML2.DOMDocument60

sURL = "https://www.bestplaces.net/crime/zip-code/alaska/anchorage/99510"

Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oXMLHTTP.SetOption(2) = 13056 'Disable CA messages
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.send
XmlResponse = oXMLHTTP.responseText

'strXML = testXML
strXML = XmlResponse

Set XDoc = New MSXML2.DOMDocument60
'XDoc.setProperty "SelectionNamespaces", "xmlns:a='http://www.w3.org/1999/xhtml'"
'XDoc.setProperty "SelectionNamespaces", "xmlns:a='http://www.w3.org/2000/svg'"
'XDoc.setProperty "SelectionNamespaces", "xmlns:a='http://www.w3.org/1999/xlink'"

XDoc.LoadXML (strXML)

'Set xNode = XDoc.SelectSingleNode("/test")
Set xNode = XDoc.SelectSingleNode("/html/body/form/div[7]/div[2]/div[2]/div[3]/div/div/div/div/div/svg/g[6]/g[1]/text/tspan[2]")

If xNode Is Nothing Then
    MsgBox "Nothing"
Else: MsgBox xNode.text
End If

End Sub
  • Are you getting html or xml back? Your path would seem to indicate html rather than xml in which case use an html parser for starters. Have you visually inspected the response? Commented Jan 7, 2021 at 16:07
  • @QHarr I thought I had previously but looking at it now it is not XML... not sure how to get XML, I thought this is what the code I am using was supposed to be doing... Commented Jan 7, 2021 at 16:19

1 Answer

You are getting HTML back, not XML. A quick look at the page source shows that the value is populated dynamically, so your XPath wouldn't work even if converted to the equivalent path for an HTML parser. The value should still be available from responseText with a regex:

Option Explicit

Public Sub GetValue()
    Dim http As Object, s As String, re As Object

    Set http = CreateObject("MSXML2.XMLHTTP")
    Set re = CreateObject("VBScript.RegExp")

    ' Fetch the raw page source as text
    With http
        .Open "GET", "https://www.bestplaces.net/crime/zip-code/alaska/anchorage/99510", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        s = .responseText
    End With

    ' Capture the text between "data: [" and the first comma that follows.
    ' Note: (0) raises an error if the pattern is not found in s.
    With re
        .Pattern = "data:\s?\[(.*?),"
        Debug.Print .Execute(s)(0).SubMatches(0)
    End With

End Sub
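
As a side check on the point that the response is HTML rather than XML, a minimal sketch along these lines may help. It assumes late binding and the "MSXML2.DOMDocument.6.0" ProgID; the procedure name CheckParse is only illustrative. LoadXML returns False when the text cannot be parsed as XML, and parseError.reason reports why.

```vba
Option Explicit

Public Sub CheckParse()
    Dim http As Object, doc As Object

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://www.bestplaces.net/crime/zip-code/alaska/anchorage/99510", False
    http.send

    Set doc = CreateObject("MSXML2.DOMDocument.6.0")

    ' LoadXML returns False when the text is not accepted as XML
    If doc.LoadXML(http.responseText) Then
        Debug.Print "Parsed as XML"
    Else
        Debug.Print "Not XML: " & doc.parseError.reason
    End If
End Sub
```

If the load fails, SelectSingleNode has no document to search, which would explain why every XPath returns Nothing.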

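Regex explanation:

The pattern data:\s?\[(.*?), matches the literal text data:, then an optional whitespace character (\s?), then a literal opening bracket (\[). The group (.*?) lazily captures as few characters as possible up to the next comma, so it holds the first value that follows the bracket.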
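
To see what .Execute(s)(0).SubMatches(0) does, it can help to split the chain into steps. The sketch below runs the same pattern against a made-up sample string; the string, the procedure name UnpackMatch, and the variable names are assumptions for illustration, not the actual page source.

```vba
Option Explicit

Public Sub UnpackMatch()
    Dim re As Object, matches As Object, firstMatch As Object
    Dim sample As String

    ' Hypothetical stand-in for the page source; the real text will differ
    sample = "series: [{ data: [12.5, 30.1, 22.7] }]"

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "data:\s?\[(.*?),"

    ' Execute returns a MatchCollection (only the first match, since Global is False)
    Set matches = re.Execute(sample)
    If matches.Count = 0 Then
        Debug.Print "No match"
        Exit Sub
    End If

    ' (0) picks the first Match object from the collection
    Set firstMatch = matches(0)
    Debug.Print firstMatch.Value          ' whole match: data: [12.5,

    ' SubMatches(0) is the text captured by the first pair of parentheses
    Debug.Print firstMatch.SubMatches(0)  ' captured group: 12.5
End Sub
```

Checking matches.Count first avoids the run-time error that the chained form raises when the pattern is not found.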

2 Comments

Thanks, this seems to work and definitely puts me on the right track!
Can you please explain how this statement works: Debug.Print .Execute(s)(0).SubMatches(0)
