I want to retrieve the href attribute of the link inside the <h3> tags of an HTML page. I am able to get the innerText, but I don't know how to access the href attribute. There are several <h3> tags in the document, but for the time being I just need the first one; I will deal with the rest later.

This is the code I have so far:

Sub Scrap()

    Dim IE As New InternetExplorer
    Dim sDD As String
    Dim Doc As HTMLDocument

    IE.Visible = True
    IE.navigate "https://www.oneoiljobsearch.com/senior-reservoir-engineer-jobs/?page=1"
    'Wait until the page has finished loading
    Do
        DoEvents
    Loop Until IE.readyState = READYSTATE_COMPLETE
    Set Doc = IE.document
    sDD = Trim(Doc.getElementsByTagName("h3")(0).innerText)
    'sDD contains the string "Senior Reservoir Engineer"
End Sub

Below is a portion of the HTML document to extract data from:

  <div class="front_job_details">

    <h3>
        <a href="/jobs/senior-reservoir-engineer-oslo-norway-7?cmp=js&from=job-search-form-2" target="_blank">

        Senior Reservoir Engineer

        </a>
    </h3>

The text I need to retrieve is: "/jobs/senior-reservoir-engineer-oslo-norway-7?cmp=js&from=job-search-form-2"

Thanks in advance for your help.

3 Answers

Try:

Dim hr As String

hr = Doc.getElementsByTagName("h3")(0).getElementsByTagName("a")(0).href

Debug.Print hr
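
Note that in Internet Explorer's DOM, .href usually comes back as a fully resolved URL (scheme and host included), whereas the question asks for the path exactly as written in the markup. A minimal sketch of one way to get the literal value, assuming MSHTML's getAttribute accepts its optional flags argument and that a value of 2 requests the attribute text as it appears in the source. The function name RawHref is hypothetical, and the document is passed late-bound as Object:

```vba
'Hypothetical helper: returns the href of the first <a> inside the first <h3>
'as written in the page source, or "" if no such link exists.
Public Function RawHref(ByVal Doc As Object) As String
    Dim h3s As Object, links As Object
    Dim v As Variant

    Set h3s = Doc.getElementsByTagName("h3")
    If h3s.Length = 0 Then Exit Function

    Set links = h3s.Item(0).getElementsByTagName("a")
    If links.Length = 0 Then Exit Function

    'Assumption: flag 2 asks MSHTML for the raw attribute string
    'rather than the resolved URL
    v = links.Item(0).getAttribute("href", 2)
    If Not IsNull(v) Then RawHref = v
End Function
```

With the page from the question loaded, Debug.Print RawHref(IE.document) would be expected to print the relative path rather than the full address, provided the assumption about the flag holds in the document mode in use.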

The collection returned by getElementsByTagName is zero-based, while .Length (the number of H3s; other collections call it Count) is a count of items, so the last valid index is .Length - 1.

Dim i As Long

For i = 0 To Doc.getElementsByTagName("h3").Length - 1
    Debug.Print Doc.getElementsByTagName("h3")(i).getElementsByTagName("a")(0).href
Next i

This gets the first <a> tag from each <h3>. You could repeat the method with an inner loop to get multiple <a> tags from each <h3>.
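
The nested version can be sketched as follows. This is only an illustration, assuming Doc is an already loaded document (passed late-bound as Object) and that a VBA Collection is an acceptable container for the results. The function name AllH3Links is hypothetical:

```vba
'Hypothetical helper: collects the href of every <a> found inside every <h3>.
Public Function AllH3Links(ByVal Doc As Object) As Collection
    Dim result As New Collection
    Dim h3s As Object, links As Object
    Dim i As Long, j As Long

    Set h3s = Doc.getElementsByTagName("h3")
    For i = 0 To h3s.Length - 1                 'outer loop: each <h3>
        Set links = h3s.Item(i).getElementsByTagName("a")
        For j = 0 To links.Length - 1           'inner loop: each <a> in that <h3>
            result.Add links.Item(j).href
        Next j
    Next i

    Set AllH3Links = result
End Function
```

The returned Collection can then be walked with For Each, for example to Debug.Print each link or to write it to a worksheet. An <h3> without any <a> simply contributes nothing, because the inner loop does not run when .Length is 0.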

1 Comment

Now, how can I loop through all <h3> tags to get every href in the document? I think I need to define a collection of some sort, but I'm not sure how. Could you please help?
I would go with the following, more robust CSS selector method to grab all the hrefs within elements of the front_job_details class:

Option Explicit
Public Sub GetLinks()
    Dim ie As New InternetExplorer, i As Long, aNodeList As Object
    With ie
        .Visible = True
        .navigate "https://www.oneoiljobsearch.com/senior-reservoir-engineer-jobs/?page=1"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set aNodeList = .document.querySelectorAll(".front_job_details [href]")
        For i = 0 To aNodeList.Length - 1
            Debug.Print aNodeList.item(i).href
        Next
        Stop                                     '<=delete me after
        'other stuff
        .Quit
    End With
End Sub

Below is the final code, in case it helps somebody:

Sub MultiScrap()

    Dim IE As New InternetExplorer
    Dim Doc As HTMLDocument
    Dim h3s As Object
    Dim i As Long, j As Long, s As Long 'each variable needs its own As Long
    Dim myHTTP As String, lval As Long, uval As Long
    Dim hyperAddres As String

    Sheets("LNK0").Activate
    myHTTP = Cells(1, 2) 'http address root
    lval = Cells(2, 2) 'min number to add to root (page=1..)
    uval = Cells(3, 2) 'max number to add to root (page=10..)
    s = 5 'first output row

    For i = lval To uval 'loop through all pages

        'IE.Visible = True
        IE.navigate myHTTP & i
        'Wait while the browser is busy or the page is still loading
        Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        Set Doc = IE.document
        Set h3s = Doc.getElementsByTagName("h3") 'query once per page

        For j = 0 To h3s.Length - 1
            hyperAddres = h3s.Item(j).getElementsByTagName("a")(0).href
            Cells(s, 1) = s - 4 'Correl
            Cells(s, 2) = i 'Page
            Cells(s, 3) = j 'Row in page
            Cells(s, 4) = hyperAddres 'Http
            Cells(s, 4).Hyperlinks.Add _
                Anchor:=Cells(s, 4), _
                Address:=hyperAddres, _
                TextToDisplay:=hyperAddres 'Hyperlink
            s = s + 1
        Next j
    Next i

    IE.Quit 'close the hidden browser instance

    MsgBox "Dishes ready Sir!"

End Sub
