0

I have this HTML structure:

<table class="series">
<tr>
    <th>1.</th>
    <td><div><a href="?id=12">1st part</a></div></td>
</tr>
<tr>
    <th>2.</th>
    <td><div><a href="?id=13">2nd part</a></div></td>
</tr>
<tr>
    <th>3.</th>
    <td><div><a href="?id=14">3rd part</a></div></td>
</tr>
<tr>
    <th>4.</th>
    <td><div><a href="?id=15">4th part</a></div></td>
</tr>
<tr>
    <th>5.</th>
    <td><b>5th part</b></td>
</tr>
<tr>
    <th>6.</th>
    <td><div><a href="?id=16">6th part</a></div></td>
</tr>
</table>

What I need is the href of the part immediately before the one I am currently on. In this HTML I am on the 5th part (the row with no link), so I need to return "?id=15", the URL of the previous part. The current part could be anywhere: 1st (in which case the result should be NULL), 2nd, 10th, 50th. The whole table with this class could also be missing from the page, in which case the result should also be NULL.

My progress so far:

 Sub GetPageTitle()
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = True
        .Navigate "www.something.com/do.php?id=5"
        Do Until .ReadyState = 4
            DoEvents
        Loop

        'Series
        Set my_data = .Document.getElementsByClassName("series")
        Dim link
        Dim i As Long: i = 2
        For Each elem In my_data
            Set link = elem.getElementsByTagName("a")(0)

            'copy the data to the excel sheet
            Debug.Print link.href
            Debug.Print link.innerText
            i = i + 1
        Next

        .Quit
    End With
End Sub
  • This was only meant to debug-print each href in the element; however, the For loop doesn't behave as expected, and it always prints just once.

I have of course tried replacing (0) with an incrementing index i, but nothing changed. And even once the debug output works, I still have no idea how to get the URL of the previous part :(

3
  • The code is iterating through all the elements with the series class; since that class seems to belong only to the table, this would make sense. You aren't walking through all the elements in the table, only the elements that have series as their class. Try replacing Set my_data = .Document.getElementsByClassName("series") with Set my_data = .Document.getElementsByClassName("series").getElementsByTagName("a") and walking through that collection instead. Commented Apr 13, 2020 at 11:09
  • When I replace the my_data line, I get: Run-time error '438': Object doesn't support this property or method Commented Apr 13, 2020 at 11:15
  • My bad, I missed selecting an element from the class-name collection. Try this: Set my_data = .Document.getElementsByClassName("series")(0).getElementsByTagName("a"). This will iterate over all a tags in the first element with a class of series. Commented Apr 13, 2020 at 11:16

2 Answers

1

Using the following assumptions:

If there is no element with class 'series', return null. (It may be better to select on table.series; using just .series assumes the class only occurs on the table element.)

If the table has no bold element (the code below checks for exactly one), return null.

Otherwise, iterate over the rows using a counter:

if bold found:
    if row = 0  then 
        null 
    else test row - 1  (prior row)
        if has single href attribute then href else null

You could write a Select Case statement and use a surrogate HTMLDocument variable to apply each test within a helper function:

Option Explicit

Public Sub PrintPriorHref()
    Dim html As MSHTML.HTMLDocument, table As MSHTML.HTMLTable, ie As SHDocVw.InternetExplorer

    Set ie = New SHDocVw.InternetExplorer
    Set html = New MSHTML.HTMLDocument

    With ie
        .Visible = True
        .navigate "www.something.com/do.php?id=5"
        Do: DoEvents: Loop While .Busy Or .readyState <> READYSTATE_COMPLETE

        html.body.innerHTML = .document.body.innerHTML

        Set table = html.querySelector(".series")
        Debug.Print GetPriorHref(table)
        .Quit
    End With

End Sub

Public Function GetPriorHref(ByVal table As MSHTML.HTMLTable) As Variant
    Dim i As Long, html As MSHTML.HTMLDocument

    'surrogate document used to run CSS selectors against a single row
    Set html = New MSHTML.HTMLDocument

    Select Case True
    Case table Is Nothing
        'no element with class "series" on the page
        GetPriorHref = Null ' "Null"
    Case table.getElementsByTagName("b").Length <> 1
        'no unambiguous current part (expects exactly one bold element)
        GetPriorHref = Null
    Case Else
        Dim r As MSHTML.HTMLTableRow

        'i is the zero-based index of the row being tested
        For Each r In table.rows
            html.body.innerHTML = r.outerHTML

            If html.querySelectorAll("b").Length > 0 Then
                Select Case i
                Case 0
                    'current part is the first row, so there is no prior part
                    GetPriorHref = Null '"Null"
                Case Is > 0
                    Dim anchorList As Object

                    'load the prior row and collect elements that carry an href
                    html.body.innerHTML = table.rows(i - 1).outerHTML
                    Set anchorList = html.querySelectorAll("[href]")

                    If anchorList.Length <> 1 Then
                        GetPriorHref = Null ' "Null"
                    Else
                        GetPriorHref = anchorList(0).href
                    End If
                End Select
            End If
            i = i + 1
        Next
    End Select
End Function

Required references (VBE > Tools > References):

  1. Microsoft Internet Controls
  2. Microsoft HTML Object Library
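
As a variant, the same checks can probably be sketched without the surrogate document by reading the table's rows directly. PriorPartHref is a hypothetical helper name, and doc is assumed to be the loaded IE document (passed late-bound, so this sketch needs no extra references). It also assumes the current part is the only row containing a b element and that the page renders in a document mode where querySelector is available.

```vba
Option Explicit

' Hypothetical helper: returns the href of the row directly above the
' current part (the row containing <b>), or Null when there is none.
Public Function PriorPartHref(ByVal doc As Object) As Variant
    Dim table As Object, anchors As Object
    Dim i As Long

    PriorPartHref = Null                          ' default for every "not found" case

    Set table = doc.querySelector("table.series")
    If table Is Nothing Then Exit Function        ' class missing on the page

    For i = 0 To table.Rows.Length - 1
        If table.Rows.Item(i).getElementsByTagName("b").Length > 0 Then
            If i = 0 Then Exit Function           ' current part is first: no prior part

            Set anchors = table.Rows.Item(i - 1).getElementsByTagName("a")
            If anchors.Length = 1 Then
                ' Assumption: getAttribute returns the value as written in the
                ' markup (e.g. "?id=15"), whereas .href would give the resolved URL
                PriorPartHref = anchors.Item(0).getAttribute("href")
            End If
            Exit Function
        End If
    Next i
End Function
```

Inside the With ie block above, this could be called as Debug.Print PriorPartHref(.document) once the page has finished loading. Falling through to the Null default keeps every "missing" case (no table, no bold row, no single link in the prior row) on one code path.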

0

If there are multiple series tables and you want all the links, you need to loop through all the series elements (which you already do), then loop through all the links in each one, like this:

Set my_data = .Document.getElementsByClassName("series")
Dim all_links, link
Dim i As Long: i = 2
For Each elem In my_data
    Set all_links = elem.getElementsByTagName("a")
    For Each link In all_links
        'copy the data to the excel sheet
        Debug.Print link.href
        Debug.Print link.innerText
        i = i + 1
    Next
Next

1 Comment

This works great! But to be honest, at the moment I still have no idea how to achieve the outcome from my initial question: return NULL if the class doesn't exist, NULL if the current part is part 1 (so there is no previous href), or otherwise return the href of the <tr> just above the current <tr> (the one that has no href). Can I kindly ask you to help with that too?
