I am trying to run through a list of URLs, but it keeps showing run-time error '91': Object variable or With block variable not set.

The data I want to extract is inside an iframe. Some of the values are written out, but the process stops partway through with the error.

Below is a sample URL that I want to extract values from: http://www.bursamalaysia.com/market/listed-companies/company-announcements/5927201

Public Sub GetInfo()
    Dim IE As New InternetExplorer As Object
    With IE
        .Visible = False

        For u = 2 To 100

        .navigate Cells(u, 1).Value

        While .Busy Or .readyState < 4: DoEvents: Wend



        With .document.getElementById("bm_ann_detail_iframe").contentDocument
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 3) = .getElementById("main").innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 4) = .getElementsByClassName("company_name")(0).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 5) = .getElementsByClassName("formContentData")(0).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 6) = .getElementsByClassName("formContentData")(5).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 7) = .getElementsByClassName("formContentData")(7).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 8) = .getElementsByClassName("formContentData")(8).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 9) = .getElementsByClassName("formContentData")(9).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 10) = .getElementsByClassName("formContentData")(10).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(u, 11) = .getElementsByClassName("formContentData")(11).innerText
       End With

    Next u
    End With
End Sub
  • On which line is the error occurring, and is there a particular URL that is failing? Can you provide some of these URLs? Commented Sep 29, 2018 at 13:55
  • This line is incorrect: Dim IE As New InternetExplorer As Object. It should be either Dim IE As New InternetExplorer or Dim IE As Object. As you don't use a Set statement to instantiate the object, I am guessing you are auto-instantiating with the early-bound version, i.e. Dim IE As New InternetExplorer. Commented Sep 29, 2018 at 13:57
  • Your error is because the last index for that class via the iframe is 9 i.e. ThisWorkbook.Worksheets("Sheet1").cells(u, 9) = .getElementsByClassName("formContentData")(9).innerText . 10 and 11 are invalid. Commented Sep 29, 2018 at 14:05
  • What are the exact values you want please from the page? Supplying some sample URLs (working and failing) will help give some indication if there are the same number of elements with that class in the iframe across each URL. If all fail then it may simply be wrong indexing. Commented Sep 29, 2018 at 14:33
  • Hi QHarr, thank you for your reply. First I am trying to extract all the links from the webpage and filter out the specific links that I need. Using the filtered URLs, I would then like to extract some specific data from the iframes. The code above is for my second step, extracting the data from the iframes. But I am now facing an error in my first step. I will open a new thread for that before solving this question. I really appreciate your reply. Thanks. Commented Sep 29, 2018 at 14:34

1 Answer

tl;dr

Your error is due to the fact that the number of elements with the given class name differs depending on the results on each page, so you can't use fixed indexes. For the page you indicated, the last index for that class, via the iframe, is 9, i.e. ThisWorkbook.Worksheets("Sheet1").cells(u, 9) = .getElementsByClassName("formContentData")(9).innerText; 10 and 11 are invalid. Below I show a way to determine the number of results and extract info from each result row.
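As a minimal sketch of the fixed-index problem described above: checking the collection's Length before indexing avoids calling .innerText on Nothing, which is what raises error 91. The helper name SafeInnerText is hypothetical and not part of the original code; it assumes nodes is the late-bound collection returned by getElementsByClassName.

```vba
Option Explicit

' Hypothetical helper (not in the original code): returns the innerText of the
' element at position idx, or an empty string when that position does not exist.
Public Function SafeInnerText(ByVal nodes As Object, ByVal idx As Long) As String
    ' No collection at all: return the default empty string
    If nodes Is Nothing Then Exit Function
    ' Index outside 0 .. Length - 1: return the default empty string
    If idx < 0 Or idx >= nodes.Length Then Exit Function
    SafeInnerText = nodes.Item(idx).innerText
End Function
```

With this in place, a call such as SafeInnerText(nodes, 10) would leave the cell blank on pages that only have indexes 0 to 9, rather than stopping the loop. This only hides the missing values; it does not tell you which value belongs in which column, which is why the approach below targets table rows instead.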

General principle:

Ok... so the following works on the principle of targeting the Details of Changes table for most of the info.

More specifically, I target the rows that repeat the info for No, Date of Change, #Securities, Type of Transaction and Nature of Interest. These values are stored in an array of arrays (one array per row of information). Then the results arrays are stored in a collection to later be written out to the sheet. I loop each table cell in the targeted rows (td tag elements within parent tr) to populate the arrays.

I add in the Name from the table above on the page. Because there can be more than one row of results per webpage, and because I am writing the results to a new Results sheet, I also add in the URL before each result to indicate the source of the information.


TODO:

  1. Refactor the code to be more modular
  2. Potentially add in some error handling
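One possible shape for the error handling mentioned in the TODO, as a sketch only: a guard that returns False instead of raising when the iframe is missing or its document cannot be read. The function name TryGetFrameDocument and its parameters are hypothetical; the iframe id passed in would be the bm_ann_detail_iframe used elsewhere on this page.

```vba
Option Explicit

' Hypothetical helper (not in the original code): tries to fetch the document
' inside the iframe with the given id. Returns True and sets frameDoc on
' success; returns False and leaves frameDoc as Nothing otherwise.
Public Function TryGetFrameDocument(ByVal pageDoc As Object, ByVal frameId As String, _
                                    ByRef frameDoc As Object) As Boolean
    Dim frame As Object
    Set frameDoc = Nothing

    On Error Resume Next   ' swallow errors from a missing page, frame or document
    Set frame = pageDoc.getElementById(frameId)
    If Not frame Is Nothing Then Set frameDoc = frame.contentDocument
    On Error GoTo 0        ' restore normal error handling

    TryGetFrameDocument = Not frameDoc Is Nothing
End Function
```

Inside the URL loop this could be used as If TryGetFrameDocument(.document, "bm_ann_detail_iframe", frameDoc) Then ... so that a page without the expected iframe is skipped rather than ending the run.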

CSS selectors:


① I select the Name element, which I refer to as title, from the Particulars of substantial Securities Holder table.

[Image: example Name element]

Inspecting the HTML for this element shows it has a class of formContentData, and that it is the first element with this class on the page.

[Image: example HTML for the target Name element]

This means I can use a class selector, .formContentData, to target the element. As I want a single element, I use the querySelector method to apply the CSS selector.


② I target the rows of interest in the Details of Changes table with a selector combination of .ven_table tr. This is a descendant combinator: it selects elements with a tr tag that have an ancestor with class ven_table. As these are multiple elements, I use the querySelectorAll method to apply the CSS selector combination.

[Image: example of a target row]

[Image: sample of the results returned by the CSS selector]

The rows I am interested in start at index 1 and repeat every 4 rows after that, e.g. rows 5, 9, etc. So I use a little maths in the code to return just the rows of interest:

Set currentRow = data.item(i * 4 + 1)
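To make the index arithmetic concrete, here is a small standalone sketch. The names TargetRowIndex and ShowTargetRows are hypothetical and exist only for illustration; the formula is the same i * 4 + 1 used above, with i counting results from 0.

```vba
Option Explicit

' Hypothetical helper: maps a zero-based result number to the index of its
' row in the node list returned by querySelectorAll(".ven_table tr").
Public Function TargetRowIndex(ByVal resultNumber As Long) As Long
    TargetRowIndex = resultNumber * 4 + 1
End Function

' Prints the result number and its row index for the first three results.
Public Sub ShowTargetRows()
    Dim i As Long
    For i = 0 To 2
        Debug.Print i, TargetRowIndex(i)
    Next i
End Sub
```

Results 0, 1 and 2 map to rows 1, 5 and 9, matching the pattern described above.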

VBA:

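Option Explicit
' Requires a reference to Microsoft Internet Controls (VBE > Tools > References)
' for the early-bound InternetExplorer declaration below.
Public Sub GetInfo()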
    Dim IE As New InternetExplorer, headers(), u As Long, resultCollection As Collection
    headers = Array("URL", "Name", "No", "Date of change", "# Securities", "Type of Transaction", "Nature of Interest")
    Set resultCollection = New Collection
    Dim links()
    links = Application.Transpose(ThisWorkbook.Worksheets("Sheet1").Range("A2:A3")) '<== extend the range (e.g. to A100) for the full list of URLs

    With IE
        .Visible = True

        For u = LBound(links) To UBound(links)
            If InStr(links(u), "http") > 0 Then
                .navigate links(u)

                While .Busy Or .readyState < 4: DoEvents: Wend
                Application.Wait Now + TimeSerial(0, 0, 1) '<you may not always need this. Or may need to increase.
                Dim data As Object, title As Object
                With .document.getElementById("bm_ann_detail_iframe").contentDocument
                    Set title = .querySelector(".formContentData")
                    Set data = .querySelectorAll(".ven_table tr")
                End With

                Dim results(), numberOfRows As Long, i As Long, currentRow As Object, td As Object, c As Long, r As Long

                numberOfRows = Round(data.Length / 4, 0)
                ReDim results(1 To numberOfRows, 1 To 7)

                For i = 0 To numberOfRows - 1
                    r = i + 1
                    results(r, 1) = links(u): results(r, 2) = title.innerText
                    Set currentRow = data.item(i * 4 + 1)
                    c = 3
                    For Each td In currentRow.getElementsByTagName("td")
                        results(r, c) = Replace$(td.innerText, "document.write(rownum++);", vbNullString)
                        c = c + 1
                    Next td
                Next i
                resultCollection.Add results
                Set data = Nothing: Set title = Nothing
            End If
        Next u
        .Quit
    End With
    Dim ws As Worksheet, item As Long
    If Not resultCollection.Count > 0 Then Exit Sub

    If Not Evaluate("ISREF('Results'!A1)") Then '<==Credit to @Rory for this test
        Set ws = ThisWorkbook.Worksheets.Add
        ws.Name = "Results"
    Else
        Set ws = ThisWorkbook.Worksheets("Results")
        ws.Cells.Clear
    End If

    Dim outputRow As Long: outputRow = 2
    With ws
        .cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        For item = 1 To resultCollection.Count
            Dim arr()
            arr = resultCollection(item)
            For i = LBound(arr, 1) To UBound(arr, 1)
                .cells(outputRow, 1).Resize(1, 7) = Application.WorksheetFunction.Index(arr, i, 0)
                outputRow = outputRow + 1
            Next
        Next
    End With
End Sub

[Image: example results using the 2 provided test URLs]


Sample URLs in sheet1:

  1. http://www.bursamalaysia.com/market/listed-companies/company-announcements/5928057
  2. http://www.bursamalaysia.com/market/listed-companies/company-announcements/5927201

14 Comments

Hi QHarr, I am getting "run-time error -2147467259 (80004005): Automation error, Unspecified error". I am using the sample URLs and put them into Range("A2:A3").
But if I put just one URL into cell A2, the code runs well.
Strangely I now get that when I didn't before but only if I run without a pause. If I pause at the error and press run again it is fine. Give me a second to explore.
OK, I will go back through the code and learn from your template. I appreciate your solution. Thanks.
OK, no problem. Thanks for the reminder. Just trying to contribute something to make this forum great! :)