I am trying to extract all the hyperlinks that contain "http://www.bursamalaysia.com/market/listed-companies/company-announcements/" from the webpages I input.

At first the code ran well, but now it no longer extracts the URL links I need. They are missing every time I run the sub.

Link: http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=SH&sub_category=all&alphabetical=All

Sub scrapeHyperlinks()

    Dim IE As InternetExplorer
    Dim html As HTMLDocument
    Dim ElementCol As Object
    Dim Link As Object
    Dim erow As Long
    Application.ScreenUpdating = False
    Set IE = New InternetExplorer


    For u = 1 To 50
        IE.Visible = False
        IE.navigate Cells(u, 2).Value
        Do While IE.readyState <> READYSTATE_COMPLETE
            Application.StatusBar = "Trying to go to websitehahaha"
            DoEvents
        Loop
        Set html = IE.document
        Set ElementCol = html.getElementsByTagName("a")
        For Each Link In ElementCol
            erow = Worksheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
            Cells(erow, 1).Value = Link
            Cells(erow, 1).Columns.AutoFit
        Next
    Next u

    ActiveSheet.Range("$A$1:$A$152184").AutoFilter Field:=1, Criteria1:="http://www.bursamalaysia.com/market/listed-companies/company-announcements/???????", Operator:=xlAnd

    For k = 1 To [A65536].End(xlUp).Row
        If Rows(k).Hidden = True Then
            Rows(k).EntireRow.Delete
            k = k - 1
        End If
    Next k


    Set IE = Nothing
    Application.StatusBar = ""
    Application.ScreenUpdating = True
End Sub

1 Answer

To get just the qualifying hrefs you mention from the URL given, I would use the following. It uses a CSS selector combination to target the URLs of interest on the specified page.

The CSS selector combination is

#bm_ajax_container [href^='/market/listed-companies/company-announcements/']

This is a descendant selector: it looks for elements with an href attribute whose value starts with /market/listed-companies/company-announcements/ and that have an ancestor element with an id of bm_ajax_container. That ancestor is the ajax container div. The "#" is an id selector, the "[]" indicates an attribute selector, and the "^" means "starts with".

[Image: example of the container div and the first matching href]

Because more than one element is to be matched, the CSS selector combination is applied via the querySelectorAll method. This returns a nodeList whose items are accessed by index, from 0 to .Length - 1.

The full set of qualifying links is written out to the worksheet.


[Image: example CSS query results from page using selector (sample)]


VBA:

Option Explicit
' Requires a reference to Microsoft Internet Controls (Tools > References)
Public Sub GetInfo()
    Dim IE As New InternetExplorer
    Application.ScreenUpdating = False
    With IE
        .Visible = True
        .navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=SH&sub_category=all&alphabetical=All"

        ' Wait until the browser reports that the page has finished loading
        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim links As Object, i As Long
        ' Collect every matching element inside the ajax container
        Set links = .document.querySelectorAll("#bm_ajax_container [href^='/market/listed-companies/company-announcements/']")
        With ThisWorkbook.Worksheets("Sheet1")
            ' The nodeList is 0-based; worksheet rows are 1-based
            For i = 0 To links.Length - 1
                .Cells(i + 1, 1).Value = links.item(i).href
            Next i
        End With
        .Quit
    End With
    Application.ScreenUpdating = True
End Sub
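
The question loops over 50 input URLs rather than one, so here is a rough sketch of how the same selector might be applied to each of them. It is only an outline under assumptions: the URLs are taken to be in Sheet1, cells B1:B50 (as the question's loop suggests), the results are appended to column A, and the names GetInfoForAllUrls, CSS_SELECTOR and MAX_WAIT_SECONDS are made up for illustration. Because the links sit in an ajax container, the sketch also polls for matches for up to 10 seconds after the page reports it has loaded; that limit is a guess, not a measured value.

VBA:

Option Explicit
' Sketch only. Requires a reference to Microsoft Internet Controls.
' Assumed layout: input URLs in Sheet1!B1:B50, output appended to column A.
Public Sub GetInfoForAllUrls()
    Const CSS_SELECTOR As String = "#bm_ajax_container [href^='/market/listed-companies/company-announcements/']"
    Const MAX_WAIT_SECONDS As Long = 10   ' assumed polling limit

    Dim IE As New InternetExplorer
    Dim ws As Worksheet
    Dim links As Object
    Dim u As Long, i As Long, outRow As Long
    Dim t As Single

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    outRow = 1
    Application.ScreenUpdating = False

    With IE
        .Visible = False
        For u = 1 To 50
            ' Skip empty input cells
            If Len(ws.Cells(u, 2).Value) > 0 Then
                .navigate ws.Cells(u, 2).Value
                While .Busy Or .readyState < 4: DoEvents: Wend

                ' Poll until matching links appear or the time limit is reached
                t = Timer
                Do
                    DoEvents
                    Set links = .document.querySelectorAll(CSS_SELECTOR)
                Loop While links.Length = 0 And Timer - t < MAX_WAIT_SECONDS

                ' Append this page's links below those already written
                For i = 0 To links.Length - 1
                    ws.Cells(outRow, 1).Value = links.item(i).href
                    outRow = outRow + 1
                Next i
            End If
        Next u
        .Quit
    End With

    Application.ScreenUpdating = True
End Sub

Since only qualifying links are written, the AutoFilter and row-deletion steps from the question should no longer be needed. One caveat: if the input URLs differ only in the part after the "#", the browser may not reload the page between iterations, so the poll could return the previous page's links; that case would need a different wait condition.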

3 Comments

Thanks QHarr, it works very well; you have simplified the whole process.
Glad it helped :-)
Hi, may I know how you got the "Example CSS query results from page using selector" shown in the sample?
