
This is a continuation of a previous post here. In this new post I am trying to capture the contents of the following elements from the HTML shown below the list:

datePosted expected result: "Aug. 18, 2018, 4:19 a.m."

addressCountry expected result: "United States"

addressRegion expected result: "Oklahoma City"

The HTML text is the following:

<div class="container-fluid">
<div itemscope itemtype="http://schema.org/JobPosting">  
      <div class="row content">
        <div class="col-sm-3 sidenav well job_detail_lhs">
            <div class="card">
              <div class="card-body">


                        <strong><a href="/?cmp=jd&from=search-more">< Search 32182 More Oil Jobs </a></strong>

                    <meta itemprop="datePosted" content="Aug. 18, 2018, 4:19 a.m." />
                    <meta itemprop="industry" content="Oil & Gas" />
                    <span itemprop="jobLocation" itemscope itemtype="http://schema.org/Place">
                        <span itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">


                                <h4><strong>Country:</strong></h4>
                                <p itemprop="addressCountry">United States</p>



                                <h4><strong>Location:</strong></h4>
                                <meta itemprop="addressRegion" content="Oklahoma City" />
                                <p itemprop="addressLocality">Oklahoma City</p>

                        </span>
                    </span>
                            <h4><strong>Posted:</strong></h4>
                            <p>22 Days Ago</p>
                    <div>

The web page is here.

...and the code so far is this:

Sub DeepScrap()

Dim IE As New InternetExplorer
Dim Doc As HTMLDocument
Dim s As Long
Dim htmlTitle As String, htmlCompany As String
Dim htmlCountry As String, htmlLoc As String, htmlPost As String

s = 5

Sheets("LNK0").Activate

Do Until Cells(s, 1) = ""

    'IE.Visible = True
    IE.navigate Cells(s, 4)
    'Wait until the page has fully loaded
    Do
    DoEvents
    Loop Until IE.readyState = READYSTATE_COMPLETE And Not IE.Busy
    Set Doc = IE.document
    'The first two elements below come from an upper part of the HTML.
    'I tried different combinations of "getElements..." but was not able
    'to capture the remaining three
    htmlTitle = Doc.getElementsByTagName("h1")(0).innerText
    htmlCompany = Doc.getElementsByTagName("h3")(0).getElementsByTagName("span")(0).innerText
    htmlCountry = ""   'need to figure out how to get
    htmlLoc = ""       'need to figure out how to get
    htmlPost = ""      'need to figure out how to get

    Cells(s, 5) = htmlTitle
    Cells(s, 6) = htmlCompany

    s = s + 1
    Doc.Close

Loop

IE.Quit   'close the browser once all rows are processed

End Sub

I tried several chained combinations of getElementsByTagName, but I wasn't able to get the expected results.
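For comparison with the getElementsByTagName attempts described above, here is a minimal sketch of how that approach can work: collect every element with a given tag name and keep the first one whose itemprop attribute matches. The function name FindByItemprop is hypothetical, and the sketch assumes the page markup matches the HTML shown above.

```vba
Option Explicit

' Hypothetical helper: returns the first element with tag name tagName whose
' itemprop attribute equals propName, or Nothing when there is no match.
' Works on any MSHTML document object (for example IE.document).
Public Function FindByItemprop(ByVal doc As Object, ByVal tagName As String, _
                               ByVal propName As String) As Object
    Dim el As Object
    For Each el In doc.getElementsByTagName(tagName)
        ' & "" turns a missing (Null) attribute into an empty string
        If (el.getAttribute("itemprop") & "") = propName Then
            Set FindByItemprop = el
            Exit Function
        End If
    Next el
End Function
```

Inside the loop from the question, the three missing values could then be read like this (each line raises error 91 if the element is absent, so a `Nothing` check may be worth adding):

```vba
htmlPost = FindByItemprop(Doc, "meta", "datePosted").getAttribute("content")
htmlCountry = FindByItemprop(Doc, "p", "addressCountry").innerText
htmlLoc = FindByItemprop(Doc, "meta", "addressRegion").getAttribute("content")
```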

Thanks in advance for the help!

  • There is no link in your post leading to a webpage. Please include it to get any specific solution. Commented Sep 10, 2018 at 2:20
  • Links fixed! Thanks! Commented Sep 10, 2018 at 2:31

2 Answers


I've used an XMLHTTP request to make the execution much faster. `.querySelector()` calls are concise and easy to work with, so I've used them in the script below to locate the elements. Check this out:

Sub GetInfo()
    Const url$ = "https://www.oneoiljobsearch.com/jobs/senior-reservoir-engineer-oklahoma-city-united-states-4/?cmp=js&from=job-search-form-7"
    Dim Http As New XMLHTTP60, Html As New HTMLDocument

    'Fetch the raw page and load it into an HTML document (no browser needed)
    With Http
        .Open "GET", url, False
        .send
        Html.body.innerHTML = .responseText
    End With

    'meta elements keep their value in the "content" attribute; p elements in their text
    Range("A1") = Html.querySelector("meta[itemprop='datePosted']").getAttribute("content")
    Range("A1").Offset(, 1) = Html.querySelector("p[itemprop='addressCountry']").innerText
    Range("A1").Offset(, 2) = Html.querySelector("meta[itemprop='addressRegion']").getAttribute("content")
End Sub

References to add (Tools > References in the VBA editor):

Microsoft XML, v6.0
Microsoft HTML Object Library
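To plug this approach into the loop from the question, here is a minimal sketch. It assumes the layout from the question (sheet "LNK0", URLs in column D starting at row 5, loop stops at the first blank cell in column A); the output columns G:I and the helper name ItempropValue are hypothetical choices. It needs the same two references listed above.

```vba
Option Explicit

' Sketch: XMLHTTP + querySelector inside the question's loop.
' Assumed layout: sheet "LNK0", URLs in column D from row 5; output columns G:I are hypothetical.
Public Sub DeepScrapXmlHttp()
    Dim Http As New XMLHTTP60
    Dim Html As HTMLDocument
    Dim ws As Worksheet
    Dim s As Long

    Set ws = ThisWorkbook.Worksheets("LNK0")
    s = 5
    Do Until ws.Cells(s, 1).Value = ""
        Http.Open "GET", ws.Cells(s, 4).Value, False
        Http.send
        Set Html = New HTMLDocument          ' fresh document for each page
        Html.body.innerHTML = Http.responseText

        ws.Cells(s, 7).Value = ItempropValue(Html, "meta[itemprop='datePosted']", "content")
        ws.Cells(s, 8).Value = ItempropValue(Html, "p[itemprop='addressCountry']", "")
        ws.Cells(s, 9).Value = ItempropValue(Html, "meta[itemprop='addressRegion']", "content")
        s = s + 1
    Loop
End Sub

' Hypothetical helper: returns the named attribute (or innerText when attrName is "")
' of the first element matching selector, or "" when nothing matches (avoids error 91).
Private Function ItempropValue(ByVal doc As HTMLDocument, ByVal selector As String, _
                               ByVal attrName As String) As String
    Dim el As Object
    Set el = doc.querySelector(selector)
    If el Is Nothing Then Exit Function
    If attrName = "" Then
        ItempropValue = el.innerText
    Else
        ItempropValue = el.getAttribute(attrName) & ""
    End If
End Function
```

Returning an empty string instead of raising an error keeps the loop running when one page lacks a field; the blank cell then shows which rows need a closer look.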

Comments

You probably wanted this date, "Aug. 18, 2018, 4:19 a.m.", so I've fixed that in the script.
Nice selector use as usual. I have now discovered the API is a company/role within oil and gas and not a useful search term for an application programming interface.
Excellent, I see that this way is much faster! Can you recommend a tutorial on xmlhttp? I believe I will be using it very often! Thanks for your support!
@QHarr API is the acronym of the American Petroleum Institute; they created a measure of oil gravity that is usually called API gravity, or just API for short. Have a nice week, all of you!
Follow this link for a tutorial on XMLHTTP requests (in its latter portion), @Pegaso.

Using CSS selectors and IE

Option Explicit
Public Sub GetInfo()
    Dim ie As New InternetExplorer
    With ie
        .Visible = True
        .navigate "https://www.oneoiljobsearch.com/jobs/senior-reservoir-engineer-oklahoma-city-united-states-4/?cmp=js&from=job-search-form-7"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
           Debug.Print .querySelector("[itemprop=datePosted]").Content
           Debug.Print .querySelector("[itemprop=addressCountry]").innerText
           Debug.Print .querySelector("[itemprop=addressRegion]").Content
        End With

        Stop                                     '<=delete me after
        'other stuff
        .Quit
    End With
End Sub

Same thing with WinHTTP

Option Explicit
Public Sub GetInfo()
    Dim html As New HTMLDocument   'requires a reference to Microsoft HTML Object Library
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "https://www.oneoiljobsearch.com/jobs/senior-reservoir-engineer-oklahoma-city-united-states-4/?cmp=js&from=job-search-form-7", False
        .send
        html.body.innerHTML = .ResponseText
        With html
            'meta values live in the "content" attribute
            Debug.Print .querySelector("[itemprop=datePosted]").getAttribute("content")
            Debug.Print .querySelector("[itemprop=addressCountry]").innerText
            Debug.Print .querySelector("[itemprop=addressRegion]").getAttribute("content")
        End With
        Stop                                     '<=delete me after
        'other stuff
    End With
End Sub

1 Comment

I got error '91' at the first Debug.Print line. I have searched a lot for how to get META data using XMLHTTP, but with no luck.
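Run-time error 91 ("Object variable or With block variable not set") on the first `Debug.Print` most likely means `querySelector` returned `Nothing`, i.e. the element was not in the HTML that was parsed. A minimal diagnostic sketch, assuming the same job URL as in the answers (the site may have changed since 2018): it prints the HTTP status and whether the raw response contains each itemprop at all. ContainsItemprop and DiagnoseError91 are hypothetical names, and the check assumes double-quoted attributes as in the question's HTML.

```vba
Option Explicit

' Hypothetical helper: True when the raw HTML contains itemprop="propName"
' (assumes double-quoted attributes, as in the HTML shown in the question).
Public Function ContainsItemprop(ByVal pageHtml As String, ByVal propName As String) As Boolean
    ContainsItemprop = InStr(1, pageHtml, "itemprop=""" & propName & """", vbTextCompare) > 0
End Function

' Diagnostic sketch: fetch the page and report whether the expected markup arrived.
Public Sub DiagnoseError91()
    Const url As String = "https://www.oneoiljobsearch.com/jobs/senior-reservoir-engineer-oklahoma-city-united-states-4/?cmp=js&from=job-search-form-7"
    Dim Http As Object
    Dim prop As Variant

    Set Http = CreateObject("MSXML2.XMLHTTP.6.0")   ' late bound: no reference needed
    Http.Open "GET", url, False
    Http.send

    Debug.Print "HTTP status: "; Http.Status
    For Each prop In Array("datePosted", "addressCountry", "addressRegion")
        Debug.Print prop; " found in response: "; ContainsItemprop(Http.responseText, CStr(prop))
    Next prop
End Sub
```

If the status is 200 but the markup is missing, the page layout has probably changed and the selectors need updating; if the status is not 200, the request itself was likely rejected.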
