2

I am getting started with trying to learn about scraping. I got this page that is behind a login and I remember reading that you should not try to do the (1), (2) or (3) thing after get element by tagname. But that you should rather get something more unique like a Classname or ID. But can someone please tell me why

This the GetTag works and

Dim Companyname As String
Companyname = ie.document.getElementsByTagName("span")(1).innertext

This GetClass do not work

Dim Companyname As String
Companyname = ie.document.getElementsByClassName("account-website-name").innertext

This is the text that I am scraping

<span class="account-website-name" data-journey-name="true">Dwellington Journey</span>
5
  • 2
    Whenever you use something that can potentially get more that a single element you need to provide an ordinal index. You do this in the first example with the (1) which references the second <span> element (ordinals are zero based). In your second example you omit the ordinal so .getElementsByClassName doesn't know which to return, even if there is only one that matches. Commented Dec 22, 2014 at 7:20
  • You are a boss man! Thanks! Commented Dec 22, 2014 at 7:25
  • @Jeeped That sounds more like an answer than a comment, how about fleshing it out, posting it as one and getting the rep you seem to deserve? Commented Dec 22, 2014 at 7:50
  • Thanks @Aiken - I had just gone to meta to read this and try to decide on a course of action. It seems that re-submitting as an Answer Community Wiki is the preferred treatment. Commented Dec 22, 2014 at 7:57
  • @Jeeped I don't see why it has to be a community wiki, you posted the comment which turned out to be an answer, why lose out on your rep? Posting as a community wiki is when someone else answered in the comments but hasn't posted an actual answer. Commented Dec 22, 2014 at 8:24

2 Answers 2

3

getELEMENTbyProperty vs getELEMENTSbyProperty

There are primarily two distinct types of commands to retrieve one or more elements from a web page's .Document; those that return a single object and those that return a collection of objects.

Getting an ELEMENT

When getElementById is used, you are asking for a single object (e.g. MSHTML.IHTMLElement). In this case the properties (e.g. .Value, .innerText, .outerHtml, etc) can be retrieved directly. There isn't supposed to be more than a single unique id property within an HTML body so this function should safely return the only element within the i.e.document that matches.

'typical VBA use of getElementById
Dim CompanyName As String
CompanyName = ie.document.getElementById("CompanyID").innerText

Caveat: I've noticed a growing number of web designers who seem to think that using the same id for multiple elements is oh-key-doh-key as long as the id's are within different parent elements like different <div> elements. AFAIK, this is patently wrong but seems to be a growing practise. Be careful on what is returned when using .getElementById.

Getting ELEMENTS

When using getElementsByTagName, getElementsByClassName, etc. where the word Elements is plural, you are returning a collection (e.g. MSHTML.IHTMLElementCollection) of objects, even if that collection contains only one or even none. If you want to use these to directly access an property of one of the elements within the collection, an ordinal index number must be supplied so that a single element within the collection is referenced. The index number within these collections is zero based (i.e. the first starts at (0)).

'retrieve the text from the third <span> element on a webpage
Dim CompanyName As String
CompanyName = ie.document.getElementsByTagName("span")(2).innerText

'output all <span> classnames to the Immediate window until the right one comes along
'retrieve the text from the first <span> element with a classname of 'account-website-name'
Dim e as long, es as long
es = ie.document.getElementsByTagName("span").Length - 1
For e = 0 To es
    Debug.Print ie.document.getElementsByTagName("span")(e).className
    If ie.document.getElementsByTagName("span")(e).className = "account-website-name" Then
        CompanyName = ie.document.getElementsByTagName("span")(e).innerText
        Exit For
    End If
Next e

'same thing, different method
Dim eSPN as MSHTML.IHTMLElement, ecSPNs as MSHTML.IHTMLElementCollection
ecSPNs = ie.document.getElementsByTagName("span")
For Each eSPN in ecSPNs
    Debug.Print eSPN.className
    If eSPN.className = "account-website-name" Then
        CompanyName = eSPN.innerText
        Exit For
    End If
Next eSPN 
Set eSPN = Nothing: Set ecSPNs = Nothing

To summarize, if your Internet.Explorer method uses Elements (plural) rather than Element (singular), you are returning a collection which must have an index number appended if you wish to treat one of the elements within the collection as a single element.

Sign up to request clarification or add additional context in comments.

1 Comment

And... that was just impressive. Thanks man! And I agree, it feels like the same ID thing is spreading like a wild fire. After this I actually got 4 different number with both same ID and classname but under different parents. Got all that figures out. But thank you again. A single S makes a lot of difference.
1

CSS selector:

You can achieve the same thing with a CSS selector of .account-website-name

The "." means className. This will return a collection of matching elements if there are more than one.


CSS query:

CSS query


VBA:

You apply the selector with the .querySelectorAll method of .document. This returns a nodeList which you traverse the .Length of, accessing items by index, starting from 0.

Dim aNodeList As Object, i As Long
Set aNodeList = ie.document.querySelectorAll(".account-website-name")

For i = 0 To aNodeList.Length -1
    Debug.Print aNodeList.Item(i).innerText
    ' Debug.Print aNodeList(i).innerText ''<== sometimes this syntax instead
Next

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.