I am looking for a way to read the value of a "data-testid" attribute from a website. The attribute appears about 35 times in the HTML, in different contexts and with different values. The one I am looking for has the form [data-testid="############-follow"], where ############ is a number that varies. I am using Excel VBA with Selenium to drive the Chrome browser. The code is relatively simple and mostly works, but I can't get this particular value. I open a web page, collect the elements that have this attribute, and then scan them for the word "follow". Once it is found, I want to extract the number before that word and store it in an Excel worksheet.

Set d = New ChromeDriver
d.Start "Chrome"

Set Rng = Range(Worksheets("followers").Range("A2"), Worksheets("followers").Range("A2").End(xlDown))

For Each Cell In Rng
    If Cells(Cell.Row, 2).Value2 = "" Then
        user = Cell.Value2
        user = Replace(user, "@", "", 1, 1)         'remove "@"
        d.Get "https://twitter.com/" & user         'navigate to user's page.
        Set Result = d.FindElementsByXPath("//div[@data-testid]")
        If Result.Count > 0 Then
            For i = 1 To Result.Count
                n = InStr(Result(i).Text, "-follow")
                If n > 0 Then Exit For
            Next
            Cells(Cell.Row, 2).Value2 = Left(Result(i).Text, n - 1)
        End If
    End If
Next

This is the part of the HTML containing the desired element at the end:

<div role="button" data-focusable="true" tabindex="0" class="css-18t94o4 css-1dbjc4n r-1niwhzg r-p1n3y5 r-sdzlij r-1phboty r-rs99b7 r-1w2pmg r-1vuscfd r-1dhvaqw r-1ny4l3l r-1fneopy r-o7ynqc r-6416eg r-lrvibr" data-testid="1197328651785789440-follow">

When I inspect an item of the result [Result(1...35)], it lists four Boolean properties and one string property; the string is invariably the tag name "div". No other property is shown. By chance, I tried the "Text" property [Result(i).Text], and it returns some text from the page, but none of the 35 elements shows the expected content.

As I have little experience with Selenium, I would appreciate help understanding how to extract the content of this element. Thanks.

2 Answers

Use a CSS attribute selector with the ends-with operator ($=):

.FindElementByCss("[data-testid$='-follow']")

I have written Selenium code in several languages, and the method names differ slightly between them. If the above is not the exact spelling for SeleniumBasic, instantiate a WebDriver instance and IntelliSense should show you the correct name.

This targets the node by its attribute and attribute value and therefore no loop is needed.

Call .Attribute("data-testid") on the matched node to read the attribute value.
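
The two steps above could be combined roughly as follows. This is a minimal sketch, assuming SeleniumBasic with a reference to "Selenium Type Library"; the procedure name ReadFollowTestId and the handle "someuser" are placeholders that do not come from the question. It uses the plural FindElementsByCss so that the match count can be checked before reading the attribute.

```vba
Option Explicit

' Sketch only: assumes a reference to "Selenium Type Library" (SeleniumBasic).
' ReadFollowTestId and "someuser" are hypothetical placeholders.
Public Sub ReadFollowTestId()
    Dim d As Selenium.ChromeDriver
    Dim matches As Selenium.WebElements
    Dim testId As String

    Set d = New Selenium.ChromeDriver
    d.Start "chrome"
    d.Get "https://twitter.com/" & "someuser"

    ' Select by attribute value: data-testid must END with "-follow".
    Set matches = d.FindElementsByCss("[data-testid$='-follow']")

    If matches.Count > 0 Then
        ' .Attribute reads the attribute value, not the visible button text.
        testId = matches(1).Attribute("data-testid")
        Debug.Print testId
    Else
        Debug.Print "No element whose data-testid ends with -follow"
    End If

    d.Quit
End Sub
```

If the page renders the button late, a wait before the lookup may be needed.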

Comments

You can add div in front of the [ to be more specific, or add more attributes, but hopefully the above is specific enough.
Is it within an iframe or obscured by another element? Is it found if you put a wait before that line?
Thank you for your hint. It finds exactly 1 element, but Result(1).Text = "Seguir", which is not the content of this element, but the text string of the button, found lower in this node. It is more precise than my previous search (1 result instead of 35). So the question now is, is there any (hidden) property of Result(1) that gives me the content of this element? Usually, I can see the property tree of an object by inspecting it in the VBA debugger, but not even "Text" is listed, although it exists obviously. Any idea of other properties? "Content" doesn't work.
FindElement should match the first node. If you open the find box (Ctrl + F) in the browser's Elements tab (F12) and enter the CSS selector (the part between the quotes), how many matches does it return when you press Enter? There are examples of how to do this on my profile page, in the "links I commonly share" section.
It is my understanding that "Result" is not precisely the element that I searched for, but the entire node. This would explain that one of its properties is "tagname" which is given as "div" and that the "Text" property returns the string ("Seguir") that is found further down in this node. So what I need is to address the very element that I was looking for, or get the entire code of this node, so I can search for the substring "follow".

Thanks to QHarr, I have now found the perfect solution. I hope it helps others too, so I describe it here. First, I changed the search to .FindElementsByCss("[data-testid$='-follow']"). Because the element is specified more precisely, this gives exactly ONE result instead of the previous 35. As I understood it, the "$" sign after the attribute name means that the following value is a partial string, so the selector finds any "data-testid" whose value contains "-follow". There is only one in this document. Next, I changed FindElements to FindElement, as only one element is left. Finally, I added .Attribute("data-testid") to the search:

a = d.FindElementByCss("[data-testid$='follow']").Attribute("data-testid")

The result is no longer the object "Result" but exactly the string I was looking for. The rest was easy. Thank you very much, QHarr!
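
For completeness, "the rest" might look something like the sketch below, which folds the attribute lookup back into the worksheet loop from the question. It assumes SeleniumBasic ("Selenium Type Library" reference) and the "followers" sheet layout from the question; the procedure name FillFollowIds is made up for illustration. Two guards are added that the original loop did not have: a check that something matched, and text formatting for the target cell, because Excel keeps only 15 significant digits for numbers and the IDs shown here have 19.

```vba
Option Explicit

' Sketch only: assumes SeleniumBasic and the "followers" sheet from the question
' (handles in column A, results in column B). FillFollowIds is a hypothetical name.
Public Sub FillFollowIds()
    Dim d As Selenium.ChromeDriver
    Dim ws As Worksheet
    Dim cell As Range
    Dim matches As Selenium.WebElements
    Dim testId As String
    Dim n As Long

    Set ws = Worksheets("followers")
    Set d = New Selenium.ChromeDriver
    d.Start "chrome"

    For Each cell In ws.Range(ws.Range("A2"), ws.Range("A2").End(xlDown))
        If ws.Cells(cell.Row, 2).Value2 = "" Then
            ' Remove the leading "@" and open the user's page.
            d.Get "https://twitter.com/" & Replace(cell.Value2, "@", "", 1, 1)

            Set matches = d.FindElementsByCss("[data-testid$='-follow']")
            If matches.Count > 0 Then
                testId = matches(1).Attribute("data-testid")
                n = InStrRev(testId, "-follow")
                If n > 1 Then
                    ' Store as text so the long ID is not rounded to 15 digits.
                    ws.Cells(cell.Row, 2).NumberFormat = "@"
                    ws.Cells(cell.Row, 2).Value2 = Left$(testId, n - 1)
                End If
            End If
        End If
    Next cell

    d.Quit
End Sub
```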

Comments

$ is the ends-with operator: the substring to be matched must be at the end of the attribute value.
Ok, thanks. I misunderstood it. Where do I find info about these operators?
See under attribute selectors
On GitHub you can download the examples Excel file that goes with SeleniumBasic. The issues log on the GitHub pages is also a good source. If you have included a reference to the Selenium Type Library, you should also have IntelliSense with an early-bound WebDriver instance. Any questions, feel free to ask.
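
The difference between an ends-with match ($=) and a contains match (*=) can be mimicked in plain VBA without a browser. The sketch below is only an analogy for how the CSS operators compare strings; the function names and the sample values other than the one from the question are invented for illustration.

```vba
Option Explicit

' Plain-VBA analogy for CSS attribute operators; no Selenium required.
' EndsWithText and ContainsText are hypothetical helper names.
Public Function EndsWithText(ByVal attrValue As String, ByVal part As String) As Boolean
    ' Mirrors [attr$='part']: an empty string matches nothing.
    If Len(part) = 0 Then Exit Function
    EndsWithText = (Right$(attrValue, Len(part)) = part)
End Function

Public Function ContainsText(ByVal attrValue As String, ByVal part As String) As Boolean
    ' Mirrors [attr*='part']: an empty string matches nothing.
    If Len(part) = 0 Then Exit Function
    ContainsText = (InStr(1, attrValue, part, vbBinaryCompare) > 0)
End Function

Public Sub DemoOperators()
    Debug.Print EndsWithText("1197328651785789440-follow", "-follow")   ' True
    Debug.Print EndsWithText("1197328651785789440-unfollow", "-follow") ' False
    Debug.Print EndsWithText("1197328651785789440-unfollow", "follow")  ' True
    Debug.Print ContainsText("follow-1197328651785789440", "follow")    ' True
    Debug.Print EndsWithText("follow-1197328651785789440", "follow")    ' False
End Sub
```

The third line shows why keeping the hyphen in the selector value can matter: without it, a value ending in "-unfollow" would also count as a match.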