
I need to extract posts from a forum webpage using VBA. From code snippets on the web I have got as far as successfully extracting the text for each post. But I can't get at the poster's name. The code I'm using is:

Sub extract_forum_posts()
    Dim htm As Object, div As Object
    Set htm = CreateObject("htmlfile")

    ' Fetch the page synchronously and load its markup into an in-memory document
    With CreateObject("msxml2.xmlhttp")
        .Open "GET", "http://community.betfair.com/your_competitions/go/thread/view/94214/30587277/division-1-thursday-17-september?sdb=1&pg=last#546245865", False
        .send
        htm.body.innerhtml = .responseText
    End With

    ' Print the text of every div whose class name contains flvPostContent
    For Each div In htm.getElementsByTagName("div")
        If div.classname Like "*flvPostContent*" Then
            Debug.Print div.innertext
        End If
    Next div
End Sub

The poster's name seems to be part of a span element. Don't know what that is.

  • Please don't use VBA for this. You can use virtually any other language on the planet (PHP, Python, Ruby, JavaScript, Go), just not VBA - you will go mad. Commented Sep 19, 2015 at 8:23
  • Unfortunately I am using this in Excel and don't have an alternative. And I'm too old to start learning completely new languages that I'm unlikely to use much! Commented Sep 19, 2015 at 9:03

2 Answers


Each posted message seems to be wrapped in a div element whose .classname starts with flvPost flvPost. Child div elements inside that posting div hold the different parts of the post. The username is nested in anchor and span elements within a child div with a classname of flvPostInfo.

For Each div In htm.getelementsbytagname("div")
    If div.classname Like "flvPost flvPost*" Then
        For d = 0 To div.getelementsbytagname("div").Length - 1
            Select Case div.getelementsbytagname("div")(d).classname
                Case "flvPostInfo"
                    Debug.Print "user: " & div.getelementsbytagname("div")(d).innertext
                Case "flvPostContent"
                    Debug.Print "mssg: " & div.getelementsbytagname("div")(d).innertext
            End Select
        Next d
        'Exit For  'shorten the scrape for testing purposes
    End If
Next div
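
If only the poster's name is wanted rather than the whole text of the flvPostInfo div, one option is to walk down to the name link inside it and read its title attribute. The following is a minimal sketch, not a tested scrape of the live page: PosterNames is a hypothetical helper name, and the flvPostInfoNameLink class and the title attribute are taken from the markup quoted in the comments.

```vba
' Sketch: collect poster names from the title attribute of each name link.
' PosterNames is a hypothetical helper; it assumes the name link is an <a>
' with class flvPostInfoNameLink inside a div with class flvPostInfo.
Function PosterNames(ByVal html As String) As Collection
    Dim htm As Object, div As Object, a As Object
    Dim result As New Collection

    ' Load the markup into an in-memory document, as in the question
    Set htm = CreateObject("htmlfile")
    htm.body.innerHTML = html

    For Each div In htm.getElementsByTagName("div")
        If div.className = "flvPostInfo" Then
            ' Look only at the anchors nested inside this info div
            For Each a In div.getElementsByTagName("a")
                If a.className = "flvPostInfoNameLink" Then
                    result.Add a.getAttribute("title")
                End If
            Next a
        End If
    Next div

    Set PosterNames = result
End Function
```

It could be called with the .responseText from the request in the question, then looped with For Each to print each name.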

5 Comments

@John Fowler - If you are looking to learn by example, try taking on a few of the questions posed in the xmlhttp and xmlhttprequest forums. Ignore the ones that don't provide a public URL to actually test your solutions.
Thanks Jeeped, that seems to get me most of the way there. I was trying to extract the "title" part in:
/a> <span class="flvPostInfoName"> <a href="community.betfair.com/dogdayafternoon" class="flvPostInfoNameLink" id="flvPostInfoNameLink-546240333" title="dogdayafternoon" data-user_id="77413318" >dogdayafternoon</a> </span>
I know I can extract the string I want from the flvPostContent but a more elegant solution would be to get the title bit.
The structure of that page seems fairly predictable albeit a little messy. There are functions to get the parent, siblings or children that will work you up, down or sideways through the HTML hierarchy.

CSS selector:

You can use a CSS selector of a[id^='flvPostInfoNameLink']

This matches a (anchor) elements whose id attribute starts with 'flvPostInfoNameLink'.


VBA:

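Use the querySelectorAll method of document to return a nodeList of all matching elements, then loop from 0 to .Length - 1 to retrieve each element. The ie in the snippet is presumably an InternetExplorer automation object with the page already loaded; it is not the htm object from the question.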

Dim aNodeList As Object, iNode As Long
Set aNodeList = ie.document.querySelectorAll("a[id^='flvPostInfoNameLink']")
For iNode = 0 To aNodeList.Length - 1
    Debug.Print aNodeList.item(iNode).innerText
    'Debug.Print aNodeList(iNode).innerText '<== Sometimes this syntax
Next iNode
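
If querySelectorAll turns out not to be available on the document object you have (I have not checked this for the late-bound htmlfile document used in the question), the same "id starts with" test can be written with the Like operator instead. This is only a sketch: NameLinkTexts is a hypothetical helper name, and doc stands for whichever document object already holds the page.

```vba
' Sketch: same "id starts with flvPostInfoNameLink" match, without a CSS selector.
' NameLinkTexts is a hypothetical helper; doc is any loaded HTML document object.
Function NameLinkTexts(ByVal doc As Object) As Collection
    Dim a As Object
    Dim result As New Collection

    For Each a In doc.getElementsByTagName("a")
        ' Like with a trailing * is a prefix match, the equivalent of id^= in CSS
        If a.ID Like "flvPostInfoNameLink*" Then
            result.Add a.innerText
        End If
    Next a

    Set NameLinkTexts = result
End Function
```

Anchors without an id have an empty id string, so they simply fail the Like test and are skipped.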
