I will start by saying that I am very new to VBA. I am trying to extract data from the table on this page. I haven't accomplished much as far as the code goes, so take it easy on me. I am looking for some direction on how to approach it, and on whether it can be done at all, which I believe it can. If anyone can point me in the right direction, that would be much appreciated.

Sub rgnbateamstats()

    Dim appIE As Object
    Set appIE = CreateObject("internetexplorer.application")

    With appIE
        .navigate "https://rotogrinders.com/team-stats/nba-earned?site=draftkings"
        .Visible = True
    End With

    Do While appIE.Busy
        DoEvents
    Loop

    Set allRowOfData = appIE.document.getElementById("proj-stats")

Not really sure where to go from here or if I'm even on the right track.

  • You should take a look at my post why using ieObj.Busy alone is a bad idea. Commented Oct 27, 2018 at 5:12
  • Nice, I will add that in. Thanks! Commented Oct 27, 2018 at 5:19

3 Answers

This will grab the entire table on that page.

This project uses early-binding. You will need to set references to:

  • Microsoft Internet Controls
  • Microsoft HTML Object Library

You can accomplish this within the VBE > Tools > References.

I will say, this site uses a very strange method for setting up its tables, and it was interesting to figure out a decent way to accomplish this.

Also, something you may or may not be okay with: there are hidden columns in this table that don't show on the site but will show in your Excel document. If you are not okay with this, you can simply remove or hide them after this code is executed. Or, if you're up to modifying the code to prevent that from happening during execution, more power to you.

Option Explicit

Sub rgnbateamstats()

    Const url$ = "https://rotogrinders.com/team-stats/nba-earned?site=draftkings"

    Dim IE As New InternetExplorer, doc As HTMLDocument
    Dim ws As Worksheet
    Set ws = ThisWorkbook.Worksheets(1)
    
    With IE
        .Navigate url
        .Visible = True
        ieBusy IE
        Set doc = .Document
    End With
    
    Dim r As Long, c As Long, tCol As HTMLDivElement
    Dim subTbls(): subTbls = Array("rgt-bdy left", "rgt-bdy mid", "rgt-bdy right")
    
    Dim subTbl As Long
    For subTbl = 0 To 2
        For Each tCol In getSubTblCols(doc, subTbls(subTbl)).getElementsByClassName("rgt-col")
            c = c + 1
            For r = 1 To tCol.getElementsByTagName("div").Length
                ws.Cells(r, c) = tCol.getElementsByTagName("div")(r - 1).innerText
            Next
        Next tCol
    Next subTbl

End Sub
Private Function getSubTblCols(doc As HTMLDocument, ByVal className$) As HTMLDivElement
    Dim tbl As HTMLTable
    Set tbl = doc.getElementById("proj-stats")
    Set getSubTblCols = tbl.getElementsByClassName(className)(0).Children(0). _
            Children(1)
End Function
Private Sub ieBusy(ieObj As InternetExplorer)
    With ieObj
        Do While .Busy Or .ReadyState < READYSTATE_COMPLETE
            DoEvents
        Loop
    End With
End Sub

Okay, time to attempt to explain what's going on here.

There are three sub-tables in your table. That's the best way I can describe it, and it means you first loop through each sub-table with this line:

For subTbl = 0 To 2

Within that loop, you loop through that sub-table's columns with this line:

For Each tCol In getSubTblCols(doc, subTbls(subTbl)).getElementsByClassName("rgt-col")

rgt-col is the class name for the column in each table, so at least that part was easy. The function getSubTblCols takes the class name of one of the three sub-tables (the names stored in the array subTbls()) and returns the element that holds that sub-table's columns.

c is your Excel column number and r is the row number. r is also used to index the HTML rows, but that collection is zero-based, so you have to subtract 1.

Then get the cell's value using the innerText property of the cell, place that into your spreadsheet, then rinse and repeat.

I moved your wait-for-the-page loop to a new sub, ieBusy. I also added a check on the .ReadyState property because, as I stated in my comment, .Busy by itself is unreliable at best.
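As a rough sketch only (this is not part of the code above), the same wait could also be given a timeout so the macro doesn't hang forever if the page never finishes loading. The name WaitForIE and the 30-second default are hypothetical choices for illustration. It takes the IE object as a plain Object, so it also works with late binding, where the READYSTATE_COMPLETE constant isn't available and the literal 4 is used instead:

```vba
' Hypothetical helper: waits until IE reports the page as loaded.
' Returns True if loading completed, False if timeoutSecs elapsed first.
Private Function WaitForIE(ByVal ieObj As Object, _
                           Optional ByVal timeoutSecs As Double = 30) As Boolean
    Dim startTime As Double, elapsed As Double
    startTime = Timer                       ' seconds since midnight

    Do While ieObj.Busy Or ieObj.readyState < 4    ' 4 = READYSTATE_COMPLETE
        DoEvents
        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer wrapped at midnight
        If elapsed > timeoutSecs Then Exit Function     ' gives up, returns False
    Loop

    WaitForIE = True
End Function
```

You would then call it as `If Not WaitForIE(IE) Then Exit Sub` in place of `ieBusy IE`.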

1 Comment

Edit - Thanks! This seems to work great. I missed the references in the yellow box. Whoops.

Whilst the table layout may appear a little odd, it actually just needs a 180 in thinking. You can grab the columns by class name and then simply loop the rows within each column, rather than the usual looping of rows and then columns.

I use a CSS class selector to grab the columns with querySelectorAll i.e. I target the columns by their class name. This returns a nodeList containing each of the columns. Below is an example of the first two columns (admittedly abbr is not visible). Each row within a column is in a div so if I loop the columns I get the rows within each column by grabbing the associated div tag collection. Then I simply loop those to write out.

enter image description here

Option Explicit
Public Sub GetInfo()
    Dim IE As New InternetExplorer, iColumns As Object, iRow As Object, i As Long, j As Long, r As Long, c As Long
    Application.ScreenUpdating = False
    With IE
        .Visible = True
        .navigate "https://rotogrinders.com/team-stats/nba-earned?site=draftkings"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set iColumns = .document.querySelectorAll(".rgt-col")

        With ThisWorkbook.Worksheets("Sheet1")
            For i = 0 To iColumns.Length - 1
                c = c + 1: r = 0
                Set iRow = iColumns.item(i).getElementsByTagName("div")
                For j = 0 To iRow.Length - 1
                    r = r + 1
                    .Cells(r, c) = iRow(j).innerText
                Next
            Next
        End With
        Application.ScreenUpdating = True
        .Quit
    End With
End Sub

References:

VBE > Tools > References > Microsoft Internet Controls

Or change to late bound with:

Dim ie As Object
Set ie = CreateObject("InternetExplorer.Application")

6 Comments

Tbh I always look forward to seeing your scraping methods. Nicely done.
So I tried this but keep getting a "Compile Error" on "Dim IE As New InternetExplorer" - User-defined type not defined. How do I overcome this?
You need to go VBA > Tools > References and add a reference to Microsoft Internet Controls.
Yep! I missed that part. Tried to edit my comments but it wouldn't let me. Works flawlessly!
No worries. I should have added it. I edited reference in :-)

Try this to extract the first column:

Set allrowofdata = appIE.document.getElementById("proj-stats")

Set newobj = allrowofdata.getElementsByClassName("rgt-col")(0)

For Each x In newobj.Children
    r = r + 1
    Cells(r, 1).Value = x.innerText
Next x
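The same idea can probably be extended to the remaining columns. Here is a minimal sketch, assuming every rgt-col element holds one child element per row (as the first column does) and that appIE has already finished loading the page. The sub name WriteAllColumns is just illustrative:

```vba
' Hypothetical sketch: writes every "rgt-col" column of the table to the
' active sheet, one Excel column per HTML column.
Sub WriteAllColumns(ByVal appIE As Object)
    Dim tbl As Object, cols As Object, cell As Object
    Dim i As Long, r As Long

    Set tbl = appIE.document.getElementById("proj-stats")
    Set cols = tbl.getElementsByClassName("rgt-col")

    For i = 0 To cols.Length - 1            ' HTML collections are zero-based
        r = 0                               ' restart at the top row for each column
        For Each cell In cols.Item(i).Children
            r = r + 1
            ActiveSheet.Cells(r, i + 1).Value = cell.innerText
        Next cell
    Next i
End Sub
```

Call it as `WriteAllColumns appIE` once the page has loaded. Note that this will also pull in any columns that are hidden on the site.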

1 Comment

Adding this worked for the first column! That was exciting to see it start working! Now to figure out how to modify to pull the rest. Thanks for the head start!