
Hi, I am trying to read data on the World's powerful brands from the link "http://www.forbes.com/powerful-brands/list/3/#tab:rank" into a data frame using R.

I am a beginner, so I tried using the following code to retrieve the data:

  library(XML)
  library(RCurl)

  # Read and parse the HTML file
  forbe <- 'http://www.forbes.com/powerful-brands/list/#tab:rank'

  data <- getURL(forbe)            # raw HTML source as one string
  data
  htmldata <- readHTMLTable(data)  # list of every <table> in that source
  htmldata

Could anyone please help me retrieve the data from the webpage mentioned above?

2 Answers


They use XHR requests to populate the page via JavaScript, so the table is not in the HTML that getURL() downloads. Use your browser's Developer Tools to see the network requests


and grab the JSON directly:

brands <- jsonlite::fromJSON("http://www.forbes.com/ajax/list/data?year=2015&uri=powerful-brands&type=organization")
str(brands)

## 'data.frame':    100 obs. of  10 variables:
##  $ position          : int  12 44 83 87 13 22 1 39 16 72 ...
##  $ rank              : int  12 44 83 87 13 22 1 39 16 72 ...
##  $ name              : chr  "AT&T" "Accenture" "Adidas" "Allianz" ...
##  $ uri               : chr  "att" "accenture" "adidas" "allianz" ...
##  $ imageUri          : chr  "att" "accenture" "adidas" "allianz" ...
##  $ industry          : chr  "Telecom" "Business Services" "Apparel" "Financial Services" ...
##  $ revenue           : num  132400 32800 14900 131600 87500 ...
##  $ oneYearValueChange: int  17 14 -14 -6 32 13 17 1 -5 -1 ...
##  $ brandValue        : num  29100 12000 6800 6600 28100 ...
##  $ advertising       : num  3272 88 NA NA 3300 ...
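A small offline illustration of what `fromJSON()` does with such a feed may help. The JSON string below is hand-written, not fetched: it holds three of the records from the `str()` output above (rank, name and brandValue only), deliberately listed out of rank order. As the output above shows, the real feed comes back sorted by name rather than by rank, so the sketch orders the rows afterwards.

```r
library(jsonlite)

# Hand-written stand-in for the Forbes feed: three records copied from the
# str() output above, trimmed to three fields and listed out of rank order.
json_txt <- '[
  {"rank": 83, "name": "Adidas",    "brandValue": 6800},
  {"rank": 12, "name": "AT&T",      "brandValue": 29100},
  {"rank": 44, "name": "Accenture", "brandValue": 12000}
]'

# With the default simplifyVector = TRUE, an array of JSON objects becomes a
# data.frame: one row per object, one column per key.
brands <- fromJSON(json_txt)

# Order the rows by rank before using them.
brands <- brands[order(brands$rank), ]
brands
```

`fromJSON()` accepts either a URL (as in the answer) or a JSON string (as here), so the same parsing and sorting steps apply to the live feed.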

5 Comments

Is this the reason that, when I used getURL or download.file, I was not getting the data for the whole page? Also, how did you determine that it is in JSON format?
Yep. RSelenium and/or phantomjs are other ways besides this. Click on "Preview" to see the JSON, but the Content-Type MIME type also gives away the fact that it's JSON.
Is this the only way we can access this data? Someone told me to download the whole HTML page and use htmlTreeParse, but I could not figure that out. Could you please shed some light on it? I mean, how could I access the data differently if I had not known that it was JSON being retrieved from another link? I would really appreciate your help.
As I said, RSelenium (which drives and scrapes an interactive web browser) or a phantomjs script are your other options. You'll need to do some actual research into those technologies. I just recently answered an SO web scraping question demonstrating RSelenium.
@hrbmstr could you please help with stackoverflow.com/questions/34686048/…

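Why don't you try something like this:

download.file(forbe, destfile = "forbes.html", method = "auto", quiet = FALSE, cacheOK = TRUE)
htmldata <- readLines("forbes.html")

This saves the page source to the file forbes.html, and readLines() then reads it into the htmldata character vector, one element per line.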
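The download-then-parse route (the htmlTreeParse approach asked about in the comments; `htmlParse()` is `htmlTreeParse()` with internal nodes) only finds tables that are present in the saved HTML source. The sketch below runs offline: it writes two small, hypothetical HTML files to temporary paths as stand-ins for what `download.file()` would save. One has only an empty container that JavaScript would fill, like the Forbes page; the other contains a real table.

```r
library(XML)

# Hypothetical stand-in for a JavaScript-populated page: the HTML source has
# only an empty container, and the rows would be inserted later by a script.
js_file <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body>",
  "  <div id='list-container'></div>",
  "  <script>/* rows fetched via XHR and inserted here */</script>",
  "</body></html>"
), js_file)

# Parse the saved file exactly as you would parse the real download.
js_tables <- readHTMLTable(htmlParse(js_file))
length(js_tables)  # no <table> exists in the source, so nothing is found

# Hypothetical stand-in for a static page whose table is in the HTML itself.
static_file <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body><table>",
  "  <tr><th>rank</th><th>name</th></tr>",
  "  <tr><td>1</td><td>Example brand</td></tr>",
  "</table></body></html>"
), static_file)

static_tables <- readHTMLTable(htmlParse(static_file))
length(static_tables)  # the one table in the source is found
```

An empty result from the first case is the usual sign that the data arrives via XHR, which is when fetching the JSON directly (or driving a browser with RSelenium) becomes necessary.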
