Web Scraping in R using getURL

Question

Hi, I am trying to read the data on the world's most powerful brands from the link "http://www.forbes.com/powerful-brands/list/3/#tab:rank" into a data frame using R.

I am a beginner, so I tried the following code to retrieve the data:

  library(XML)
  library(RCurl)

  # Page to read and parse
  forbe <- 'http://www.forbes.com/powerful-brands/list/#tab:rank'

  # Download the raw HTML as a single string
  data <- getURL(forbe)
  data

  # Try to extract any HTML tables from the downloaded page
  htmldata <- readHTMLTable(data)
  htmldata

Could anyone please help me retrieve the data from the webpage mentioned above?

hrbrmstr · Accepted Answer · 2016-01-06 19:55:16Z

1

They use XHR requests to populate the page via JavaScript. Use your browser's Developer Tools to see the network requests and grab the JSON directly:

brands <- jsonlite::fromJSON("http://www.forbes.com/ajax/list/data?year=2015&uri=powerful-brands&type=organization")
str(brands)

## 'data.frame':    100 obs. of  10 variables:
##  $ position          : int  12 44 83 87 13 22 1 39 16 72 ...
##  $ rank              : int  12 44 83 87 13 22 1 39 16 72 ...
##  $ name              : chr  "AT&T" "Accenture" "Adidas" "Allianz" ...
##  $ uri               : chr  "att" "accenture" "adidas" "allianz" ...
##  $ imageUri          : chr  "att" "accenture" "adidas" "allianz" ...
##  $ industry          : chr  "Telecom" "Business Services" "Apparel" "Financial Services" ...
##  $ revenue           : num  132400 32800 14900 131600 87500 ...
##  $ oneYearValueChange: int  17 14 -14 -6 32 13 17 1 -5 -1 ...
##  $ brandValue        : num  29100 12000 6800 6600 28100 ...
##  $ advertising       : num  3272 88 NA NA 3300 ...
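
Once the JSON is in a data frame, ordinary data frame operations apply. As a minimal sketch (not part of the original answer), the snippet below sorts a few rows by rank. To stay runnable offline it parses a small inline JSON sample instead of calling the live endpoint; the sample values are copied from the `str()` output above, and the names `sample_json`, `brands_sample`, and `brands_by_rank` are made up for illustration.

```r
library(jsonlite)

# Inline sample shaped like the endpoint's response (an assumption);
# values are taken from the str() output above, deliberately shuffled.
sample_json <- '[
  {"rank": 83, "name": "Adidas",    "brandValue": 6800},
  {"rank": 12, "name": "AT&T",      "brandValue": 29100},
  {"rank": 87, "name": "Allianz",   "brandValue": 6600},
  {"rank": 44, "name": "Accenture", "brandValue": 12000}
]'

# fromJSON() turns a JSON array of objects into a data frame
brands_sample <- fromJSON(sample_json)

# Reorder the rows by ascending rank
brands_by_rank <- brands_sample[order(brands_sample$rank), ]
print(brands_by_rank)
```

The same `order()` call should work on the full `brands` data frame, assuming the endpoint still returns the columns shown above.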


5 Comments

av abhishiek Over a year ago

Is this the reason why, when I used getURL or download.file, I was not getting the data for the whole page? Also, how did you determine that the data is in JSON format?

hrbrmstr Over a year ago

Yep. RSelenium and/or phantomjs are other ways besides this. Click on "Preview" to see the JSON; the Content-Type MIME type also gives away the fact that it's JSON.
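
The Content-Type check mentioned in this comment can also be done in code. The helper below is a rough sketch, not something from the thread: `is_json_content_type` is a hypothetical name, and it only inspects a header string that you pass in.

```r
# Hypothetical helper: TRUE when a Content-Type header value denotes JSON
is_json_content_type <- function(content_type) {
  # Drop parameters such as "; charset=UTF-8" and normalise case
  media_type <- tolower(trimws(sub(";.*$", "", content_type)))
  # Accept plain JSON and structured types such as "application/ld+json"
  media_type == "application/json" || grepl("\\+json$", media_type)
}

json_check <- is_json_content_type("application/json; charset=UTF-8")
html_check <- is_json_content_type("text/html")
```

For a live response, the header value could be obtained with something like `httr::headers(httr::HEAD(url))[["content-type"]]`, assuming the httr package is installed and the endpoint still responds.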

av abhishiek Over a year ago

Is this the only way to access this data? Someone told me to download the whole HTML page and use htmlTreeParse, but I could not figure that out. Could you please shed some light on it? I mean, how could I have accessed the data if I had not known that it was in JSON and was being retrieved from a different link? I would really appreciate your help.

hrbrmstr Over a year ago

As I said, RSelenium (which drives and scrapes an interactive web browser) or a phantomjs script are your other options. You'll need to do some actual research into those technologies. I just recently answered an SO web scraping question demonstrating RSelenium.

av abhishiek Over a year ago

@hrbmstr could you please help with stackoverflow.com/questions/34686048/…

Polhek · Answer · answered 2016-01-06 19:13, edited by perror 2016-01-06 19:31:07Z

0

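Why don't you try downloading the page to a file first? Basically, something like:

htmldata <- "forbes.html"  # destination file path (hypothetical name)
download.file(forbe, destfile = htmldata, method = "auto", quiet = FALSE, cacheOK = TRUE)

The downloaded page should then be in the file that htmldata points to.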

edited Jan 6, 2016 at 19:31

perror

answered Jan 6, 2016 at 19:13

Polhek
