Scraping Javascript with rvest

Question

I have been trying to scrape the polling time series from a website where the chart is rendered with JavaScript. So far, I end up with an empty list once I select the "circle" nodes. Code below; any pointers much appreciated.

library(rvest)
library(V8)

url = 'https://www.politico.eu/europe-poll-of-polls/belgium/'

dta = read_html(url) %>% 
  html_node('svg') %>% 
  html_node('g') %>% 
  html_node('circle')

QHarr · Accepted Answer · 2020-04-03 05:17:20Z


It's actually very easy. The data comes from a JSON endpoint, which you can find in the network tab of your browser's developer tools:

library(jsonlite)

data <- jsonlite::read_json('https://www.politico.eu/wp-json/politico/v1/poll-of-polls/BE-parliament')
info <- data$polls
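Before converting anything, it can help to look at the shape of a single poll. The sketch below works on a small inline JSON string instead of the live endpoint. The party codes and numbers in `sample_json` are invented for illustration, and the layout (a `polls` array whose entries carry a `date` and a `parties` object) is an assumption inferred from the fields used in this answer.

```r
library(jsonlite)

# Hypothetical sample; assumed to mirror the endpoint's layout.
sample_json <- '{
  "polls": [
    {"date": "2020-03-10", "parties": {"AAA": 20.1, "BBB": 15.3}},
    {"date": "2020-03-17", "parties": {"AAA": 19.8, "BBB": 16.0}}
  ]
}'

# simplifyVector = FALSE keeps nested lists, matching read_json()'s default.
parsed <- fromJSON(sample_json, simplifyVector = FALSE)
info_sample <- parsed$polls

length(info_sample)                    # number of polls
names(info_sample[[1]])                # fields of one poll
str(info_sample[[1]], max.level = 2)   # nested structure of one poll
```

Running the same three inspection calls on the real `info` object should show which fields are actually available.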

You could convert it to a data frame. For example:

library(purrr)

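# Build one row per poll and bind the rows together (map_df needs dplyr installed).
df <- map_df(info, function(x) {
  # data.frame() spreads the named list x$parties into columns party.<name>
  data.frame(date = x$date,
             party = x$parties,
             stringsAsFactors = FALSE)
})

# Strip the literal "party." prefix; the dot is escaped so it is not a regex wildcard.
names(df) <- sub('^party\\.', '', names(df))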
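To see what the conversion does to the column names, here is a minimal base R sketch on a hand-built list. `info_sample` is a hypothetical stand-in for `info` with invented party codes and values, and `do.call(rbind, ...)` is swapped in for `map_df` so that no packages are needed.

```r
# Hypothetical stand-in for `info`: a list of polls, each with a date and
# a named list of party shares (values invented for illustration).
info_sample <- list(
  list(date = "2020-03-10", parties = list(AAA = 20.1, BBB = 15.3)),
  list(date = "2020-03-17", parties = list(AAA = 19.8, BBB = 16.0))
)

# One row per poll; data.frame() spreads the named list into party.* columns.
rows <- lapply(info_sample, function(x) {
  data.frame(date = x$date, party = x$parties, stringsAsFactors = FALSE)
})

# rbind() requires every poll to list the same parties.
df_sample <- do.call(rbind, rows)

names(df_sample)   # column names before renaming
names(df_sample) <- sub("^party\\.", "", names(df_sample))
names(df_sample)   # column names after renaming
```

Unlike `map_df`, plain `rbind` stops with an error when polls list different sets of parties, so this variant only suits data where every poll has the same columns.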

You can then transpose the result or apply any other transformations you want, e.g.:

df <- t(df)
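One thing to keep in mind is that `t()` on a data frame with a character `date` column returns a character matrix, so the numeric values become strings. If the goal is a time series per party, a long layout may be easier to work with. The following is a sketch in base R only; `df_sample`, its party codes, and the column name `share` are hypothetical.

```r
# Hypothetical wide data frame shaped like `df` (values invented).
df_sample <- data.frame(
  date = c("2020-03-10", "2020-03-17"),
  AAA  = c(20.1, 19.8),
  BBB  = c(15.3, 16.0),
  stringsAsFactors = FALSE
)

# Transposing mixes character and numeric columns into a character matrix.
wide_t <- t(df_sample)
is.character(wide_t)

# Long layout: one row per (date, party) pair, values stay numeric.
party_cols <- setdiff(names(df_sample), "date")
long <- data.frame(
  date  = rep(df_sample$date, times = length(party_cols)),
  party = rep(party_cols, each = nrow(df_sample)),
  share = unlist(df_sample[party_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
```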

edited Apr 3, 2020 at 5:17

answered Apr 3, 2020 at 4:43

QHarr
