I am scraping the following website: https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio

I am trying to get the table of currency exchange rates into an R data frame with the rvest package, but the table's data is stored in a JavaScript variable inside the HTML.

I located the relevant CSS selector, and now I have this:

library(rvest)    
banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>%
      read_html() %>%
      html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)')

My output is now the following JavaScript, as an XML nodeset:

<script>
$(document).ready(function(){
    var valor = '{"tablaDivisas":[{"nombreDivisas":"FRANCO SUIZO","compra":"18.60","venta":"19.45"}, {"nombreDivisas":"LIBRA ESTERLINA","compra":"24.20","venta":"25.15"}, {"nombreDivisas":"YEN JAPONES","compra":"0.1635","venta":"0.171"}, {"nombreDivisas":"CORONA SUECA","compra":"2.15","venta":"2.45"}, {"nombreDivisas":"DOLAR CANADA","compra":"14.50","venta":"15.35"}, {"nombreDivisas":"EURO","compra":"21.75","venta":"22.60"}], "tablaDolar":[{"nombreDolar":"VENTANILLA","compra":"17.73","venta":"19.15"}]}';
    if(valor != '{}'){
        var objJSON = eval("(" + valor + ")");
        var tabla="<tbody>";
        for ( var i = 0; i < objJSON["tablaDolar"].length; i++) {
            tabla+= "<tr>";
            tabla+= "<td>" + objJSON["tablaDolar"][i].nombreDolar + "</td>";
            tabla+= "<td>$" + objJSON["tablaDolar"][i].compra + "</td>";
            tabla+= "<td>$" + objJSON["tablaDolar"][i].venta + "</td>";
            tabla+= "</tr>";
        }
        tabla+= "</tbody>";
        $("#tablaDolar").append(tabla);
        var tabla2="";
        for ( var i = 0; i < objJSON["tablaDivisas"].length; i++) {
            tabla2+= "<tr>";
            tabla2+= "<td>" + objJSON["tablaDivisas"][i].nombreDivisas + "</td>";
            tabla2+= "<td>$" + objJSON["tablaDivisas"][i].compra + "</td>";
            tabla2+= "<td>$" + objJSON["tablaDivisas"][i].venta + "</td>";
            tabla2+= "</tr>";
        }
        tabla2+= "</tbody>";
        $("#tablaDivisas").append(tabla2);
    }
    bmnIndicadoresResponsivoInstance.cloneResponsive(0);
});
</script>

My question is: how do I strip out almost everything (all the JavaScript functions and operators) and keep only this data, so that I can eventually convert it to a JSON table like this:

{"tablaDivisas":[{"nombreDivisas":"FRANCO SUIZO","compra":"18.60","venta":"19.45"},
{"nombreDivisas":"LIBRA ESTERLINA","compra":"24.20","venta":"25.15"},
{"nombreDivisas":"YEN JAPONES","compra":"0.1635","venta":"0.171"}, 
{"nombreDivisas":"CORONA SUECA","compra":"2.15","venta":"2.45"}, 
{"nombreDivisas":"DOLAR CANADA","compra":"14.50","venta":"15.35"}, 
{"nombreDivisas":"EURO","compra":"21.75","venta":"22.60"}],
"tablaDolar":[{"nombreDolar":"VENTANILLA","compra":"17.73","venta":"19.15"}]}

In other words, I need to extract the "valor" variable from the JS script using R.

For some reason I've had trouble doing this entirely within R, without exporting the variable to an external .txt file and then taking a substring.
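One way to do it entirely in memory is a single regular expression with a capture group. A minimal base-R sketch, assuming `script_txt` is a hypothetical stand-in for `html_text(banorte)` (trimmed here to a few lines and two of the rows) and that the JSON literal never contains a single quote:

```r
# Hypothetical stand-in for html_text(banorte): a trimmed copy of the
# script above, keeping only its first few lines and two of its rows.
script_txt <- paste0(
  "$(document).ready(function(){\n",
  "\tvar valor = '{\"tablaDivisas\":[{\"nombreDivisas\":\"EURO\",",
  "\"compra\":\"21.75\",\"venta\":\"22.60\"}], ",
  "\"tablaDolar\":[{\"nombreDolar\":\"VENTANILLA\",",
  "\"compra\":\"17.73\",\"venta\":\"19.15\"}]}';\n",
  "\tif(valor != '{}'){\n"
)

# Capture the text between "var valor = '" and the closing "';".
# [^']* cannot cross a single quote, so the match stops at the end of the
# JavaScript string literal (assumes the JSON itself has no single quotes).
m <- regmatches(script_txt, regexec("var valor = '([^']*)';", script_txt))
valor <- m[[1]][2]   # element 1 is the whole match, element 2 the group

cat(valor, "\n")
```

If that assumption holds, `jsonlite::fromJSON(valor)` should then give the same two data frames shown in the answers.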

2 Answers

You could do this:

library(rvest)    
banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>%
    read_html() %>%
    html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)') %>%
    as_list()

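# Split the script text into lines (assumes CRLF line endings)
banorte_vec <- strsplit(banorte[[c(1,1)]],"\r\n")[[1]]
# Keep the line that declares valor, then strip the JavaScript around the JSON
valor <- grep("valor = ", banorte_vec, value = TRUE)
valor <- gsub("\tvar valor = ","",valor)
valor <- gsub("';$","",valor)
valor <- gsub("^'","",valor)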

library(jsonlite)
result <- fromJSON(valor)
result

$tablaDivisas
  nombreDivisas compra venta
1    FRANCO SUIZO  18.60 19.45
2 LIBRA ESTERLINA  24.20 25.15
3     YEN JAPONES 0.1635 0.171
4    CORONA SUECA   2.15  2.45
5    DOLAR CANADA  14.50 15.35
6            EURO  21.75 22.60

$tablaDolar
  nombreDolar compra venta
1  VENTANILLA  17.73 19.15
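The strsplit(..., "\r\n") and gsub("\tvar valor = ", ...) steps assume the page is served with CRLF line endings and that the declaration is indented with exactly one tab; if either changes, the cleanup no longer yields bare JSON and fromJSON() will likely fail. A slightly more defensive sketch of the same steps, where `raw` is a hypothetical stand-in for `banorte[[c(1,1)]]` with LF-only line endings:

```r
# Hypothetical stand-in for banorte[[c(1, 1)]]: the raw script text,
# here with LF-only line endings to show why splitting on "\r\n" is fragile.
raw <- paste0(
  "$(document).ready(function(){\n",
  "\tvar valor = '{\"tablaDolar\":[{\"nombreDolar\":\"VENTANILLA\",",
  "\"compra\":\"17.73\",\"venta\":\"19.15\"}]}';\n",
  "\tif(valor != '{}'){\n"
)

lines <- strsplit(raw, "\r?\n")[[1]]               # accept CRLF or LF
valor <- grep("var valor = ", lines, value = TRUE)  # the declaration line
valor <- trimws(valor)                              # drop tabs/spaces/stray \r
valor <- sub("^var valor = '", "", valor)           # strip the JS prefix...
valor <- sub("';$", "", valor)                      # ...and the closing quote
```

The result can be passed to `jsonlite::fromJSON()` exactly as in the answer.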

This is definitely a more heavyweight answer, but it generalizes to other, gnarlier "JavaScript problems".

library(rvest)
library(stringi)
library(V8)
library(tidyverse)

banorte <- "https://www.banorte.com/wps/portal/ixe/Home/indicadores/tipo-de-cambio/" %>%
      read_html() %>%
      html_nodes('#indicadores_financieros_wrapper > script:nth-child(2)')

We'll set up a JavaScript V8 context:

ctx <- v8()

Then:

  • get the <script> content
  • split it into lines
  • get it into a plain character vector
  • remove the cruft (keep only the top-level var declaration)
  • evaluate the JavaScript

which is not too bad:

html_text(banorte) %>% 
  stri_split_lines() %>% 
  flatten_chr() %>% 
  keep(stri_detect_regex, "^\tvar") %>% 
  ctx$eval()
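The keep(stri_detect_regex, "^\tvar") step is what makes this safe: the pattern only matches lines that start with a single tab followed by `var`, so the jQuery calls (`$(...)`, which a fresh V8 context has no definition for) and the nested, two-tab `var` lines are dropped before `ctx$eval()`. A base-R illustration of that filter on a few hypothetical lines modelled on the script, assuming tab indentation as the first answer's "\tvar valor = " suggests:

```r
# Hypothetical lines modelled on the script above (tab-indented).
script_lines <- c(
  "$(document).ready(function(){",
  "\tvar valor = '{\"tablaDolar\":[{\"nombreDolar\":\"VENTANILLA\",\"compra\":\"17.73\",\"venta\":\"19.15\"}]}';",
  "\tif(valor != '{}'){",
  "\t\tvar objJSON = eval(\"(\" + valor + \")\");",
  "\t\tvar tabla=\"<tbody>\";",
  "\t\t$(\"#tablaDolar\").append(tabla);",
  "\tbmnIndicadoresResponsivoInstance.cloneResponsive(0);",
  "});"
)

# Same test as keep(stri_detect_regex, "^\tvar"): a tab at the very start,
# immediately followed by "var". Two-tab lines fail because their second
# character is another tab, not "v".
kept <- grep("^\tvar", script_lines, value = TRUE)
print(kept)
```

Only the valor declaration survives, which is why evaluating the filtered lines in V8 does not trip over the DOM and jQuery code.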

Since valor holds a JSON string, we parse it in R rather than in V8:

jsonlite::fromJSON(ctx$get("valor"))
## $tablaDivisas
##     nombreDivisas compra venta
## 1    FRANCO SUIZO  18.60 19.45
## 2 LIBRA ESTERLINA  24.20 25.15
## 3     YEN JAPONES 0.1635 0.171
## 4    CORONA SUECA   2.15  2.45
## 5    DOLAR CANADA  14.50 15.35
## 6            EURO  21.75 22.60
## 
## $tablaDolar
##   nombreDolar compra venta
## 1  VENTANILLA  17.73 19.15
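In both answers, compra and venta come back as character columns, because the JSON quotes every number ("compra":"18.60") and fromJSON() keeps quoted values as strings. A minimal base-R sketch of converting them, with `divisas` as a hand-built stand-in for `result$tablaDivisas` (values copied from the output above):

```r
# Hand-built stand-in for result$tablaDivisas: fromJSON() keeps quoted
# JSON values as character, so the rate columns are character here too.
divisas <- data.frame(
  nombreDivisas = c("FRANCO SUIZO", "LIBRA ESTERLINA", "YEN JAPONES",
                    "CORONA SUECA", "DOLAR CANADA", "EURO"),
  compra = c("18.60", "24.20", "0.1635", "2.15", "14.50", "21.75"),
  venta  = c("19.45", "25.15", "0.171", "2.45", "15.35", "22.60"),
  stringsAsFactors = FALSE
)

# Convert the rate columns to numeric in place.
rate_cols <- c("compra", "venta")
divisas[rate_cols] <- lapply(divisas[rate_cols], as.numeric)
str(divisas)
```

The same two lines work on result$tablaDolar if you want the dollar rates as numbers too.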

If the JavaScript had done other useful processing, this approach would generalize better.

NOTE: Google Translate in my Chrome beta channel was not translating the site well, but I think you're awfully close to violating the spirit of item 6 on the "Términos Legales" page. Until I can translate it, I can't fully tell. When/if I can, and it seems like you are, I'll delete this.

1 Comment

I tried this, but I just end up with an empty global environment. JavaScript drives me nuts!
