
Related to the question asked here: R - Using SelectorGadget to grab a dataset

library(rvest)
library(jsonlite)
library(magrittr)
library(stringr)
library(purrr)
library(dplyr)

# Return the position of the first state record whose name equals `state`
# (NA if no record matches).
get_state_index <- function(states, state) {
  match(TRUE, map(states, ~ .x$name == state))
}

s <- read_html("https://www.opentable.com/state-of-industry") %>% html_text()
all_data <- jsonlite::parse_json(stringr::str_match(s, "__INITIAL_STATE__ = (.*?\\});w\\.")[, 2])
fullbook <- all_data$covidDataCenter$fullbook

hawaii_dataset <- tibble(
  date = fullbook$headers %>% unlist() %>% as.Date(),
  yoy = fullbook$states[get_state_index(fullbook$states, "Hawaii")][[1]]$yoy %>% unlist()
)

I am trying to grab the Hawaii dataset from the State tab. The code was working before but now it is throwing an error with this part of the code:

all_data <- jsonlite::parse_json(stringr::str_match(s, "__INITIAL_STATE__ = (.*?\\});w\\.")[, 2])

I am getting the error:

Error: lexical error: invalid char in json text.
                                        NA
                     (right here) ------^

Any suggestions? The website looks unchanged over the past year, so what kind of change is causing the code to break?

EDIT: The solution proposed by @QHarr:

all_data <- jsonlite::parse_json(stringr::str_match(s, "__INITIAL_STATE__ = ([\\s\\S]+\\});")[, 2])

This worked for a while, but the site appears to have changed its underlying HTML again.

1 Answer


Change the regex pattern as shown below so that it captures the intended string in the response text, i.e. the JavaScript object assigned to __INITIAL_STATE__, which is then parsed into all_data:

all_data <- jsonlite::parse_json(stringr::str_match(s, "__INITIAL_STATE__ = ([\\s\\S]+\\});")[, 2])


Note: in an R string literal each regex backslash must be doubled, e.g. \\s in the R code rather than \s in the plain regex.
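
The NA in the error message suggests that str_match() found no match and returned NA, which parse_json() then tried to read as JSON text. A minimal sketch of a guard that makes this failure easier to spot; the page text here is made up, and extract_state is a hypothetical helper, not part of any package:

```r
library(stringr)
library(jsonlite)

# Made-up page text, for illustration only.
s <- 'window.__INITIAL_STATE__={"a":1};'

# The question's pattern expects " = " and a trailing ";w.", so nothing matches here.
old_pattern <- "__INITIAL_STATE__ = (.*?\\});w\\."
old_match <- str_match(s, old_pattern)[, 2]
is.na(old_match)

# Hypothetical helper: stop with a readable message instead of handing NA to the parser.
extract_state <- function(text, pattern) {
  json <- str_match(text, pattern)[, 2]
  if (is.na(json)) {
    stop("Pattern did not match the page text; the regex needs updating.")
  }
  parse_json(json)
}

state <- extract_state(s, "__INITIAL_STATE__\\s*=\\s*(\\{.*?\\});")
```

With a check like this, a change in the page layout shows up as a "pattern did not match" message rather than a lexical error from the JSON parser.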


4 Comments

How do I know what regex pattern to change though? They seem to be changing it every now and then. Right now the regex pattern you have posted in the solutions isn't working anymore.
Is it from looking at this: window.__INITIAL_STATE__={"authModal":{"isAuthModalOpen":f...} })(window) </script><script>_otbootstrap();</script><script> window.addEventListener('load', function() { var preloadPaths = JSON.parse("[]") ..........; }) });
I'll have a look on weekend. Please remind me.
Did you get a chance to look at the problem yet? Thanks
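
Going by the snippet quoted in the comments, the assignment now seems to be written without spaces around "=" and to be followed by "})(window)" rather than ";w.". Below is a rough sketch of a pattern that tolerates optional whitespace and anchors on that closing text. It has only been tried on a made-up string: the JSON content, the wrapper function, and the exact trailing text are assumptions, so the real page may need a different anchor.

```r
library(stringr)
library(jsonlite)

# Made-up stand-in for the page source, shaped like the snippet in the comments.
s <- paste0(
  '<script>(function(w){window.__INITIAL_STATE__=',
  '{"covidDataCenter":{"fullbook":{"headers":["2021/01/01","2021/01/02"]}}}',
  ' })(window) </script>'
)

# Optional whitespace is allowed around "="; the lazy [\s\S]*? stops at the
# first "}" that is followed by the (assumed) closing "})(window)".
pattern <- "__INITIAL_STATE__\\s*=\\s*(\\{[\\s\\S]*?\\})\\s*;?\\s*\\}\\)\\(window\\)"

json <- str_match(s, pattern)[, 2]
stopifnot(!is.na(json))  # stop early if the page layout has changed again

all_data <- parse_json(json)
headers <- unlist(all_data$covidDataCenter$fullbook$headers)
```

To find the right anchor after the next change, viewing the page source and searching for __INITIAL_STATE__ shows what currently precedes and follows the object.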
