I want to extract id_product and id_parent from this web page. Yesterday the code below returned the results, but when I ran it again today I got an error message. I'm running it on rstudio.cloud.

library(httr)
library(rvest)
library(stringr)

url <- "https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack"

headers <- c('User-Agent' = 'Mozilla/5.0')
doc <- read_html(httr::GET(url, httr::add_headers(.headers = headers))) %>%
  html_text()
id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]

id_product
id_parent

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Send failure: Connection was reset

I've searched for a possible explanation, but to no avail.

1 Answer

The server requires an extra header: send an Accept header in addition to User-Agent.

library(httr)
library(rvest)     # provides read_html() and html_text()
library(stringr)
library(magrittr)

# Send an Accept header alongside User-Agent
headers <- c(
  'User-Agent' = 'Mozilla/5.0',
  'Accept' = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3'
)

url <- 'https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack'
doc <- read_html(httr::GET(url, httr::add_headers(.headers = headers))) %>%
  html_text()

# Capture the digits that follow each key in the page text
id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]

id_product
id_parent
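
The extraction step can be checked without touching the network. The sketch below runs the same pattern against a made-up string, so `sample_text` and the helper `extract_id` are hypothetical and the real page's script content may look different. Unlike the code above, which returns every match, the helper returns only the first match and yields NA when the key is absent, which makes a changed page layout easier to spot.

```r
library(stringr)

# Hypothetical page text; the real page's script content may differ.
sample_text <- "var product_id = 12345; var parent_id = 678;"

# Return the first number captured after `<key> = `, or NA if the key is absent.
extract_id <- function(text, key) {
  pattern <- paste0(key, "\\s+=\\s+(\\d+);")
  matches <- str_match_all(text, pattern)[[1]]
  if (nrow(matches) == 0) NA_character_ else matches[1, 2]
}

id_product <- extract_id(sample_text, "product_id")
id_parent  <- extract_id(sample_text, "parent_id")
missing_id <- extract_id("no ids here", "product_id")
```

If `extract_id` returns NA on the real page text, the request probably succeeded but the page no longer embeds the IDs in this form.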

2 Comments

I still can't get it to work. It just keeps loading forever. I'm running it on rstudio.cloud.
I'm unsure what difference that makes. I ran the above from RStudio just fine. Open this URL in a browser, https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack, and inspect the web traffic for it. Look at the headers used in the initial request. Re-create the entire set first and see what happens. In your case it might turn out that something else is going on, but for me it was that the server wanted that second header.
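
One way to act on the suggestion above is to pass a fuller header set and add a timeout, so a stalled request fails with a message instead of loading forever. This is only a sketch: the header values, the `Accept-Language` entry, the 10-second limit, and the helper name `fetch_page` are all assumptions, and the headers should be replaced with whatever the browser's network inspector actually shows.

```r
library(httr)

# Headers copied by hand from a browser session. These values are
# assumptions; replace them with what the browser actually sends.
browser_headers <- c(
  "User-Agent" = "Mozilla/5.0",
  "Accept" = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
  "Accept-Language" = "en-US,en;q=0.9"
)

# Fetch a URL, giving up after `timeout_sec` seconds instead of hanging.
# Returns the httr response, or NULL (with a message) if the request fails.
fetch_page <- function(url, headers, timeout_sec = 10) {
  tryCatch(
    GET(url, add_headers(.headers = headers), timeout(timeout_sec)),
    error = function(e) {
      message("Request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Example call (needs network access):
# resp <- fetch_page("https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack", browser_headers)
# if (!is.null(resp)) status_code(resp)
```

If the request still times out or is reset with the full header set, the cause may lie elsewhere, for example in how the server treats requests coming from the hosted environment.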
