I am trying to scrape some data from a shoe website, footlocker.com. With the following code, I am trying to extract the number of a given brand's shoes on sale (adidas in the code below) and the total number of that brand's shoes.

library(rvest)

# Download and parse the category page
webpage <- read_html("https://www.footlocker.com/category/brands/adidas.html?query=adidas%3Arelevance%3AproductType%3A200005")
webpage

# Use CSS selectors to scrape the sale count
sale_count_html <- html_nodes(webpage, 'li:nth-child(1) .miscellaneous .count')
sale_count <- html_text(sale_count_html)
sale_count <- as.numeric(sale_count)
head(sale_count)

# Scrape the total count
total_count_html <- html_nodes(webpage, 'strong + strong')
total_count <- html_text(total_count_html)
head(total_count)

It returns character(0) for sale_count, whereas the website shows a 3-digit number. For total_count, it returns a completely different number from the one shown on the website.
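Even once the right node is found, `as.numeric()` returns `NA` with a warning if the scraped text holds anything besides digits, such as parentheses or thousands separators. A minimal sketch that pulls the first number out of the text; the formats "(123)" and "1,234 results" are assumed examples, not confirmed from the site, and `parse_count` is a hypothetical helper name:

```r
# Hypothetical helper: convert scraped count text such as "(123)" or
# "1,234 results" to a number. Takes the first run of digits (commas
# allowed), drops the commas, and returns NA where no digits are found.
parse_count <- function(x) {
  m <- regexpr("[0-9][0-9,]*", x)
  out <- rep(NA_real_, length(x))
  hit <- m != -1
  # regmatches() returns only the matched substrings, in order
  out[hit] <- as.numeric(gsub(",", "", regmatches(x, m), fixed = TRUE))
  out
}

parse_count(c("(123)", "1,234 results", "none"))
# 123 1234 NA
```

It would replace the `as.numeric()` step, e.g. `sale_count <- parse_count(html_text(sale_count_html))`. It cannot help while `html_nodes()` returns no nodes at all.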

  • The page probably fetches its data with JavaScript once it has loaded in the browser. Simple web scraping doesn't run JavaScript; maybe you can use something like RSelenium to run that code for you. Commented Sep 18, 2018 at 16:07
  • What you're actually doing is violating the terms of service (footlocker.com/help/terms-of-use.html), encouraging others to do so, and potentially ending up in legal trouble. Commented Sep 18, 2018 at 17:57
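One way to test the first comment's explanation is to check whether the HTML that `read_html()` downloads contains the targeted elements at all: if a selector matches in the browser's inspector but not in rvest's copy, the content is most likely added by JavaScript. A minimal sketch; `has_nodes` is a hypothetical helper name:

```r
library(rvest)

# Hypothetical helper: does the HTML as delivered by the server (rvest
# never runs JavaScript) already contain elements matching `css`?
# Accepts either a parsed document or a literal HTML string.
has_nodes <- function(html, css) {
  doc <- if (inherits(html, "xml_document")) html else read_html(html)
  length(html_nodes(doc, css)) > 0
}

# Usage with the question's page (needs network, not run here):
# webpage <- read_html("https://www.footlocker.com/category/brands/adidas.html?query=adidas%3Arelevance%3AproductType%3A200005")
# has_nodes(webpage, "li:nth-child(1) .miscellaneous .count")
```

A `FALSE` for a selector that works in the browser points to client-side rendering rather than a wrong selector.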

1 Answer

I was able to extract the product names and prices with the following code:

library(RSelenium)
library(stringr)
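# Start a Selenium server in Docker (shell() is Windows-only; system() works on any OS)
system('docker run -d -p 4445:4444 selenium/standalone-firefox')
Sys.sleep(5)  # assumed delay to let the container start before connecting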
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
remDr$navigate('https://www.footlocker.com/category/brands/adidas.html?query=adidas%3Arelevance%3AproductType%3A200005')

# The four lines below dismiss the pop-up windows
webElem <- remDr$findElement("id", "bluecoreEmailCaptureSubmit")
webElem$submitElement()
webElem <- remDr$findElement("id", "touAgreeBtn")
webElem$clickElement()  # the method name is lower-case clickElement()

page_Content <- remDr$getPageSource()[[1]]

# Here, we extract the information related to the shoes with regular expressions
text <- str_extract(page_Content, "<span class=\"ProductName\"(.*)(\\$\\d{1,5}\\.\\d{0,2})")
text_Split <- strsplit(text, split = "<span class=\"ProductName\">")[[1]]
text_Split <- text_Split[-1]

product_Name <- str_extract_all(string = text_Split, pattern = "<span class=\"ProductName-primary\">[^<]*</span>")

pattern_Product_Price <- c("(<span class=\"ProductPrice\"><span>\\$\\d{1,5}\\.\\d{0,2})",
                           "(<span class=\"ProductPrice-final\" aria-hidden=\"true\">\\$\\d{1,5}\\.\\d{0,2})",
                           "(<span class=\"ProductPrice-original\" aria-hidden=\"true\">\\$\\d{1,5}\\.\\d{0,2})")

regex_Product_Price <- paste0(pattern_Product_Price, collapse = "|")

product_Price <- str_extract_all(string = text_Split, pattern = regex_Product_Price)
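Two steps in the code above can fail depending on timing: `findElement()` stops with an error if a pop-up does not appear on that visit, and `getPageSource()` may run before the JavaScript has rendered the products. A minimal sketch of two hypothetical helpers that guard against both; they are not part of RSelenium, and the 20-second timeout and the `class="ProductName"` marker are assumptions:

```r
# Hypothetical helper: click (or submit) an element if it exists. Returns
# TRUE on success and FALSE instead of stopping if the element is missing
# (e.g. the pop-up was not shown) or the action fails.
dismiss_if_present <- function(remDr, using, value, submit = FALSE) {
  tryCatch({
    elem <- remDr$findElement(using = using, value = value)
    if (submit) elem$submitElement() else elem$clickElement()
    TRUE
  }, error = function(e) FALSE)
}

# Hypothetical helper: poll condition() until it returns TRUE or `timeout`
# seconds pass. Returns TRUE on success, FALSE on timeout.
wait_until <- function(condition, timeout = 20, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Example use with the remDr session from the answer (not run here):
# dismiss_if_present(remDr, "id", "bluecoreEmailCaptureSubmit", submit = TRUE)
# dismiss_if_present(remDr, "id", "touAgreeBtn")
# wait_until(function() grepl('class="ProductName"',
#                             remDr$getPageSource()[[1]], fixed = TRUE))
# page_Content <- remDr$getPageSource()[[1]]
```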

From this information, you can count the number of pairs of shoes: product_Name and product_Price each hold one element per product.
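For the question's two numbers, a sale item can plausibly be recognised by the presence of a `ProductPrice-original` (pre-sale) price among its matched spans. That reading of the class names is an assumption based on the price patterns above, not something confirmed from the site, and `count_products` is a hypothetical helper. The example input is made up but has the same shape as `product_Price` (a list with one character vector per product):

```r
# Hypothetical helper: count all products and those whose matched price
# spans include an original (pre-sale) price.
count_products <- function(product_Price) {
  on_sale <- vapply(
    product_Price,
    function(spans) any(grepl("ProductPrice-original", spans, fixed = TRUE)),
    logical(1)
  )
  c(total = length(product_Price), on_sale = sum(on_sale))
}

# Made-up input shaped like product_Price, for illustration only
example_prices <- list(
  '<span class="ProductPrice"><span>$90.00',
  c('<span class="ProductPrice-final" aria-hidden="true">$59.99',
    '<span class="ProductPrice-original" aria-hidden="true">$90.00'),
  '<span class="ProductPrice"><span>$120.00'
)
count_products(example_prices)
# total = 3, on_sale = 1
```

On the scraped data this would be `count_products(product_Price)`. It only counts the products present in the page source, so items loaded later (for example on further pages) are not included.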
