I am having difficulty scraping seller prices from a wine price listing website. I only get the first result, but not the ones that follow.

My current loop returns only the first price on each page and then moves on to the next page in my urls list. This is what I have so far:

#clean.df$srchwrds contains 1,000+ search phrases which I've pre-defined.

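library(rvest)    # read_html(), html_nodes(), %>%
library(pbapply)  # pblapply(): lapply() with a progress bar

# Build one search URL per phrase
urls = paste0("http://www.wine-searcher.com/find/", clean.df$srchwrds, "/")

out = pblapply(urls, function(x) {
    print(x)
    page = read_html(x)
    # Select the price nodes on this page
    page %>% html_nodes('.offer_price')
})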

For example, you'll notice there are multiple results (sellers) on this URL: http://www.wine-searcher.com/find/chateau+petrus+chateau+petrus+2014

My script will scrape the first seller's price it comes across and ignore the rest. After returning the price of the first seller, it will move on to the next URL as defined in the urls list.

I want it to return all the prices per page before moving on.

Thanks ahead of time for your help!

1 Answer

There are multiple ways to achieve what you want, depending on the output you need.

The first example returns a list with one character vector of prices per page. It is similar to what you were doing (I am not familiar with pblapply, so I use sapply instead).

library(rvest)

## Loading required package: xml2

baseUrl <- 'http://www.wine-searcher.com/find/'

srchwrds <- c("chateau+petrus+chateau+petrus+2014", 
              "chateau+petrus+chateau+petrus+2015")


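# simplify = FALSE guarantees a list, even if every page has the same number of prices
result <- sapply(srchwrds, function(x) {
    paste0(baseUrl, x) %>% 
        read_html() %>% 
        html_nodes('.offer_price') %>% 
        html_attr('content') 
}, simplify = FALSE)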
result 

## $`chateau+petrus+chateau+petrus+2014`
##  [1] "1618.35" "1622.06" "1622.98" "1676.47" "1800.00" "1854.83" "2133.06"
##  [8] "3385.08" "4542.50" "9517.24" "9517.24"
## 
## $`chateau+petrus+chateau+petrus+2015`
##  [1] "2264.71"  "2499.40"  "2500.00"  "2550.40"  "2577.59"  "2735.89" 
##  [7] "2777.62"  "2782.25"  "2840.00"  "5096.17"  "10665.32" "21098.79"
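
Note that html_attr() returns the prices as character strings. If you want to compute with them, a conversion step is needed. The following is a minimal sketch in base R, assuming a result list shaped like the one above; the values are a hard-coded subset of that output so the snippet runs offline, and the names prices and cheapest are only illustrative.

```r
# A subset of the scraped output above, hard-coded for illustration
result <- list(
  "chateau+petrus+chateau+petrus+2014" = c("1618.35", "1622.06", "1622.98"),
  "chateau+petrus+chateau+petrus+2015" = c("2264.71", "2499.40", "2500.00")
)

# Convert every page's character vector to numeric
prices <- lapply(result, as.numeric)

# Lowest offer per search phrase (one number per page)
cheapest <- vapply(prices, min, numeric(1))
cheapest
```

vapply() is used instead of sapply() here so that the result is always a numeric vector with one entry per page.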

This second example uses purrr to produce a nicer data.frame with one row per offer. I added the seller name as a bonus.

library(purrr)

result <- map_df(srchwrds, ~{

    paste0(baseUrl, .x) %>% 
        read_html() %>% 
        html_nodes('[itemprop="offers"]') -> tmp
    price <- tmp %>% 
        html_nodes('.offer_price') %>% 
        html_attr('content') 
    seller <- tmp %>% 
        html_nodes('.seller-link-wrap') %>% 
        html_text() %>% 
        gsub('\n', '', ., fixed = TRUE)
    data.frame( seller = seller, price = price, stringsAsFactors = FALSE)
})

result

##                             seller    price
## 1            JJ Buckley Fine Wines  1618.35
## 2               K&L Wine Merchants  1622.06
## 3                Morrell & Company  1622.98
## 4  Weinemotionen - KK Handels GmbH  1676.47
## 5                 Vins Grands Crus  1800.00
## 6     Zachys Wine and Liquor, Inc.  1854.83
## 7                             Arvi  2133.06
## 8                Morrell & Company  3385.08
## 9                      Cellar & Co  4542.50
## 10                Vinum Fine Wines  9517.24
## 11                Vinum Fine Wines  9517.24
## 12                Bacchus-Vinothek  2264.71
## 13           JJ Buckley Fine Wines  2499.40
## 14                Vins Grands Crus  2500.00
## 15               Morrell & Company  2550.40
## 16                Vinum Fine Wines  2577.59
## 17        Fine Wines International  2735.89
## 18                  Sherry-Lehmann  2777.62
## 19              K&L Wine Merchants  2782.25
## 20                      FinestWine  2840.00
## 21           JJ Buckley Fine Wines  5096.17
## 22               Morrell & Company 10665.32
## 23           JJ Buckley Fine Wines 21098.79
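
One limitation of the combined data.frame is that it no longer shows which search phrase each row came from, and price is still a character column. Below is a rough sketch of one way to address both, written in base R on a hard-coded subset of the rows above so that it runs offline. The names per_page, tagged, combined and the srchwrd column are assumptions made for this illustration, not part of the code above.

```r
# One data.frame per search phrase, as the scraping function would return them
per_page <- list(
  "chateau+petrus+chateau+petrus+2014" = data.frame(
    seller = c("JJ Buckley Fine Wines", "K&L Wine Merchants"),
    price  = c("1618.35", "1622.06"),
    stringsAsFactors = FALSE
  ),
  "chateau+petrus+chateau+petrus+2015" = data.frame(
    seller = c("Bacchus-Vinothek", "JJ Buckley Fine Wines"),
    price  = c("2264.71", "2499.40"),
    stringsAsFactors = FALSE
  )
)

# Add the search phrase as a column to each page's data.frame
tagged <- Map(function(df, phrase) {
  df$srchwrd <- rep(phrase, nrow(df))  # rep() also copes with zero-row pages
  df
}, per_page, names(per_page))

# Stack the pages and convert the price column to numeric
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
combined$price <- as.numeric(combined$price)
combined
```

If you stay with purrr, map_df() also has an .id argument that adds such a column when the input is named, which may be a shorter route to the same result.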