-2

I am using the following code. It successfully targets the correct URL and node text. However, the returned data is incomplete: some of the fields (like Previous Close and Open) are blank or failed to download.

library(rvest)
library(httr)
library(xml2)

ticker <- "IVV"
url <- paste0("https://finance.yahoo.com/quote/",ticker, "/")
browser_ua <- "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.672.2 Safari/534.20"
head <- c("Accept" = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language" = "en-US,en;q=0.9")
    
html_page <- session(
    url,
    user_agent(browser_ua),
    add_headers(head))
    
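node_txt <- "yf-1b7pzha"   # old node was "yf-tx3nkj"

# Build the XPath with paste0(); written inside the quoted string,
# node_txt is not replaced by the value of the R variable
temp <- html_page %>% 
    read_html() %>%
    xml_find_all(paste0("//li[contains(@class, '", node_txt, "')]/span/text()"))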

The code uses this URL: https://finance.yahoo.com/quote/IVV/. What code change is required to grab all values in the first table under the graph?

5
  • 3
    Note that scraping data like this is likely a violation of Yahoo Finance's terms of service, see section 2.d.ix. Commented Jul 30 at 18:37
  • 2
    There is an R package, yahoofinancer. On ToS: You should refer to Yahoo!’s terms of use (here, here) for details on your rights to use the actual data downloaded. Remember - the Yahoo! finance API is intended for personal use only. Commented Jul 30 at 19:05
  • Does anyone right code? Commented Jul 30 at 19:06
  • 2
    It's "write". And yes, many of us do. I am getting an error 401: the function call stop_for_status(res) gives Unauthorized (HTTP 401). So @jpsmith and @TimG are probably right (pun not intended). Commented Jul 30 at 19:15
  • 1
    You can use the getQuote() function in package quantmod if you specify the second argument, what, based on what the helper function yahooQF() helps you select. (There are two named vectors here: one for the JSON request fields, one for the returned columns.) The corresponding Python package also allows you to select fields. Commented Jul 30 at 19:45

2 Answers

3

Expanding on my earlier comment, here is a (partial) answer relying on the quantmod package and its handling of the request. Yahoo! actually supports a range of fields, and the Python package yfinance has slightly better documentation.

Here we first select (interactively!) the fields we want:

> library(quantmod) # CRAN package used here
> qf <- yahooQF()   # launches a GUI-based selector
> qf                # this corresponds to the selection I made
[[1]]
 [1] "symbol"                      "shortName"                  
 [3] "ask"                         "bid"                        
 [5] "regularMarketPrice"          "regularMarketChange"        
 [7] "regularMarketOpen"           "regularMarketDayHigh"       
 [9] "regularMarketDayLow"         "regularMarketVolume"        
[11] "regularMarketChangePercent"  "regularMarketPreviousClose" 
[13] "fiftyTwoWeekLow"             "fiftyTwoWeekHigh"           
[15] "ytdReturn"                   "trailingPE"                 
[17] "trailingAnnualDividendYield" "netAssets"                  
[19] "netExpenseRatio"            

[[2]]
 [1] "Symbol"            "Name"              "Ask"              
 [4] "Bid"               "Last"              "Change"           
 [7] "Open"              "High"              "Low"              
[10] "Volume"            "% Change"          "P. Close"         
[13] "52-week Low"       "52-week High"      "YTD Return"       
[16] "P/E Ratio"         "Dividend Yield"    "Net Assets"       
[19] "Net Expense Ratio"

attr(,"class")
[1] "quoteFormat"
>

Next we use this selection to download data for IVV:

> getQuote("IVV", what=qf)
             Trade Time Symbol                     Name    Ask    Bid
IVV 2025-07-30 15:52:42    IVV iShares Core S&P 500 ETF 634.96 635.02
      Last   Change  Open    High    Low  Volume  % Change P. Close
IVV 636.16 -2.24005 639.1 640.735 634.59 2909287 -0.350885    638.4
    52-week Low 52-week High YTD Return P/E Ratio Dividend Yield
IVV         484       641.74    6.18625   27.3658     0.00890977
     Net Assets Net Expense Ratio
IVV 6.22809e+11              0.03
> 

This function is vectorized, so given a selection of fields as in qf here, you could also retrieve multiple quotes at once.

PS For completeness a re-usable display of the qf variable I used:

> dput(qf)
structure(list(c("symbol", "shortName", "ask", "bid", "regularMarketPrice", 
"regularMarketChange", "regularMarketOpen", "regularMarketDayHigh", 
"regularMarketDayLow", "regularMarketVolume", "regularMarketChangePercent", 
"regularMarketPreviousClose", "fiftyTwoWeekLow", "fiftyTwoWeekHigh", 
"ytdReturn", "trailingPE", "trailingAnnualDividendYield", "netAssets", 
"netExpenseRatio"), c("Symbol", "Name", "Ask", "Bid", "Last", 
"Change", "Open", "High", "Low", "Volume", "% Change", "P. Close", 
"52-week Low", "52-week High", "YTD Return", "P/E Ratio", "Dividend Yield", 
"Net Assets", "Net Expense Ratio")), class = "quoteFormat")
> 
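
As a rough sketch of the vectorized use mentioned above, the selection can be rebuilt without the GUI from a shortened version of the dput() output and passed to getQuote() together with several symbols. The helper name get_quotes() and the second ticker "SPY" are my own additions for illustration, and the call needs network access (and will hit the same GDPR consent issue where that applies).

```r
library(quantmod)

# Shortened field selection, built the same way as the dput() output above:
# first vector = Yahoo request fields, second vector = returned column names
qf <- structure(
  list(
    c("symbol", "shortName", "regularMarketOpen", "regularMarketPreviousClose"),
    c("Symbol", "Name", "Open", "P. Close")
  ),
  class = "quoteFormat"
)

# Hypothetical convenience wrapper (not part of quantmod)
get_quotes <- function(symbols, fields = qf) {
  getQuote(symbols, what = fields)
}

# Not run here because it needs network access:
# get_quotes(c("IVV", "SPY"))
```

Wrapping the call keeps the field selection in one place, so the same qf can be reused for any vector of tickers.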

2 Comments

This is nice! But in Germany I get an error: "Unable to obtain yahoo crumb. If this is being called from a GDPR country, Yahoo requires GDPR consent, which cannot be scripted".
Oh sorry to hear that, that is rather painful. Yahoo! has been hiding more and more content and access behind such shenanigans. The issue ticket discussions for packages like quantmod have some context in existing discussions.
0

Ada's quantmod answer is to be preferred, but you asked:

What code change is required to grab all values in the first table under the graph?

A lot. You want all <li> items in the <ul> below the div[@data-testid="quote-statistics"]. For a cleaner approach, you could target the 12 <fin-streamer> elements

<fin-streamer data-symbol="IVV" data-value="4,559,917" data-trend="none" active="" data-dfield="longFmt" data-field="regularMarketVolume" class="yf-1b7pzha">4,559,917</fin-streamer>

and manually address the 3 remaining fields and pull their attributes. In my code, I use selenider + chromote and then just pull the raw text out of the 15 <li> elements and apply some string splitting.

library(chromote)
library(selenider)
session <- selenider_session("chromote",options = chromote_options(headless = FALSE))
open_url("https://finance.yahoo.com/quote/IVV/")
try(s("button[name='reject']") |> elem_click(), silent = TRUE)
tab <- do.call(rbind,lapply(ss(xpath = '//div[@data-testid="quote-statistics"]//ul//li'), \(x) elem_text(x)))[,1]
res <- sub("52 Week", "FiftyTwo Week", tab)
res <- sub("5Y Monthly", "Five Y Monthly", res)
name <- sub("^(\\D+).*", "\\1", res) |> trimws()
value <- sapply(regmatches(res, regexpr("^(.+?)\\s+(?=[0-9])", res, perl = TRUE), invert = TRUE), \(x) trimws(paste(x[-1], collapse = "")))
si <- data.frame(name = name, value = value)
close_session()

                     name           value
1          Previous Close          638.40
2                    Open          639.10
3                     Bid    640.02 x 300
4                     Ask    640.65 x 400
5             Day's Range 634.59 - 640.73
6     FiftyTwo Week Range 484.00 - 641.74
7                  Volume       4,559,917
8             Avg. Volume       5,854,059
9              Net Assets         622.81B
10                    NAV          638.12
11         PE Ratio (TTM)           27.42
12                  Yield           1.29%
13 YTD Daily Total Return           9.12%
14  Beta (Five Y Monthly)            1.00
15    Expense Ratio (net)           0.03%

Disclaimer: I very much do not recommend this; your IP could be temporarily banned.
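
For the <fin-streamer> route mentioned above, here is a minimal sketch of pulling the attributes with xml2. It assumes you already hold the rendered markup (captured after the page's JavaScript has run); the variable rendered_html and the helper extract_streamers() are made up for illustration, and the sample markup is just the single element quoted earlier wrapped in a list item.

```r
library(xml2)

# Assumption: markup captured from a rendered page; here only the one
# <fin-streamer> element shown above, wrapped in <ul><li> for context
rendered_html <- paste0(
  '<ul><li><fin-streamer data-symbol="IVV" data-value="4,559,917" ',
  'data-trend="none" active="" data-dfield="longFmt" ',
  'data-field="regularMarketVolume" class="yf-1b7pzha">4,559,917',
  '</fin-streamer></li></ul>'
)

# Hypothetical helper: one row per <fin-streamer>, taken from its attributes
extract_streamers <- function(html) {
  nodes <- xml_find_all(read_html(html), "//fin-streamer")
  data.frame(
    field = xml_attr(nodes, "data-field"),
    value = xml_attr(nodes, "data-value"),
    stringsAsFactors = FALSE
  )
}

streamers <- extract_streamers(rendered_html)

# Strip thousands separators before converting to a number
streamers$numeric_value <- as.numeric(gsub(",", "", streamers$value))
```

Reading data-field and data-value avoids the string splitting needed for the raw <li> text, but it only covers the fields that are actually rendered as <fin-streamer> elements.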


A much better approach would be to use the yahoofinancer package (I used kable because it makes the result clearer):

library(yahoofinancer)
library(lubridate)  # provides today()
s <- Ticker$new('IVV')
# price history for today only, one row per day
yf <- s$get_history(start = today(), interval = '1d')
# quote-level fields exposed by the Ticker object
fields <- c('regular_market_price', 'fifty_two_week_high', 'fifty_two_week_low',
            'regular_market_volume', 'exchange_name', 'full_exchange_name',
            'previous_close', 'currency', 'exchange_timezone_name', 'symbol')
# append each field as an extra column
yf[fields] <- lapply(fields, \(x) s[[x]])

date                 volume   high     low     open   close   adj_close
2025-07-30 20:00:00  4559917  640.735  634.59  639.1  637.51  637.51

regular_market_price  fifty_two_week_high  fifty_two_week_low  regular_market_volume
637.51                641.74               484                 4559917

exchange_name  full_exchange_name  previous_close  currency  exchange_timezone_name  symbol
PCX            NYSEArca            638.4           USD       America/New_York        IVV

Or, using the code base from {yahoofinancer}, you can write your own function that retrieves the quote data directly from the API:

# using yahoofinancer's function get_meta as a template, we can write our own
# version that retrieves the full API response as a list
library(jsonlite)
library(httr)

# Source: https://github.com/rsquaredacademy/yahoofinancer/blob/dbad4b14f355ee925650f95d380d0eae52f821ab/R/ticker.R#L433
get_yahoo_symbol_info <- function(sym) {
  url <- paste0("https://query2.finance.yahoo.com/v8/finance/chart/", sym)
  jsonlite::fromJSON(httr::content(httr::GET(url), "text", encoding = "UTF-8"), 
                     simplifyVector = FALSE)$chart$result[[1]]
}

ivv_info <- get_yahoo_symbol_info("IVV")
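
To get individual values out of that list, something along these lines may help. It is only a sketch: I am assuming the response carries a meta element, and the field names symbol, regularMarketPrice and previousClose are assumptions about the API response, so check names(ivv_info$meta) on a live result first. pick_meta() and mock_info are hypothetical, and the mock values are taken from the output above.

```r
# Hypothetical helper: pull selected fields from the `meta` element,
# returning NA for any field that is missing from the response
pick_meta <- function(info, fields) {
  meta <- info$meta
  vals <- lapply(fields, function(f) {
    if (is.null(meta[[f]])) NA else meta[[f]]
  })
  setNames(vals, fields)
}

# Offline stand-in for ivv_info, so the helper can be tried without a request
mock_info <- list(
  meta = list(symbol = "IVV", currency = "USD", regularMarketPrice = 637.51)
)

picked <- pick_meta(mock_info, c("symbol", "regularMarketPrice", "previousClose"))
```

With a live response you would call pick_meta(ivv_info, ...) instead. Returning NA for absent fields keeps the result shape stable if Yahoo drops or renames a field.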
