
I want to extract data from a website that uses shadow DOM. I think I've managed to access the elements inside the shadow DOM using JavaScript, but I haven't figured out how to turn the value returned by the JavaScript into WebElements so that I can process the data.

library(RSelenium)

rD <- rsDriver(browser="firefox", port=4547L, verbose=F)
remDr <- rD[["client"]]

remDr$navigate("https://www.transfermarkt.us")

## run script to enable dropdown list in the website. This creates a <ul> tag in the shadow-dom which lists all items in the dropdown list.
remDr$executeScript("return document.querySelector('tm-quick-select-bar').setAttribute('dropdown-visible', 'countries')")
Sys.sleep(5)

This is only the portion of the page that contains the shadow DOM. I'm not sure whether it is required, but this is where the dropdown list lives.

wrapper <- remDr$findElement(using="tag name", value="tm-quick-select-bar")

Below is the script to access the dropdown list

script <- 'return document.querySelector("#main > header > div.quick-select-wrapper > tm-quick-select-bar").shadowRoot.querySelector("div > tm-quick-select:nth-child(2) > div > div.selector-dropdown > ul");'

test <- remDr$executeScript(script, list(wrapper))

This returns the following list.

> test                                                                                    
$`element-6066-11e4-a52e-4f735466cecf`                                                    
[1] "4adac8f8-2c94-4e48-b7a3-521eb961ef8c"  

I have no idea how to extract the items from this. It doesn't seem like it's a WebElement. What is this list and what information does it contain? How can I extract it?

I tried this

lapply(test, function(x){
    x$getElementText()
    x[[1]]$getElementText()
})

But it returns the error:

Error in x$getElementText : $ operator is invalid for atomic vectors      
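For context, `element-6066-11e4-a52e-4f735466cecf` is the key the W3C WebDriver protocol uses for an element reference, and the string under it is the element's id. `lapply()` therefore hands the function a plain character string, which is why `$getElementText` fails. A minimal sketch of wrapping the reference in a `webElement` follows. The helper names `element_id_from_ref` and `as_web_element` are hypothetical, and the `$new(...)$import(...)` pattern is assumed to mirror what `findElement()` does internally, so treat it as untested against this site.

```r
# Pull the element id out of the raw reference returned by executeScript().
element_id_from_ref <- function(ref) {
  as.character(ref[[1]])
}

# Wrap the raw reference in an RSelenium webElement bound to the same session.
# Assumption: this mirrors how findElement() builds its return value.
as_web_element <- function(ref, remDr) {
  el <- RSelenium::webElement$new(elementId = element_id_from_ref(ref))
  el$import(remDr$export("remoteDriver"))
}

# The reference printed above, reproduced as a plain list.
ref <- list(`element-6066-11e4-a52e-4f735466cecf` = "4adac8f8-2c94-4e48-b7a3-521eb961ef8c")
id <- element_id_from_ref(ref)

# With a live session (not run here):
# ul <- as_web_element(test, remDr)
# ul$getElementText()
```

Only the id extraction runs without a browser; the wrapping step needs the same `remDr` session that produced the reference.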
  • I'm not sure which dropdown you are trying to access. Is it the country selector, which defaults to US? Commented Oct 25, 2022 at 4:17

2 Answers


I'm not sure whether Selenium can deal with shadow DOM directly; there is a plugin that supposedly solves this for Java. Nevertheless, you can extract the innerHTML and parse it with rvest:

library(RSelenium)

rD <- rsDriver(browser="chrome", port=4547L, verbose=F, chromever="106.0.5249.21")
remDr <- rD[["client"]]

remDr$navigate("https://www.transfermarkt.us")

## run script to enable dropdown list in the website. This creates a <ul> tag in the shadow-dom which lists all items in the dropdown list.
remDr$executeScript("return document.querySelector('tm-quick-select-bar').setAttribute('dropdown-visible', 'countries')")
Sys.sleep(5)


wrapper <- remDr$findElement(using="tag name", value="tm-quick-select-bar")

script <- paste0(
  'return document.querySelector("#main > header > div.quick-select-wrapper > tm-quick-select-bar")',
  '.shadowRoot.querySelector("div > tm-quick-select:nth-child(2) > div > div.selector-dropdown > ul")',
  '.innerHTML;')

test <- remDr$executeScript(script)

html <- rvest::read_html(test[[1]])

rvest::html_text(html)

# " Afghanistan Albania Algeria American Samoa American .....
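`html_text()` on the whole document returns every entry as one concatenated string. If one value per country is needed, selecting the list items first may help. This is a sketch that assumes the dropdown's innerHTML consists of `<li>` elements; the sample string below is hypothetical and stands in for `test[[1]]`, and the real markup may differ.

```r
library(rvest)

# Hypothetical sample standing in for test[[1]]; the real markup may differ.
inner_html <- "<li>Afghanistan</li><li>Albania</li><li>Algeria</li>"

html <- read_html(inner_html)

# Select each <li> and take its text; html_text2() also trims whitespace.
countries <- html_text2(html_elements(html, "li"))

print(countries)
```

The result is a character vector with one element per list item, which is easier to filter or join than a single string.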

1 Comment

Yes!!! that's what I'm looking for. So, I was just missing .innerHTML

I don't know R, but for example:

let shadowEls = [...document.querySelectorAll('*')].filter(el => el.shadowRoot)
return shadowEls[0].shadowRoot.innerHTML

That should be enough to figure this bit out.
