I encountered this seemingly safe function for extracting text content from an HTML string:
function getText(html) {
  const div = document.createElement('div')
  div.innerHTML = html
  return div.textContent
}
It uses innerHTML, but the div is never appended to the DOM, so I assumed it was harmless.
And indeed, it works fine in the normal case:
const text = getText('<b>some text</b>')
console.log(text) // prints "some text"
But it can also lead to XSS:
// opens an alert
const text = getText('<b>some text</b><img src="" onerror="alert(1)">')
// prints "some text"
console.log(text)
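For comparison, here is a minimal sketch of a variant that parses the string with DOMParser instead of assigning innerHTML on an element created by the live document. The name getTextInert and the optional parser parameter are my own (the parameter is only there so the function can be exercised outside a browser). As far as I understand, a document produced by DOMParser has scripting disabled and does not load images, so the onerror handler should not fire:

```javascript
// Hypothetical variant, not part of the original code.
// Parses the markup into a separate document that is not attached to the page.
function getTextInert(html, parser = new DOMParser()) {
  // 'text/html' makes the parser build a full HTML document around the input
  const doc = parser.parseFromString(html, 'text/html')
  return doc.body.textContent
}
```

In a browser this would be called as getTextInert('<b>some text</b><img src="" onerror="alert(1)">'), which I would expect to return "some text" without opening an alert.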
Even weirder things happen when the HTML is prefixed with "<script></script>":
// throws an error and injects html into the page
const text = getText('<script></script><b>some text</b>')
console.log(text)
Why does it load the HTML if the div is never appended to the DOM?
Why does prepending <script></script> cause HTML to be injected into the page?