I am trying to parse HTML input with jsoup (v1.18.1), go through every element, and rewrite each attribute value so that:

- `>` is replaced with `&gt;`
- `<` is replaced with `&lt;`

The method I'm feeding this HTML into cannot accept these symbols inside attribute values.
Here is the code I'm using:

```java
Elements elements = htmlDocument.getAllElements();

// Process each element's attributes
for (Element element : elements) {
    // Iterate over all attributes of the element
    for (Attribute attribute : element.attributes()) {
        String originalValue = attribute.getValue();
        // Escape only '>' and '<' characters
        String escapedValue = escapeSpecificHtmlChars(originalValue);
        // Update the attribute with the escaped value
        element.attr(attribute.getKey(), escapedValue);
    }
}

private String escapeSpecificHtmlChars(final String input) {
    if (StringUtils.isBlank(input)) {
        return input;
    }
    // Replace only '>' with '&gt;' and '<' with '&lt;'
    return input.replace(">", "&gt;")
                .replace("<", "&lt;");
}
```
Say the element is:

```html
<span role="text" aria-label=">test value>">Test value!</span>
```
The `aria-label` attribute has the value `>test value>`, so `escapedValue` is `&gt;test value&gt;`. But after I set it with `element.attr(attribute.getKey(), escapedValue);`, the attribute comes out as `&amp;gt;test value&amp;gt;`.
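Judging from that output, jsoup appears to store attribute values in decoded form and escape them again when the document is serialized: the `&` in `&gt;` becomes `&amp;`, while a literal `>` inside an attribute value is written unchanged in the default HTML output. The stdlib-only sketch below is a simplified model of that behavior, an assumption for illustration and not jsoup's actual serializer; `DoubleEscapeDemo` and `serializeAttributeValue` are hypothetical names.

```java
public class DoubleEscapeDemo {
    // Simplified model (assumption, not jsoup's real code) of how an attribute
    // value is escaped on output: '&' and '"' are escaped, '<' and '>' are not.
    static String serializeAttributeValue(String stored) {
        StringBuilder out = new StringBuilder(stored.length());
        for (char c : stored.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = ">test value>";
        // Pre-escaping before storing the value, as the question's helper does
        String preEscaped = raw.replace(">", "&gt;");
        System.out.println(serializeAttributeValue(raw));        // >test value>
        System.out.println(serializeAttributeValue(preEscaped)); // &amp;gt;test value&amp;gt;
    }
}
```

In this model the pre-escaped `&gt;` is just text containing an `&`, so it gets escaped a second time.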
I want `escapedValue` to be kept as-is (not escaped a second time) when I set it as the attribute value.
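If the value is escaped on output anyway, one possible direction is to leave the attribute values decoded and do all the escaping yourself, exactly once, while writing the markup. A minimal sketch for a single start tag follows, assuming you produce this part of the output yourself instead of relying on jsoup's serializer; `StartTagWriter`, `escapeAttributeValue` and `writeStartTag` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StartTagWriter {
    // Escapes a decoded attribute value in a single pass, so each character
    // is handled once and an '&' produced here is never escaped again.
    static String escapeAttributeValue(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Writes <tag key="value" ...> from unescaped attribute values.
    static String writeStartTag(String tagName, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder("<").append(tagName);
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append(' ').append(e.getKey()).append("=\"")
              .append(escapeAttributeValue(e.getValue())).append('"');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>(); // keeps attribute order
        attrs.put("role", "text");
        attrs.put("aria-label", ">test value>");
        System.out.println(writeStartTag("span", attrs));
        // <span role="text" aria-label="&gt;test value&gt;">
    }
}
```

With jsoup, the inputs would be `element.tagName()` and the key/value pairs from `element.attributes()` (via `getKey()`/`getValue()`, which return the decoded value). Text nodes, child elements and end tags would still need to be written separately, for example while walking the tree with a `NodeVisitor`, so this is a sketch of the idea rather than a drop-in fix.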
Any help would be appreciated!