I am parsing an HTML string with Jsoup to extract just the text, and I need the text exactly as it appears in the source. However, when the string contains escaped characters, Jsoup unescapes them. For example, if I parse

<p>Let&#39;s try</p>

Jsoup returns

<p>Let's try</p>

I searched extensively for a solution and tried doc.outputSettings with different charset and escapeMode options, but couldn't get Jsoup to leave the escaped HTML characters as they are.

  • Why does it matter? The HTML has exactly the same meaning either way. Commented Jun 27, 2021 at 12:17
  • After extracting the text I run some manipulations on it, and then I want to find and replace it in the original string. Because of the unescaping, I can't find the extracted text there. Commented Jun 27, 2021 at 12:55

1 Answer

Judging from this comment and the current EscapeMode documentation, this is not possible with Jsoup.

"I'm never going to implement EscapeMode.none as it only leads to broken parse trees."
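Since Jsoup will not hand back the escaped form, one possible workaround for the find-and-replace use case from the comments is to re-escape the extracted text before searching the original string. The following is only a minimal sketch using plain Java, not a Jsoup feature. The class name ReEscapeSearch, its methods, and the escape table are hypothetical, and the sketch assumes the original HTML consistently escapes these five characters in exactly these forms (for example, always &#39; and never &apos; or &#x27;). If the source mixes escape styles, the search will still miss.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReEscapeSearch {
    // Assumed escape table: how the ORIGINAL HTML is believed to encode these characters.
    // Adjust it to match the real input; this is an assumption, not something Jsoup reports.
    private static final Map<Character, String> ESCAPES = new LinkedHashMap<>();
    static {
        ESCAPES.put('&', "&amp;");
        ESCAPES.put('<', "&lt;");
        ESCAPES.put('>', "&gt;");
        ESCAPES.put('"', "&quot;");
        ESCAPES.put('\'', "&#39;");
    }

    // Convert unescaped text back into the escaped form assumed above.
    public static String reEscape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            String esc = ESCAPES.get(c);
            sb.append(esc != null ? esc : String.valueOf(c));
        }
        return sb.toString();
    }

    // Find the extracted (unescaped) text in the original HTML and replace it,
    // escaping both the search term and the replacement the same way.
    public static String replaceInOriginal(String originalHtml, String extractedText, String replacement) {
        return originalHtml.replace(reEscape(extractedText), reEscape(replacement));
    }

    public static void main(String[] args) {
        String original = "<p>Let&#39;s try</p>";
        String extracted = "Let's try"; // the unescaped text, as in the question

        System.out.println(original.contains(extracted));                        // false
        System.out.println(replaceInOriginal(original, extracted, "Let's go")); // <p>Let&#39;s go</p>
    }
}
```

The first line of output shows why the direct search fails, and the second shows the replacement landing in the original string with its escaping preserved.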
