How can I remove all inline styles and other attributes (class, onclick) from HTML elements using jsoup?

Sample Input :

<div style="padding-top:25px;" onclick="javascript:alert('hi');">
This is a sample div <span class='sampleclass'> This is a sample span </span>
</div>

Sample Output :

<div>This is a sample div <span> This is a sample span </span> </div>

My code (is this the right way, or is there a better approach?):

Document doc = Jsoup.parse(html);
Elements el = doc.getAllElements();
for (Element e : el) {
    Attributes at = e.attributes();
    for (Attribute a : at) {    
        e.removeAttr(a.getKey());    
    }
}
  • @T.J.Crowder thanks for the reply. See my updated question. Is this the right way, or is there a better approach? Commented Nov 5, 2013 at 9:36
  • @vjy Is that updated code working for you? Or still not working? Commented Nov 5, 2013 at 9:37
  • @ashatte I found working code and added it to the question. I want to know whether what I am doing is right, or whether there is a better API than iterating through all elements to clear their attributes. Commented Nov 5, 2013 at 9:52

1 Answer

Yes, one approach is indeed to iterate through the elements and call removeAttr() for each attribute.
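
As a minimal sketch of that iteration approach: the version below collects the attribute keys of each element into a separate list before removing them, so that removal never happens while the Attributes object is being iterated (whether that matters may depend on the jsoup version in use). The class name StripAttributes and the helper stripAllAttributes are hypothetical names chosen for illustration, and parsing the input as a body fragment is an assumption about how the HTML arrives.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class StripAttributes {

    // Hypothetical helper: returns the body HTML with every attribute removed.
    static String stripAllAttributes(String html) {
        // Assumption: the input is a fragment that belongs inside <body>.
        Document doc = Jsoup.parseBodyFragment(html);

        for (Element e : doc.getAllElements()) {
            // Copy the keys first so removal does not interfere with iteration.
            List<String> keys = new ArrayList<>();
            for (Attribute a : e.attributes()) {
                keys.add(a.getKey());
            }
            for (String key : keys) {
                e.removeAttr(key);
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div style=\"padding-top:25px;\" onclick=\"javascript:alert('hi');\">"
                + "This is a sample div <span class='sampleclass'> This is a sample span </span></div>";
        System.out.println(stripAllAttributes(html));
    }
}
```

Unlike a tag-filtering cleaner, this keeps every element in the document and only drops the attributes.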

An alternative in jsoup is the Whitelist class (see docs). Passed to the Jsoup.clean() method, it removes every tag and attribute that is not explicitly allowed.

For example:

String html = "<html><head></head><body><div style='padding-top:25px;' onclick=\"javascript:alert('hi');\">This is a sample div <span class='sampleclass'>This is a simple span</span></div></body></html>";

Whitelist wl = Whitelist.simpleText();
wl.addTags("div", "span"); // add additional tags here as necessary
String clean = Jsoup.clean(html, wl);
System.out.println(clean);

This results in the following output:

11-05 19:56:39.302: I/System.out(414): <div>
11-05 19:56:39.302: I/System.out(414):  This is a sample div 
11-05 19:56:39.302: I/System.out(414):  <span>This is a simple span</span>
11-05 19:56:39.302: I/System.out(414): </div>
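
On newer jsoup releases the same idea can be sketched with Safelist, the class that replaced Whitelist from jsoup 1.14.2 onward. The class name CleanWithSafelist and the helper keepTagsOnly are hypothetical, and starting from Safelist.none() rather than simpleText() is a choice made here so that only div and span survive; simpleText() would additionally allow a few inline formatting tags.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class CleanWithSafelist {

    // Hypothetical helper: keeps only <div> and <span>, with no attributes.
    static String keepTagsOnly(String html) {
        // Safelist.none() allows text nodes only; addTags() then permits
        // the listed tags. No attributes are allowed unless added explicitly.
        Safelist safelist = Safelist.none().addTags("div", "span");
        return Jsoup.clean(html, safelist);
    }

    public static void main(String[] args) {
        String html = "<div style='padding-top:25px;' onclick=\"javascript:alert('hi');\">"
                + "This is a sample div <span class='sampleclass'>This is a simple span</span></div>";
        System.out.println(keepTagsOnly(html));
    }
}
```

Note that this approach also drops any tag that is not on the list, so every element that should be kept has to be added to the Safelist.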