26

I'm trying to create a code snippet to remove all style attributes regardless of tag using HtmlAgilityPack.

Here's my code:

var elements = htmlDoc.DocumentNode.SelectNodes("//*");

if (elements!=null)
{
    foreach (var element in elements)
    {
        element.Attributes.Remove("style");
    }
}

However, the change doesn't seem to stick. If I inspect the element object immediately after Remove("style"), I can see that the style attribute has been removed, but it still appears in the DocumentNode object. :/

I'm feeling a bit stupid, but this seems off to me. Has anyone done this using HtmlAgilityPack? Thanks!

Update

I changed my code to the following, and it works properly:

public static void RemoveStyleAttributes(this HtmlDocument html)
{
   var elementsWithStyleAttribute = html.DocumentNode.SelectNodes("//@style");

   if (elementsWithStyleAttribute!=null)
   {
      foreach (var element in elementsWithStyleAttribute)
      {
         element.Attributes["style"].Remove();
      }
   }
}
2
  • Can you add reproduction code? I tested this HTML, <html style='style1'><body style='style2'></body></html>, and it works. Commented May 2, 2011 at 6:47
  • Are you using the InnerHtml property? At the time of writing it has a bug; use the WriteContentTo method instead. Commented Jul 16, 2011 at 9:10

2 Answers

10

Your code snippet seems to be correct: it removes the attributes. The catch is that DocumentNode.InnerHtml (I assume this is the property you were monitoring) is a complex property. It may only get refreshed under certain circumstances, so you shouldn't rely on it to get the document as a string. Use the HtmlDocument.Save method instead:

string result = null;
using (StringWriter writer = new StringWriter())
{
    htmlDoc.Save(writer);
    result = writer.ToString();
}

Now the result variable holds the string representation of your document.

One more thing: your code can be improved by changing the XPath expression to "//*[@style]", which selects only the elements that have a style attribute.
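To put the two suggestions together, here is a minimal sketch that uses the narrower "//*[@style]" expression and reads the result back through Save rather than InnerHtml. The StyleStripper class and the string-in/string-out signature are just names I made up for illustration; adapt them to however you load your document.

```csharp
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper for illustration; the class and method names are not from the question.
public static class StyleStripper
{
    public static string RemoveStyleAttributes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select only the elements that carry a style attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var styled = doc.DocumentNode.SelectNodes("//*[@style]");
        if (styled != null)
        {
            foreach (var element in styled)
            {
                element.Attributes.Remove("style");
            }
        }

        // Serialize through Save instead of reading InnerHtml.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

You would call it as StyleStripper.RemoveStyleAttributes(someHtmlString) and work with the returned string.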

2 Comments

Thanks for replying! Yeah, I had changed my code to the following to make it "stick": 'public static void RemoveStyleAttributes(this HtmlDocument html) { var elementsWithStyleAttribute = html.DocumentNode.SelectNodes("//@style"); if (elementsWithStyleAttribute!=null) { foreach (var element in elementsWithStyleAttribute) { element.Attributes["style"].Remove(); } } }' Not sure why my original code didn't work, but I think you're right in your guess. Thanks!
Wow, code formatting in comments isn't great. :) Updated my question with the modified code snippet. Thanks again!
9

Here is a very simple solution:

VB.NET

element.Attributes.Remove(element.Attributes("style"))

C#

element.Attributes.Remove(element.Attributes["style"])

2 Comments

Thanks, one correction: element.Attributes("style") should be element.Attributes["style"]
You are right, I didn't make that clear: my code is for VB.NET.
