I am trying to remove a particular property from an HTML string.

Here is my sample HTML string.

<span lang=EN-GB style='font-size:10.0pt;line-height:115%;font-family:"Tahoma","sans-serif";color:#17365D'>Thank you</span>

Is there any way to remove the line-height:115%; property from the string using Regex in C# (.NET), so that I get the output below?

<span lang=EN-GB style='font-size:10.0pt;font-family:"Tahoma","sans-serif";color:#17365D'>Thank you</span>

I have tried this Regex, but it removes the whole style attribute. What I am trying to achieve is to remove only the line-height property.

Regex.Replace(html, @"<([^>]*)(?:style)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>", "<$1$2>", RegexOptions.IgnoreCase);

I just need to match the line-height property inside the style attribute, regardless of its value, and remove everything up to and including the terminating semicolon (;). Any help would be greatly appreciated. Thanks.

  • Just checking that opening the HTML in Notepad with find/replace isn't an option? Commented May 6, 2014 at 8:07
  • Please show what you have tried. Commented May 6, 2014 at 8:07
  • I would recommend using a DOM parser instead of regular expressions. Regex is not recommended when dealing with HTML/XML. Commented May 6, 2014 at 8:10
  • If you want to post code, edit your question instead of posting it in comments - it will be way more readable. Commented May 6, 2014 at 8:20
  • Parsing HTML with regex summons tainted souls into the realm of the living. stackoverflow.com/questions/1732348/… Commented May 6, 2014 at 8:34

2 Answers

You could try using HtmlAgilityPack for this instead of Regex.

The example below is a little rough, but it should give you the idea.

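using System;
using System.Linq;

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml("<span lang=EN-GB style='font-size:10.0pt;line-height:115%;font-family:\"Tahoma\",\"sans-serif\";color:#17365D'>Thank you</span>");

foreach (var item in doc.DocumentNode.Descendants("span"))
{
    var styleAttr = item.Attributes["style"];
    if (styleAttr == null) continue; // skip spans that have no style attribute

    // Split into declarations and keep everything except line-height, whatever its value.
    var kept = styleAttr.Value.Split(';')
        .Where(m => m.Trim().Length > 0)
        .Where(m => !m.Trim().StartsWith("line-height", StringComparison.OrdinalIgnoreCase));

    // Write the filtered declarations back to the attribute.
    styleAttr.Value = string.Join(";", kept);
}

// Serialize the modified document back to a string.
string html = doc.DocumentNode.OuterHtml;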
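
The split-and-filter idea above can also be pulled into a small standalone helper that works on the style value alone. This is only a sketch: the names InlineStyle and RemoveProperty are made up for illustration, and splitting on ';' is a simplification that would break if a declaration value itself contained a semicolon.

```csharp
using System;
using System.Linq;

public static class InlineStyle
{
    // Hypothetical helper: returns the style string without the named property.
    // Assumes declarations are separated by ';' and never contain ';' in a value.
    public static string RemoveProperty(string style, string property)
    {
        var kept = style.Split(';')
            .Select(d => d.Trim())
            .Where(d => d.Length > 0)
            // Compare only the part before the first ':' (the property name).
            .Where(d => !d.Split(':')[0].Trim()
                .Equals(property, StringComparison.OrdinalIgnoreCase));

        return string.Join(";", kept);
    }
}
```

With HtmlAgilityPack, the returned string would then be assigned back to the style attribute's Value.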

Thanks everyone for your kind suggestions. I have figured out a Regex for this situation. Here it is, if anyone is interested. Thank you.

html = Regex.Replace(html, @"line-height:[^;]+;", "", RegexOptions.IgnoreCase);
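
The pattern above needs a trailing semicolon and no space before the colon. A slightly more tolerant variant might look like the sketch below. The class and method names (StyleCleaner, RemoveLineHeight) are invented for illustration. It assumes the value never contains a quote character, and, like the original, it would also match "line-height:" text that appears outside a style attribute.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StyleCleaner
{
    // (?<![\w-])  : do not match inside a longer property name
    // \s*:        : allow whitespace before the colon
    // [^;'"]*     : the value, stopping at ';' or at the attribute's closing quote
    // ;?          : the terminating semicolon, if there is one
    private static readonly Regex LineHeight = new Regex(
        @"(?<![\w-])line-height\s*:[^;'""]*;?",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: strips every line-height declaration from the input.
    public static string RemoveLineHeight(string html)
    {
        return LineHeight.Replace(html, "");
    }
}
```

Usage would be html = StyleCleaner.RemoveLineHeight(html);.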
