I want to strip HTML from a string with a regular expression. This regex works everywhere else I have tried it, but it does not work in .NET, and I don't understand why.

using System;
                        
public class Program
{
    public static void Main()
    {
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";
        var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "");
        Console.WriteLine(res);
    }
}

2 Answers

5

You're missing the correct Regex option:

var res = System.Text.RegularExpressions.Regex.Replace(text, "<.*?>", "", System.Text.RegularExpressions.RegexOptions.Singleline);

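You need this because the style attribute in your HTML contains a newline (\n). By default, . matches every character except \n, so <.*?> can never reach the closing > on the next line and nothing is replaced. Singleline makes . match newline characters as well.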

Docs blurb:

Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n). For more information, see the "Single-line Mode" section in the Regular Expression Options article.
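
To make the difference concrete, here is a minimal sketch that runs the question's pattern with and without the option. The class name SinglelineDemo and the helper Strip are made up for this illustration; only Regex.Replace, RegexOptions and the inline (?s) flag come from .NET itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical helper: applies the question's pattern with the given options.
    public static string Strip(string text, RegexOptions options) =>
        Regex.Replace(text, "<.*?>", "", options);

    public static void Main()
    {
        var text = "FOO <span style=\"mso-bidi-font-size:11.0pt;\nmso-fareast-language:EN-US\"> BAR";

        // Default options: '.' stops at '\n', so the tag that spans two lines never matches.
        Console.WriteLine(Strip(text, RegexOptions.None) == text);   // True (input unchanged)

        // Singleline: '.' also matches '\n', so the whole tag is removed.
        Console.WriteLine(Strip(text, RegexOptions.Singleline));     // FOO  BAR

        // The inline (?s) flag turns on the same mode from inside the pattern.
        Console.WriteLine(Regex.Replace(text, "(?s)<.*?>", ""));     // FOO  BAR
    }
}
```

The output keeps two spaces between FOO and BAR because only the tag itself is removed, not the whitespace around it.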



0

Try this:

var res = System.Text.RegularExpressions.Regex.Replace(text, "<[^>]*>", "");

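This strips the HTML tags from your string. The negated character class [^>] matches any character other than >, including newlines, so no extra regex option is needed.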
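
A short sketch of this approach follows, using a shortened version of the question's input. NegatedClassDemo and StripTags are names invented for the example. It also shows one case where the pattern cuts a tag short, namely a > inside an attribute value, so treat it as a quick cleanup and not as a full HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Hypothetical helper: removes everything from '<' up to the next '>'.
    public static string StripTags(string text) =>
        Regex.Replace(text, "<[^>]*>", "");

    public static void Main()
    {
        // [^>] matches '\n' too, so a tag spanning two lines is removed without any options.
        var multiLine = "FOO <span style=\"a:1;\nb:2\"> BAR";
        Console.WriteLine(StripTags(multiLine));   // FOO  BAR

        // Limitation: a '>' inside an attribute value ends the match early.
        var tricky = "<a title=\"a>b\">link</a>";
        Console.WriteLine(StripTags(tricky));      // b">link
    }
}
```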
