
Is there any easy way to remove all HTML tags or ANYTHING HTML related from a string?

For example:

string title = "<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=\"#228b22\">[Proj # 206010]</font></b>&nbsp;&nbsp;&nbsp; (Reality Series, &nbsp;)";

The above should really be:

"Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] (Reality Series)"

  • This question was closed as a duplicate, but the suggested answer uses the Html Agility Pack. If you want to remove HTML tags without the Html Agility Pack, you can refer to my answer here: stackoverflow.com/a/30026043/2318354, which may be helpful to someone. Commented May 5, 2015 at 10:55
  • This is not a duplicate, as "HTML agility pack - removing unwanted tags without removing content?" wants to keep some tags (i.e., give a list of valid tags and remove the rest). This question is about removing ALL tags, and I can't use the other question's answers because I'm not going to pass in a list of every HTML tag in existence. Commented Jan 18, 2017 at 19:23
  • Take a look at xidel. It will take you 95% of the way there with xidel -s input -e '/'. Commented Apr 24, 2020 at 19:19

7 Answers


You can use a simple regex like this:

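using System.Text.RegularExpressions;

public static string StripHTML(string input)
{
    // Lazily match from each '<' to the nearest '>' and delete it.
    return Regex.Replace(input, "<.*?>", String.Empty);
}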

Be aware that this solution has its own flaws: for example, a literal '<' in the text is treated as the start of a tag. See Remove HTML tags in String for more information (especially the comments by Mark E. Haase, @mehaase).
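One concrete flaw that is easy to miss: in .NET, '.' does not match '\n' unless RegexOptions.Singleline is set, so a tag whose attributes wrap onto a second line survives "<.*?>". A minimal sketch contrasting the two, assuming Singleline is acceptable for your input (the class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class StripDemo
{
    // Same pattern as the answer: '.' stops at '\n', so a multi-line tag is not matched.
    public static string StripDefault(string input) =>
        Regex.Replace(input, "<.*?>", string.Empty);

    // Singleline lets '.' match '\n' as well, so a tag spanning lines is removed.
    public static string StripSingleline(string input) =>
        Regex.Replace(input, "<.*?>", string.Empty, RegexOptions.Singleline);
}
```

With the input "<a\nhref=\"x\">link</a>", StripDefault only removes the closing "</a>", while StripSingleline returns "link". Singleline does nothing for the literal '<' problem, though.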

Another solution would be to use the HTML Agility Pack.
You can find an example using the library here: HTML agility pack - removing unwanted tags without removing content?


Comments

Doesn't work for the input '7 < 10 <b>but</b> 30 > 10': it gives '7 but 30 > 10'.
Shouldn't the method name be StripHtml() since method names should use Pascal case?
Using regular expressions for this is probably not a good idea if you are using it for security reasons.
Just change the regex to <[a-zA-Z/]*?>
@BrandonPrudent maybe better would be <[a-zA-Z/].*?> - it includes attributes
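A quick check of the pattern from the last comment, <[a-zA-Z/].*?>: it leaves a '<' followed by a space alone, but it still treats '<' followed by a letter as a tag, and it does not match <!-- ... --> comments because '!' is outside the character class. A sketch (the helper name is hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class LetterTagStrip
{
    // A "tag" must start with '<' followed by a letter or '/'.
    public static string Strip(string input) =>
        Regex.Replace(input, "<[a-zA-Z/].*?>", string.Empty);
}
```

Strip("7 < 10 <b>but</b> 30 > 10") gives "7 < 10 but 30 > 10", but Strip("if a<b and c>d") gives "if ad", and "x<!-- note -->y" comes back unchanged.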

You can parse the string using the Html Agility Pack and get the InnerText.

    using HtmlAgilityPack;

    HtmlDocument htmlDoc = new HtmlDocument();
    // A regular (non-verbatim) string: inside @"..." the \" sequence would end the literal.
    htmlDoc.LoadHtml("<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=\"#228b22\">[Proj # 206010]</font></b>&nbsp;&nbsp;&nbsp; (Reality Series, &nbsp;)");
    string result = htmlDoc.DocumentNode.InnerText;

2 Comments

I like the InnerText solution as it removes all tags. But... it leaves behind &nbsp; and also comment tags such as <!-- xxx --> like those surrounding v:shapetype, v:shape or v:imagedata with [if gte vml 1] or [if !vml]
I realize that &nbsp; is an HTML entity, not a tag, so one way to deal with it is result = WebUtility.HtmlDecode(result);. To remove the comment nodes with the Html Agility Pack, run htmlDoc.DocumentNode.SelectNodes("//comment()")?.ToList().ForEach(c => c.Remove()); just before result = htmlDoc.DocumentNode.InnerText; (SelectNodes returns an HtmlNodeCollection, which has no ForEach method, hence the ToList() from System.Linq; the ?. handles the null returned when nothing matches).
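Note that WebUtility.HtmlDecode does not delete &nbsp;; it turns it into a non-breaking space (U+00A0). A minimal sketch of a full pipeline: strip tags with the regex from the top answer (used here instead of the Html Agility Pack only so the sketch runs without a NuGet package), decode entities, then collapse whitespace runs. In .NET, \s also matches U+00A0. The class and method names are hypothetical.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlToText
{
    public static string ToPlainText(string html)
    {
        // 1. Remove tags (same caveats as the regex answer).
        string noTags = Regex.Replace(html, "<.*?>", string.Empty);
        // 2. Decode entities: "&nbsp;" becomes '\u00A0', "&amp;" becomes '&', and so on.
        string decoded = WebUtility.HtmlDecode(noTags);
        // 3. Collapse whitespace runs (including U+00A0) to one space and trim the ends.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

On the title from the question this yields "Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] (Reality Series, )". The leftover ", " is text rather than markup, so getting exactly "(Reality Series)" needs a content-specific cleanup on top of this.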

You can use the code below on your string to get the complete text without the HTML parts.

using System.Text.RegularExpressions;

// Remove the &nbsp; entities first, then strip the tags.
string title = "<b> Hulk Hogan's Celebrity Championship Wrestling &nbsp;&nbsp;&nbsp;<font color=\"#228b22\">[Proj # 206010]</font></b>&nbsp;&nbsp;&nbsp; (Reality Series, &nbsp;)".Replace("&nbsp;", string.Empty);
string s = Regex.Replace(title, "<.*?>", String.Empty);


I built a small function to remove HTML tags.

using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static string RemoveHtmlTags(string text)
{
    // Positions of every '<' and every '>' in the input.
    List<int> openTagIndexes = Regex.Matches(text, "<").Cast<Match>().Select(m => m.Index).ToList();
    List<int> closeTagIndexes = Regex.Matches(text, ">").Cast<Match>().Select(m => m.Index).ToList();
    if (closeTagIndexes.Count > 0)
    {
        StringBuilder sb = new StringBuilder();
        int previousIndex = 0;
        foreach (int closeTagIndex in closeTagIndexes)
        {
            // '<' characters between the end of the previous segment and this '>'.
            var openTagsSubset = openTagIndexes.Where(x => x >= previousIndex && x < closeTagIndex);
            if (openTagsSubset.Count() > 0 && closeTagIndex - openTagsSubset.Max() > 1)
            {
                // The last '<' before this '>' starts a tag: keep only the text before it.
                sb.Append(text.Substring(previousIndex, openTagsSubset.Max() - previousIndex));
            }
            else
            {
                // No '<' (or an empty "<>"): this '>' is plain text, so keep it.
                sb.Append(text.Substring(previousIndex, closeTagIndex - previousIndex + 1));
            }
            previousIndex = closeTagIndex + 1;
        }
        // Keep whatever follows the last '>'.
        if (closeTagIndexes.Max() < text.Length)
        {
            sb.Append(text.Substring(closeTagIndexes.Max() + 1));
        }
        return sb.ToString();
    }
    else
    {
        return text;
    }
}
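The function above re-scans the whole list of '<' positions for every '>', so its work grows with (number of '>') × (number of '<'). A single-pass sketch of a similar idea follows, under a different, assumed rule: a '<' opens a tag only when the next character is a letter, '/', '!' or '?', and the tag runs to the next '>'. It is not equivalent to the function above in every case, and the class name is hypothetical.

```csharp
using System.Text;

public static class TagScanner
{
    // Hypothetical single-pass stripper; not the answer author's code.
    public static string StripTags(string text)
    {
        var sb = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            // Assumed rule: '<' opens a tag only if followed by a letter, '/', '!' or '?'.
            bool opensTag = c == '<' && i + 1 < text.Length &&
                (char.IsLetter(text[i + 1]) || text[i + 1] == '/' ||
                 text[i + 1] == '!' || text[i + 1] == '?');
            if (opensTag)
            {
                int close = text.IndexOf('>', i + 1);
                if (close < 0)
                {
                    // No closing '>': keep the remainder as plain text.
                    sb.Append(text, i, text.Length - i);
                    break;
                }
                i = close + 1; // skip everything up to and including '>'
            }
            else
            {
                sb.Append(c);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

StripTags("7 < 10 <b>but</b> 30 > 10") returns "7 < 10 but 30 > 10". A comment whose body contains '>' (e.g. <!-- a > b -->) is cut at the first '>' under this rule.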

public static string StripHtml(string input)
{
    // Strip the tags first, then decode entities such as &nbsp; and &amp;.
    return string.IsNullOrEmpty(input)
        ? input
        : System.Web.HttpUtility.HtmlDecode(System.Text.RegularExpressions.Regex.Replace(input, "<.*?>", String.Empty));
}
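The order used here matters: stripping first and decoding second keeps escaped text such as &lt; as a literal '<' in the output, whereas decoding first creates a real '<' that the regex then treats as the start of a tag. A small sketch contrasting the two orders; it uses System.Net.WebUtility.HtmlDecode instead of HttpUtility only to avoid the System.Web reference, and the names are hypothetical.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class DecodeOrder
{
    // Strip first, then decode (the order used in the answer above).
    public static string StripThenDecode(string input) =>
        WebUtility.HtmlDecode(Regex.Replace(input, "<.*?>", string.Empty));

    // Decode first, then strip: an escaped '<' becomes real and swallows text up to the next '>'.
    public static string DecodeThenStrip(string input) =>
        Regex.Replace(WebUtility.HtmlDecode(input), "<.*?>", string.Empty);
}
```

For "5 &lt; 6 <i>x</i>", StripThenDecode gives "5 < 6 x", while DecodeThenStrip gives "5 x".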

using System.Text.RegularExpressions;

public static string StripHTML(string input)
{
    // Treat null as an empty string instead of throwing.
    if (input == null)
    {
        return string.Empty;
    }
    return Regex.Replace(input, "<.*?>", String.Empty);
}

2 Comments

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
This solution is already provided.
using System.Text.RegularExpressions;

// Built once and reused for every call.
static readonly Regex htmlRegex = new Regex("<.*?>", RegexOptions.Compiled);

public static string RemoveHTMLTagsCompiled(string html)
{
    return htmlRegex.Replace(html, string.Empty);
}
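If you keep a shared Regex instance like this, it can also take RegexOptions.Singleline (so tags spanning line breaks are removed) and a match timeout as a general safeguard against very large or hostile input. A sketch; the one-second timeout and the class name are arbitrary assumptions:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompiledStrip
{
    // Built once; Singleline lets '.' match '\n' so multi-line tags are stripped.
    private static readonly Regex HtmlTag = new Regex(
        "<.*?>",
        RegexOptions.Compiled | RegexOptions.Singleline,
        TimeSpan.FromSeconds(1)); // Replace throws RegexMatchTimeoutException past this limit

    // Null input is treated as an empty string.
    public static string Strip(string html) =>
        html == null ? string.Empty : HtmlTag.Replace(html, string.Empty);
}
```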

1 Comment

If your code does something different (meaningful for the question) than the one in this earlier answer, please explain. If not, your answer adds nothing.
