2

I would like a regex that removes HTML tags and entities such as &nbsp; and &quot; from a string. The regex I have removes the HTML tags but not the entities. I'm using .NET 4.

Thanks

CODE:

     String result = Regex.Replace(blogText, @"<[^>]*>", String.Empty);
2 Answers

1

Don't use regular expressions for this; use the HTML Agility Pack:

http://www.codeplex.com/htmlagilitypack
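
As a minimal sketch of what that could look like, assuming the HtmlAgilityPack assembly is referenced in the project: the class and method names below (HtmlText, ToPlainText) are made up for illustration. Note that this decodes entities (for example, &quot; becomes a quote character and &nbsp; becomes a non-breaking space) rather than deleting them, which may or may not be what you want.

```csharp
using HtmlAgilityPack;

public static class HtmlText
{
    // Hypothetical helper: returns the text content of an HTML fragment.
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Parse the markup; the parser is tolerant of malformed HTML.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText drops the tags but keeps entities as written,
        // so DeEntitize is used to turn them into plain characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

With this in place, the call in the question would become something like String result = HtmlText.ToPlainText(blogText);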


0

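If you want to build on what you already created, you can change it to the following:

     String result = Regex.Replace(blogText, @"<[^>]*>|&#?\w+;", String.Empty);

It means...

  1. Either match tags as you defined...
  2. ...or match a &, an optional # (for numeric entities such as &#160;), at least one word character \w (as many as possible), and the closing ; so that no stray semicolon is left behind.

Neither of these two alternatives handles every nasty case (for example, a > inside an attribute value, or the contents of comments and script blocks), but it usually works.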
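
To make the behaviour concrete, here is a small self-contained sketch that uses only System.Text.RegularExpressions. The class and method names (HtmlStripper, Strip) are invented for this example, and the pattern is the tag-or-entity alternation with the trailing semicolon and numeric entities included.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // First alternative: anything that looks like a tag.
    // Second alternative: a named or numeric entity, including its ';'.
    private static readonly Regex TagOrEntity =
        new Regex(@"<[^>]*>|&#?\w+;", RegexOptions.Compiled);

    // Hypothetical helper: deletes every match, leaving only the text.
    public static string Strip(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }
        return TagOrEntity.Replace(html, String.Empty);
    }
}
```

Because matches are deleted rather than decoded, text on either side of an entity is joined together: Strip("<p>Hello&nbsp;<b>world</b></p>") gives "Helloworld". If that is a problem, replacing matches with a single space instead of String.Empty is one option.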
