VERY VERY hacky (and it really shouldn't be used in production), but:
C#
    string output = Regex.Replace(input, @"<[^>]+?\/?>", m => {
        // here you can exclude specific tags such as `<a>` or maybe `<b>`, etc.
        return Regex.IsMatch(m.Value, @"^<a\b|\/a>$") ? m.Value : String.Empty;
    });
Basically, it strips every HTML tag except <a ...> and </a>; the text between the tags is left untouched.
Note: this DOES NOT:
- Validate if a tag was opened/closed/nested correctly.
- Validate if the <> are actually HTML tags (maybe your input has < or > in the text itself?)
- Handle "nested" <> tags (e.g. <img src="http://placeholde.it/100" alt="foo<Bar>"/> will leave a remainder of "/> in the output string)
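To make these caveats concrete, here is a small sketch that wraps the same one-liner in a helper and runs it on a few inputs. The names StripDemo and StripAllButAnchors are made up for this illustration; the two patterns are copied unchanged from the snippet above.

```csharp
using System;
using System.Text.RegularExpressions;

static class StripDemo
{
    // Hypothetical wrapper around the one-liner so it can be called repeatedly.
    public static string StripAllButAnchors(string input)
    {
        return Regex.Replace(input, @"<[^>]+?\/?>", m =>
            Regex.IsMatch(m.Value, @"^<a\b|\/a>$") ? m.Value : String.Empty);
    }

    static void Main()
    {
        // Normal case: <p> and <b> are removed, the <a> pair survives.
        Console.WriteLine(StripAllButAnchors("<p>See <a href=\"#\">this</a> <b>now</b></p>"));
        // prints: See <a href="#">this</a> now

        // No validation: a stray closing tag is kept as-is.
        Console.WriteLine(StripAllButAnchors("</a>stray"));
        // prints: </a>stray

        // Bare < and > in plain text are mistaken for a tag and removed.
        Console.WriteLine(StripAllButAnchors("if x<y and y>z then"));
        // prints: if xz then

        // A > inside an attribute value ends the match early.
        Console.WriteLine(StripAllButAnchors("<img src=\"http://placeholde.it/100\" alt=\"foo<Bar>\"/>"));
        // prints: "/>
    }
}
```

If any of these cases can occur in your input, a real HTML parser is the safer choice.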
Here's the same thing turned into a helper method:
    // Mimics PHP's strip_tags: http://www.php.net/strip_tags
    // Requires: using System.Linq; using System.Text.RegularExpressions;

    /// <summary>
    /// Removes all HTML tags from the string and returns the purified result.
    /// If supplied, tags matching <paramref name="allowedTags"/> will be left untouched.
    /// </summary>
    /// <param name="input">The input string.</param>
    /// <param name="allowedTags">Tags to remain in the original input.</param>
    /// <returns>Transformed input string.</returns>
    static String StripTags(String input, params String[] allowedTags)
    {
        if (String.IsNullOrEmpty(input)) return input;

        // By default, every matched tag is replaced with an empty string.
        MatchEvaluator evaluator = m => String.Empty;
        if (allowedTags != null && allowedTags.Length > 0)
        {
            // Alternation of the escaped tag names, e.g. "a|b|img".
            String names = String.Join("|", allowedTags.Select(x => Regex.Escape(x)).ToArray());
            // Keep a match if it opens (<a ...>) or closes (</a>) an allowed tag.
            Regex reAllowed = new Regex(String.Format(@"^<(?:{0})\b|\/(?:{0})>$", names));
            evaluator = m => reAllowed.IsMatch(m.Value) ? m.Value : String.Empty;
        }
        return Regex.Replace(input, @"<[^>]+?\/?>", evaluator);
    }

    // StripTags(input) -- all tags are removed
    // StripTags(input, "a") -- all tags but <a> are removed
    // StripTags(input, new[]{ "a" }) -- same as above