I'd like to know how I can easily remove specific values from a string using C# and RegEx. I have the following HTML string:

Add [tt]PEELED PLUM SHAPED TOMATOES in tomato juice[/tt][rg]WHOLE PEELED TOMATOES[/rg][rp]WHOLE   PEELED TOMATOES in JUICE[/rp], basil, oregano, parsley, salt, black pepper, sugar, [tt]TOMATO SAUCE[/tt][rg]TOMATO SAUCE[/rg][rp]TOMATO SAUCE[/rp], [brand][rg]TOMATO PASTE[/rg][rp]TOMATO PASTE[/rp]

I need some way to filter out e.g. this part:

[tt]PEELED PLUM SHAPED TOMATOES in tomato juice[/tt]

So the [tt] tags should be removed along with the text between them. If [tt] occurs multiple times in the source string, every occurrence should be removed.

Is this doable using RegEx?

Thanks, Daniel

1 Answer

Yes. As long as the [tt] tags are never nested, it's easy:

result = Regex.Replace(subject, @"\[tt\].*?\[/tt\]", "", RegexOptions.Singleline);
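To see the replacement in context, here is a minimal sketch that wraps the call above in a helper. The class name `TagStripper` and the method name `RemoveTt` are made up for illustration, and the sample input in the comment is a shortened version of the string from the question.

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Hypothetical helper: removes every [tt]...[/tt] span, tags included.
    // Assumes [tt] tags are never nested. The lazy .*? stops at the first
    // closing tag, and Singleline lets '.' match newlines inside a span.
    public static string RemoveTt(string subject)
    {
        return Regex.Replace(subject, @"\[tt\].*?\[/tt\]", "", RegexOptions.Singleline);
    }
}

// Usage:
//   TagStripper.RemoveTt("Add [tt]PEELED PLUM SHAPED TOMATOES in tomato juice[/tt][rg]WHOLE PEELED TOMATOES[/rg], sugar, [tt]TOMATO SAUCE[/tt][rg]TOMATO SAUCE[/rg]")
//   returns "Add [rg]WHOLE PEELED TOMATOES[/rg], sugar, [rg]TOMATO SAUCE[/rg]"
```

Text outside the [tt] spans, including the other bracketed tags, is left untouched.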

If you do expect nested [tt] tags, apply the following replacement repeatedly, once per level of nesting; each pass removes only the innermost [tt]...[/tt] pairs:

result = Regex.Replace(subject, @"\[tt\](?:(?!\[/?tt\]).)*\[/tt\]", "", RegexOptions.Singleline);
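One way to apply this replacement "repeatedly" without knowing the nesting depth in advance is to loop until a pass changes nothing. The sketch below assumes that approach; `NestedTagStripper` and `RemoveNestedTt` are hypothetical names, not part of any library.

```csharp
using System.Text.RegularExpressions;

public static class NestedTagStripper
{
    // Matches a [tt]...[/tt] pair whose body contains no further
    // [tt] or [/tt], i.e. an innermost pair.
    private static readonly Regex InnermostTt = new Regex(
        @"\[tt\](?:(?!\[/?tt\]).)*\[/tt\]", RegexOptions.Singleline);

    // Hypothetical helper: strips innermost pairs pass by pass until
    // the string stops changing, which handles any nesting depth.
    public static string RemoveNestedTt(string subject)
    {
        string previous;
        do
        {
            previous = subject;
            subject = InnermostTt.Replace(subject, "");
        } while (subject != previous);
        return subject;
    }
}

// Usage:
//   NestedTagStripper.RemoveNestedTt("x[tt]abc[tt]def[/tt]ghi[/tt]y")
//   returns "xy" (first pass leaves "x[tt]abcghi[/tt]y", second pass "xy")
```

Unbalanced tags are not repaired: a stray [tt] or [/tt] with no partner never matches and stays in the result.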

4 Comments

...but what happens when the string looks like [tt]abc[tt]def[/tt]ghi[/tt]? Oh, that's right. Don't use regex to parse HTML.
If you're worried about nesting, it's easy enough to throw it into a loop and replace until there are no more matches.
@Lincoded: No, that wouldn't work with this regex. It would match [tt] foo [tt] bar [/tt]. It can be changed to handle that, though.
I wasn't trying to say that the exact regex mentioned above would work for nesting, but you're definitely right about that.
