I want to remove HTML tags from a string and then display the result in textBox2. I need to get the start position of "<" and ">" and then delete the tags and everything in between them. I don't want to use regex.

Here's what I've got so far:

        string input = textBox1.Text;
        string output = textBox2.Text;
        string results;
        for (int i = 0; i < input.Length; i++)
        {
            if (input.IndexOf('<', i) != -1)
            {
                // TODO: remove the tag that starts here
            }
        }
  • Can < and > appear more than once? Do you want to handle cases where < and > can appear inside each other? Are they guaranteed to appear in the string? Can the input be malformed so that a < can exist without a >? Commented Dec 19, 2012 at 5:52
  • The input is a textbox, so let's say the user inputs: Hi <backround="blue"> Benny; the output would be Hi Benny. Commented Dec 19, 2012 at 5:55
  • @Dan Herbert Yes, they can appear more than once, but they can't appear within each other. Commented Dec 19, 2012 at 5:57

1 Answer

This should do what you're looking for. However, it won't handle malformed markup. For example, if you were to enter the input string Hello < world, everything after the unclosed < would be discarded, so the output would be "Hello " (with the trailing space).

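// Requires: using System.Text;
string input = textBox1.Text;
// Pre-size the buffer; the output can never be longer than the input.
StringBuilder output = new StringBuilder(input.Length);
bool inATag = false; // true while we are between '<' and '>'

for (var i = 0; i < input.Length; i++) {
    if (!inATag && input[i] != '>' && input[i] != '<') {
        // Ordinary character outside of any tag: keep it.
        output.Append(input[i]);
    } else if (input[i] == '<') {
        // Start of a tag: stop copying.
        inATag = true;
    } else if (input[i] == '>') {
        // End of a tag: resume copying with the next character.
        inATag = false;
    }
}

textBox2.Text = output.ToString();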

To explain a little more about what's going on, I'm iterating through the input string one character at a time. If I find an opening <, I enter a state where I will not add any of the input to the output until I find the closing >.
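To make the state handling easier to try outside of a form, here is a minimal console sketch of the same loop. The names TagStripper and StripTags are hypothetical and not part of the original code; the logic is meant to mirror the snippet above, including its behaviour on malformed input.

```csharp
using System;
using System.Text;

public static class TagStripper
{
    // Hypothetical helper: copies every character that lies outside a <...> pair.
    public static string StripTags(string input)
    {
        var output = new StringBuilder(input.Length);
        bool inATag = false;

        foreach (char c in input)
        {
            if (c == '<')
            {
                inATag = true;      // stop copying until the tag closes
            }
            else if (c == '>')
            {
                inATag = false;     // resume copying with the next character
            }
            else if (!inATag)
            {
                output.Append(c);   // ordinary character outside a tag
            }
        }

        return output.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // The example from the comments; the spaces on both sides of the tag are kept.
        Console.WriteLine("[" + TagStripper.StripTags("Hi <backround=\"blue\"> Benny") + "]");

        // Malformed input: nothing after the unclosed '<' is copied.
        Console.WriteLine("[" + TagStripper.StripTags("Hello < world") + "]");
    }
}
```

Note that only the tag itself is removed, so the space before the tag and the space after it both survive. If a single space is wanted between Hi and Benny, the whitespace would have to be collapsed in a separate step.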

I'm building the output with a StringBuilder rather than with string output += input[i]. Strings are immutable in .NET, so every time you concatenate two strings, a completely new string is allocated and the old contents are copied into it. Inside a loop, that cost adds up. A StringBuilder appends into an internal buffer instead, and because it is created here with a capacity of input.Length, that buffer never has to grow. The final string is allocated only once, when ToString() is called.

Microsoft has written a good explanation of why to use a StringBuilder, but the general rule is that you should be using a StringBuilder any time you find yourself concatenating strings inside of a loop.

Conversely, when your input string is known to always be small, it is better not to use a StringBuilder. Creating a StringBuilder object has a cost that isn't recovered if you're only concatenating a small number of strings. For example, if you expect to do only 10 string concatenations, using a StringBuilder would be considered an anti-pattern. However, if you're concatenating hundreds of strings, as can happen in this example where each kept character is one append, a StringBuilder is a very good candidate.
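As a rough illustration of the trade-off, the sketch below builds the same string both ways and times each one. The class name ConcatComparison, its method names, and the input size of 50,000 characters are arbitrary choices for illustration. The timings will vary by machine and runtime, so treat them as an indication rather than a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatComparison
{
    // Builds the result with +=; each iteration allocates a new string.
    public static string WithConcatenation(string input)
    {
        string result = "";
        foreach (char c in input)
        {
            result += c;
        }
        return result;
    }

    // Builds the result in a pre-sized StringBuilder buffer.
    public static string WithStringBuilder(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string input = new string('x', 50000); // arbitrary size, for illustration only

        var sw = Stopwatch.StartNew();
        string a = WithConcatenation(input);
        sw.Stop();
        Console.WriteLine("+=            : " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        string b = WithStringBuilder(input);
        sw.Stop();
        Console.WriteLine("StringBuilder : " + sw.ElapsedMilliseconds + " ms");

        // Both approaches produce the same text; only the allocation pattern differs.
        Console.WriteLine(a == b);
    }
}
```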

4 Comments

output.Append(input[i]);
@A.V Thanks. Not sure how I missed that.
@DanHerbert Can you explain your code? What does output.Append(input[i]); do? And why do we have to use a StringBuilder?
@Benny I've added an explanation to my code. If anything else is unclear, let me know and I'll update my answer.
