
I am trying to parse a website's HTML and then get text between two strings.

I wrote a small function to get text between two strings.

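public string getBetween(string strSource, string strStart, string strEnd)
{
    // Locate the start marker; return empty if it is missing.
    int start = strSource.IndexOf(strStart, 0);
    if (start < 0)
    {
        return string.Empty;
    }
    start += strStart.Length;

    // Look for the end marker only after the start marker, so an earlier
    // occurrence of strEnd cannot lead to a negative length in Substring.
    int end = strSource.IndexOf(strEnd, start);
    if (end < 0)
    {
        return string.Empty;
    }
    return strSource.Substring(start, end - start);
}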

I have the HTML stored in a string called 'html'. Here is a part of the HTML that I am trying to parse:

<div class="info">
                                    <div class="content">
                                        <div class="address">
                                        <h3>Andrew V. Kenny</h3>
                                        <div class="adr">
                                        67 Romines Mill Road<br/>Dallas, TX 75204                                        </div>
                                    </div>

<p>Curious what <strong>Andrew</strong> means? <a href="http://www.babysfirstdomain.com/meaning/boy/andrew">Click here to find out!</a></p>

So, I use my function like this.

    string m2 = getBetween(html, "<div class=\"address\">", "<p>Curious what");
    string fullName = getBetween(m2, "<h3>", "</h3>");
    string fullAddress = getBetween(m2, "<div class=\"adr\">", "<br/>");
    string city = getBetween(m2, "<br/>", "</div>");

The output of the full name works fine, but the others have additional spaces in them for some reason. I tried various ways to avoid them (such as completely copying the spaces from the source and adding them in my function) but it didn't work.

I get an output like this:

fullName = "Andrew V. Kenny"
fullAddress = "                                            67 Romines Mill Road"
city = "Dallas, TX 75204                                        "

There are spaces in the city and address which I don't know how to avoid.

  • Is your output inclusive of all the spaces..? Commented Aug 11, 2015 at 3:52
  • @Ben Yes, the output (HTML) includes the spaces. I tried copying the exact phrase with its spaces, but parsing still didn't work. I also posted an example of how the HTML looks in my question. Commented Aug 11, 2015 at 3:59

1 Answer

Trim the strings and the unnecessary whitespace will be gone:

fullName = fullName.Trim();
fullAddress = fullAddress.Trim();
city = city.Trim();
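If every extracted value should be trimmed, the call can also live inside the helper. The following is only a minimal sketch under that assumption: the class name `TextExtractor` and the method name `GetBetweenTrimmed` are hypothetical and not part of the question's code. `String.Trim()` with no arguments removes leading and trailing whitespace, including the line break and indentation that follow `<div class="adr">` in the source HTML.

```csharp
using System;

public static class TextExtractor
{
    // Hypothetical variant of the question's getBetween helper:
    // returns the text between two markers with surrounding whitespace removed.
    public static string GetBetweenTrimmed(string source, string start, string end)
    {
        // Find the start marker; give up if it is absent.
        int startIndex = source.IndexOf(start, StringComparison.Ordinal);
        if (startIndex < 0)
        {
            return string.Empty;
        }
        startIndex += start.Length;

        // Find the end marker, searching only after the start marker.
        int endIndex = source.IndexOf(end, startIndex, StringComparison.Ordinal);
        if (endIndex < 0)
        {
            return string.Empty;
        }

        // Trim() strips leading and trailing spaces, tabs, and line breaks.
        return source.Substring(startIndex, endIndex - startIndex).Trim();
    }
}
```

With this variant, a call such as `TextExtractor.GetBetweenTrimmed(m2, "<div class=\"adr\">", "<br/>")` would not need a separate `Trim()` afterwards. Whitespace inside the extracted value is left untouched.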