
I know how to read lines from a .txt file, but for some reason C# is not detecting the end of each line in HTML files. This code opens the HTML file and parses it line by line, searching for the specified string. Even when I just try to print the first line of the HTML file, nothing is displayed.

using (StreamReader sr = new StreamReader("\\\\server\\myFile.html"))
{
    String line;
    while ((line = sr.ReadLine()) != null)
    {
        if(line == ("<td><strong>String I wantstrong></td>"))
        {
            Label1.Text = "Text Found";
            break;
        }
    }
}

I have tried this using a plain txt file and it works perfectly, just not when trying to parse an HTML file.

Thanks.

  • The closing strong should be an end tag (</strong>). Commented Jan 14, 2011 at 1:21
  • Sorry, I made a mistake when copying and pasting: the '<' is there in my code. Also, the '(' and ')' parentheses are not in my code. Commented Jan 14, 2011 at 1:21
  • Is there anything in the file? Does the user running the application have permission to use that network resource? Does this code work if you copy the file locally? If you break in the loop, is the breakpoint hit? It seems to me that the debug work that needs to be done here is fairly straightforward... Commented Jan 14, 2011 at 1:22
  • You'll get an error if you try to read a file that you don't have permissions for (or otherwise doesn't exist). But whether or not it has content in it... well ;) Commented Jan 14, 2011 at 1:38

4 Answers


The best way by far is to use the HTML Agility Pack.

More about this can be found in a previous Stack Overflow question:

Looking for C# HTML parser


You don't need to reinvent the wheel. A much better way to parse HTML is to use an HTML parser:

http://htmlagilitypack.codeplex.com/ or http://www.justagile.com/linq-to-html.aspx

There is also a similar question here: What is the best way to parse html in C#?

Hope it helps.


If you know the HTML you are parsing is XHTML, why not parse it as XML using System.Xml?
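A minimal sketch of that idea, using LINQ to XML (System.Xml.Linq, which ships alongside System.Xml) and assuming the file really is well-formed XHTML. The XhtmlSearch class and HasStrongCell method are hypothetical names. Matching on LocalName sidesteps the XHTML namespace; ordinary HTML with unclosed tags, or entities such as &nbsp; that XML does not predefine, will likely make the parser throw an XmlException.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlSearch
{
    // Hypothetical helper: true if some <td> has a <strong> child whose trimmed
    // text equals target. LocalName is compared so the match works whether or
    // not the document declares xmlns="http://www.w3.org/1999/xhtml".
    public static bool HasStrongCell(XDocument doc, string target)
    {
        return doc.Descendants()
            .Where(e => e.Name.LocalName == "td")
            .SelectMany(td => td.Elements())
            .Any(e => e.Name.LocalName == "strong" && e.Value.Trim() == target);
    }

    public static void Main()
    {
        // Assumed sample input; for the real file you would call XDocument.Load(path).
        string xhtml =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><table><tr>" +
            "<td><strong>String I want</strong></td>" +
            "</tr></table></body></html>";
        XDocument doc = XDocument.Parse(xhtml);
        Console.WriteLine(HasStrongCell(doc, "String I want"));
    }
}
```

Because the comparison runs on parsed elements rather than raw lines, indentation and line breaks inside the markup no longer matter.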


Your outer loop that reads lines works fine. My guess is that one of the following is taking place:

  • The HTML file is empty
  • The first line in the HTML file is empty

In either case, you won't see anything printed.
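One quick way to tell these cases apart is to inspect the file directly. This is a minimal sketch; FileDiagnostics and Describe are hypothetical names. Two facts are relevant to the question: StreamReader.ReadLine treats \n, \r and \r\n all as line terminators, so HTML line endings should not be the problem, and File.Exists returns false rather than throwing when the caller lacks permission for the path, which matters for a network share like the one in the question.

```csharp
using System;
using System.IO;

public static class FileDiagnostics
{
    // Hypothetical helper: reports whether the file is missing/inaccessible,
    // empty, blank, or where its first non-blank line is.
    public static string Describe(string path)
    {
        // File.Exists also returns false when the caller lacks permission.
        if (!File.Exists(path)) return "File not found: " + path;
        if (new FileInfo(path).Length == 0) return "File is empty";

        int lineNumber = 0;
        using (var sr = new StreamReader(path))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                lineNumber++;
                // Brackets make leading/trailing whitespace visible in the output.
                if (line.Trim().Length > 0)
                    return "First non-blank line is line " + lineNumber + ": [" + line + "]";
            }
        }
        return "File contains only blank lines";
    }

    public static void Main()
    {
        // Assumed sample: a temp file whose first two lines are blank.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "\r\n\r\n    <html>\r\n");
        Console.WriteLine(Describe(path)); // First non-blank line is line 3: [    <html>]
        File.Delete(path);
    }
}
```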

Now, to your loop:

You likely don't see what you expect, because

 if(line == ("<td><strong>String I wantstrong></td>"))
 {
    Label1.Text = "Text Found";
    break;
 }

looks for an EXACT match. If this is your actual code, you're missing the opening </ of </strong>, and you're likely forgetting that the lines in your HTML content contain whitespace (indentation).
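Here is a minimal sketch of a whitespace-tolerant version of that check. It is written against TextReader so it can be exercised with a StringReader; the HtmlLineSearch class and ContainsLine method are hypothetical names. It still assumes the whole tag sits on a single line, which line-by-line matching cannot avoid; if the markup can wrap, an HTML parser as suggested in the other answers is the safer route.

```csharp
using System;
using System.IO;

public static class HtmlLineSearch
{
    // Hypothetical helper: true if any line, once leading/trailing whitespace
    // (such as indentation) is trimmed, equals the target exactly.
    // Works with a StreamReader over a file or a StringReader over a string.
    public static bool ContainsLine(TextReader reader, string target)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim() == target)
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        // Assumed sample: an indented table cell, as it might appear in the HTML file.
        string html = "<table>\n    <tr>\n        <td><strong>String I want</strong></td>\n    </tr>\n</table>";
        using (var reader = new StringReader(html))
        {
            Console.WriteLine(ContainsLine(reader, "<td><strong>String I want</strong></td>")); // True
        }
    }
}
```

In the original code you would pass the StreamReader opened on the network path to ContainsLine and set Label1.Text when it returns true. Swapping the equality test for line.Contains(target) would also tolerate other markup on the same line.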
