
I have some files that were displayed in a browser and then saved to a local file with File, Save As... The page has some scripting and will not display properly in a WebBrowser control on a WinForm. The problem appears to be the scripts, as the control displays "script error" dialogs. I don't really need to view the file, just to retrieve a few elements by ID.

The first block of code below does load the file into a local object, but only the first 4096 bytes. (Same happens if I use a WebBrowser resident on the form.)

The second block doesn't complain, but GetElementById fails because the desired element is beyond the first 4096 bytes.

    Dim web As New WebBrowser
    web.AllowWebBrowserDrop = False
    web.ScriptErrorsSuppressed = True
    web.Url = New Uri(sFile)

    Dim doc As HtmlDocument
    Dim elem As HtmlElement
    doc = web.Document
    elem = doc.GetElementById("userParts")

What am I doing wrong?

Is there a better approach for a VB.Net WinForm project for loading an HTML document from which I can read elements?


I just went with string functions for the simple task at hand:

    Function GetInnerTextByID(html As String, elemID As String) As String
        Try
            Dim s As String = html.Substring(html.IndexOf("<body>"))
            s = s.Substring(s.IndexOf(elemID))
            s = s.Substring(s.IndexOf(">") + 1)
            s = s.Substring(0, s.IndexOf("<"))
            s = s.Replace(vbCr, "").Replace(vbLf, "").Trim()
            Return s
        Catch ex As Exception
            Return ""
        End Try
    End Function
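One limitation of the function above is that `IndexOf(elemID)` finds the first occurrence of the ID text anywhere after `<body>`, including inside script or ordinary text. A minimal sketch of a stricter variant follows, assuming the element has no nested markup and the attribute is written as `id=...`. The module and function names are hypothetical and are not part of the original post.

```vbnet
Imports System.Text.RegularExpressions

Module HtmlText
    ' Hypothetical helper: returns the text between the opening tag whose
    ' id attribute equals elemID and the next "<". Nested tags are not handled.
    Function GetInnerTextByIdAttribute(html As String, elemID As String) As String
        ' Match id=, id="..." or id='...' as a whole attribute name and value,
        ' skip the rest of the opening tag, then capture text up to the next "<".
        Dim pattern As String =
            "(?<![\w-])(?i:id)\s*=\s*[""']?" & Regex.Escape(elemID) &
            "(?![\w-])[""']?[^>]*>([^<]*)"

        Dim m As Match = Regex.Match(html, pattern)
        If Not m.Success Then Return ""

        Return m.Groups(1).Value.Replace(vbCr, "").Replace(vbLf, "").Trim()
    End Function
End Module
```

It would be called the same way as the original, for example `GetInnerTextByIdAttribute(File.ReadAllText(sFile), "userParts")`. This is still string matching rather than HTML parsing, so it can fail on unusual markup.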

I'd still be interested in a native VB.Net (non-ASP) approach, or in why the code in the original post loads only 4096 bytes.
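One possible explanation, which nothing in this thread confirms: WebBrowser navigation is asynchronous, so reading `web.Document` immediately after setting `Url` may return a document that is still loading. A sketch of the usual workaround is to read the element in the `DocumentCompleted` handler instead. The form class name and the `LoadFile` method are hypothetical, and the code assumes it runs inside a normal WinForms application with a message loop.

```vbnet
Imports System.Diagnostics
Imports System.Windows.Forms

Public Class ElementReaderForm
    Inherits Form

    ' WithEvents lets the Handles clause below subscribe to DocumentCompleted.
    Private WithEvents web As New WebBrowser()

    ' Hypothetical entry point: starts loading the saved page.
    Public Sub LoadFile(sFile As String)
        web.AllowWebBrowserDrop = False
        web.ScriptErrorsSuppressed = True
        web.Navigate(New Uri(sFile)) ' returns before the page has finished loading
    End Sub

    Private Sub web_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) _
            Handles web.DocumentCompleted
        ' The event can fire more than once (for frames); wait for the whole page.
        If web.ReadyState <> WebBrowserReadyState.Complete Then Return

        Dim elem As HtmlElement = web.Document.GetElementById("userParts")
        If elem IsNot Nothing Then
            Debug.WriteLine(elem.InnerText)
        End If
    End Sub
End Class
```

Whether this removes the 4096-byte cutoff for a given file is an assumption to verify. The script errors in the saved page may still interfere with loading.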

  • You can use HtmlAgilityPack Commented Sep 30, 2014 at 21:46
  • True - but overly complex for my simple task of extracting a few elements by ID. Commented Sep 30, 2014 at 21:52
  • 1
    It has also a document.GetElementById method which is pretty simple. And it has no strange issues with scripts or bytes. Just load the document from web,file or plain string. Commented Sep 30, 2014 at 21:57
  • HtmlAgilityPack is complex because it's solving the exact issues you're facing right now. Without it, your code will end up a mess itself and probably less efficient. I will never understand people's reluctance to bring a proven/tested third party library into their project. Commented Sep 30, 2014 at 23:05
  • @Simon Whitehead. Then let me tell you why. Third party libraries have their limitations. You will inevitably run into an issue where you need some functionality that the library can't handle. It can do what it can do, and if you need more, then it is just too bad. Then you can start making all kinds of workarounds etc., and you will most likely end up coding it yourself from scratch anyway. No thanks. I stay far away from third party libraries if at all possible. I want code that I can change and upgrade as I wish, with no limitations. Commented Oct 20 at 17:38

1 Answer


I would use HtmlAgilityPack instead.

You: "True - but overly complex for my simple task of extracting a few elements by ID."

It also has a document.GetElementbyId method, which is rather simple. And it has no strange issues with scripts or bytes. Just load the document from the web, a stream, a file or a plain string.

For example (web):

    ' Requires Imports System.Net and Imports HtmlAgilityPack at the top of the file
    Dim document As New HtmlAgilityPack.HtmlDocument()
    Dim myHttpWebRequest = CType(WebRequest.Create("URL"), HttpWebRequest)
    myHttpWebRequest.UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"

    ' Request the page once; True = detect the encoding from byte order marks
    Using res = CType(myHttpWebRequest.GetResponse(), HttpWebResponse)
        document.Load(res.GetResponseStream(), True)
    End Using

    Dim node As HtmlNode = document.GetElementbyId("userParts")

or from a file:

    document.Load("Path")

or from a string (for example, a whole web page read from an HTML file with File.ReadAllText):

    document.LoadHtml("HTML")
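For the saved-file case in the question, the pieces above could be combined roughly as follows. This is a sketch, not tested against the asker's files. The module and function names are hypothetical, and it assumes the HtmlAgilityPack package is referenced by the project.

```vbnet
Imports System.IO
Imports HtmlAgilityPack

Module SavedPageReader
    ' Hypothetical helper: loads a saved HTML file and returns the text of
    ' the element with the given id, or "" when no such element exists.
    Function GetInnerTextFromFile(path As String, elemId As String) As String
        Dim document As New HtmlDocument()
        document.LoadHtml(File.ReadAllText(path))

        ' GetElementbyId returns Nothing when the id is not found.
        Dim node As HtmlNode = document.GetElementbyId(elemId)
        If node Is Nothing Then Return ""

        ' InnerText keeps entities such as &amp; so decode them before trimming.
        Return HtmlEntity.DeEntitize(node.InnerText).Trim()
    End Function
End Module
```

A call such as `GetInnerTextFromFile(sFile, "userParts")` never executes the page's scripts, so the script error dialogs from the WebBrowser control do not come into play.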