I need to parse/extract information from an HTML page. Basically, I load the page as a string using System.Net.WebClient and use HTML Agility Pack to get the content inside HTML tags (forms, labels, inputs and so on).

However, some of the content is inside a JavaScript script tag, like this:

<script type="text/javascript">
//<![CDATA[
var itemCol = new Array();

itemCol[0] = {
    pid: "01010101",
    Desc: "Some desc",
    avail: "Available",
    price: "$10.00"
};

itemCol[1] = {
    pid: "01010101",
    Desc: "Some desc",
    avail: "Available",
    price: "$10.00"
};

//]]>
</script>

How can I parse this into a collection in .NET? Can HTML Agility Pack help with that? I'd really appreciate any help.

Thanks in advance.

3 Answers

HAP will not parse the JavaScript for you. The most it can do is give you the text content of the script element.

javascript.net may fit the bill.

1 Comment

For some reason I was unable to install javascript.net (I got some errors), but I was able to do the same with Jint. Thanks.

What part of the content inside the script tag do you want, and what kind of collection are you expecting? You can always select the script tags like this:

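  using System.Xml.XPath;
  using HtmlAgilityPack;

  HtmlDocument document = new HtmlDocument();
  // downloadedHtml holds the page markup as a string, so use LoadHtml;
  // Load expects a file path or a stream.
  document.LoadHtml(downloadedHtml);
  XPathNavigator n = document.CreateNavigator();
  XPathNodeIterator scriptTags = n.Select("//script");

  foreach (XPathNavigator nav in scriptTags)
  {
    // Contents of the current script element
    string innerXml = nav.InnerXml;

    // Parse innerXml using a regex
  }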
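To make the "parse using regex" step concrete, here is a minimal sketch that only handles the flat shape shown in the question (`itemCol[n] = { name: "value", ... };`). The `ItemColParser` class and its `Parse` method are hypothetical names made up for this example, and the patterns assume values are double-quoted strings with no escaped quotes or `}` characters inside them. Anything more complex than that is better handed to a real JavaScript engine.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HTML Agility Pack.
public static class ItemColParser
{
    // One "itemCol[n] = { ... }" assignment; captures the text between the braces.
    private static readonly Regex ItemPattern = new Regex(
        @"itemCol\[\d+\]\s*=\s*\{(?<body>[^}]*)\}");

    // One name: "value" pair inside an object body.
    private static readonly Regex PairPattern = new Regex(
        @"(?<key>\w+)\s*:\s*""(?<value>[^""]*)""");

    // Returns one dictionary per itemCol entry, keyed by property name.
    public static List<Dictionary<string, string>> Parse(string script)
    {
        var items = new List<Dictionary<string, string>>();

        foreach (Match item in ItemPattern.Matches(script))
        {
            var fields = new Dictionary<string, string>();

            foreach (Match pair in PairPattern.Matches(item.Groups["body"].Value))
            {
                fields[pair.Groups["key"].Value] = pair.Groups["value"].Value;
            }

            items.Add(fields);
        }

        return items;
    }
}
```

Inside the loop above you would call `ItemColParser.Parse(innerXml)` and read values such as `items[0]["price"]`. Property names keep the casing used in the script (`pid`, `Desc`, `avail`, `price`).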


Using the javascript.net library, you can run the script and get the collection back:

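  // scriptTags is the XPathNodeIterator over the page's script elements
  using (JavascriptContext context = new JavascriptContext())
  {
    // Expose a .NET object to the script under the name "data"
    context.SetParameter("data", new MyObject());

    // Concatenate the contents of every script tag
    StringBuilder s = new StringBuilder();

    foreach (XPathNavigator nav in scriptTags)
    {
      s.Append(nav.InnerXml);
    }

    // Copy the script's array onto the .NET object, then run everything
    s.Append(";data.item = itemCol;");
    context.Run(s.ToString());

    MyObject o = context.GetParameter("data") as MyObject;
  }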

Then define a data structure such as:

   class MyObject
   {
     public object item { get; set; }
   }
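Since `item` is declared as a plain `object`, you still have to turn it into something typed. The sketch below assumes the engine hands the JavaScript array back as a sequence of string-keyed dictionaries. I believe that is how javascript.net converts arrays and objects, but treat it as an assumption and check the runtime type in a debugger. The `Item` and `ItemMapper` names are hypothetical, made up for this example.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical typed view of one itemCol entry.
public class Item
{
    public string Pid { get; set; }
    public string Desc { get; set; }
    public string Avail { get; set; }
    public string Price { get; set; }
}

public static class ItemMapper
{
    // Assumption: 'raw' is a sequence whose elements are string-keyed
    // dictionaries. Elements of any other shape are skipped.
    public static List<Item> ToItems(object raw)
    {
        var result = new List<Item>();
        var sequence = raw as IEnumerable;
        if (sequence == null)
        {
            return result;
        }

        foreach (object element in sequence)
        {
            var fields = element as IDictionary<string, object>;
            if (fields == null)
            {
                continue;
            }

            result.Add(new Item
            {
                Pid = Get(fields, "pid"),
                Desc = Get(fields, "Desc"),
                Avail = Get(fields, "avail"),
                Price = Get(fields, "price")
            });
        }

        return result;
    }

    // Returns the value as a string, or null when the key is missing.
    private static string Get(IDictionary<string, object> fields, string key)
    {
        object value;
        return fields.TryGetValue(key, out value) && value != null
            ? value.ToString()
            : null;
    }
}
```

With that in place, `List<Item> items = ItemMapper.ToItems(o.item);` gives you a typed list. If the engine returns a different shape, only the cast inside `ToItems` needs to change.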
