
I'm trying to develop a desktop app to be used as a website scraping tool. The user should be able to specify a URL in the desktop app; the app should then invoke an ASP.NET script that scrapes data from the website and returns the records to the desktop app.

Should I use a web service or the ASP.NET runtime for this?

Any help is appreciated :)

Additional details

The scraping activity is already done; I used the HtmlAgilityPack package. This is my scraping code to extract a list of company names from a web page.

public static String getPageHTML(String URL)
{
    HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(URL);

    // Reuse the default proxy with the current user's credentials.
    IWebProxy myProxy = httpWebRequest.Proxy;
    if (myProxy != null)
    {
        myProxy.Credentials = CredentialCache.DefaultCredentials;
    }

    httpWebRequest.Method = "GET";

    using (HttpWebResponse res = (HttpWebResponse)httpWebRequest.GetResponse())
    {
        HtmlDocument doc1 = new HtmlDocument();
        doc1.Load(res.GetResponseStream());

        HtmlNode node = doc1.DocumentNode.SelectSingleNode(
            "//td[@class='mainbody']/table/tr[last()]/td");

        // SelectSingleNode returns null when nothing matches, so test for
        // null rather than catching a NullReferenceException.
        return node != null ? node.InnerText : "No records found";
    }
}
  • I think you need a web service. Commented Apr 26, 2013 at 8:54
  • Why can't you just download the page through WebClient and use HtmlAgilityPack to parse the retrieved HTML? Also, look into the basics of multithreading to do things in parallel. Commented Apr 26, 2013 at 8:57
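
As the second comment suggests, the download step doesn't need ASP.NET at all. A minimal sketch of that approach, reusing the XPath from the question's code (adjust it for the target page; the class and method names here are illustrative):

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

public static class Scraper
{
    public static string GetTotalCompanies(string url)
    {
        using (var client = new WebClient())
        {
            // Keep the question's proxy handling: default proxy, current credentials.
            if (client.Proxy != null)
            {
                client.Proxy.Credentials = CredentialCache.DefaultCredentials;
            }

            string html = client.DownloadString(url);

            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            var node = doc.DocumentNode.SelectSingleNode(
                "//td[@class='mainbody']/table/tr[last()]/td");
            return node != null ? node.InnerText : "No records found";
        }
    }
}
```

This runs entirely inside the desktop app, so no server-side component is needed unless you specifically want the scraping logic hosted elsewhere.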

3 Answers


You can use HttpWebRequest within your desktop app; I've done this before (WinForms). For example:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create("url");
string response;
using (var reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
    response = reader.ReadToEnd();
}

You can then use HtmlAgilityPack to parse the data from the response:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(response);

// Sample query: all <div> elements that carry an id attribute
var node = doc.DocumentNode.Descendants("div")
              .Where(d => d.Attributes.Contains("id")).ToList();
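
Picking up the multithreading hint from the comments, a sketch of how the two snippets above could be wired together and kept off the UI thread so a slow page doesn't freeze the form (method names are illustrative):

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading.Tasks;
using HtmlAgilityPack;

static class PageScraper
{
    public static string FetchHtml(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        using (var reader = new StreamReader(req.GetResponse().GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Example query: count <div> elements that have an id attribute.
    public static int CountDivsWithId(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.Descendants("div")
                  .Count(d => d.Attributes.Contains("id"));
    }
}

// In a WinForms button handler (requires .NET 4.5+):
//   string html = await Task.Run(() => PageScraper.FetchHtml(urlTextBox.Text));
//   int count = PageScraper.CountDivsWithId(html);
```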

1 Comment

Thanks DGibbs. I'll try this and come back to you :)

(It would be helpful to include more details and be more specific.)

If your ASP.NET page already does all the scraping, and all you need to do is access that ASP.NET page, you can simply use HttpWebRequest.

http://msdn.microsoft.com/en-us/library/456dfw4f.aspx - short description & tutorial

If that URL is the website TO BE SCRAPED, and you need to include that ASP.NET script in your project, then you need to add it as a web service.
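
For the first case, a rough sketch of the desktop side. The page name `Scrape.aspx` and the `url` query-string parameter are made up for illustration; substitute whatever your ASP.NET page actually expects:

```csharp
using System;
using System.IO;
using System.Net;

class Program
{
    static void Main()
    {
        // Pass the site to be scraped as a query-string parameter (hypothetical).
        string target = Uri.EscapeDataString("http://example.com/companies");
        var req = (HttpWebRequest)WebRequest.Create(
            "http://yourserver/Scrape.aspx?url=" + target);

        using (var res = (HttpWebResponse)req.GetResponse())
        using (var reader = new StreamReader(res.GetResponseStream()))
        {
            // The ASP.NET page's response body holds the scraped records.
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```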

3 Comments

The URL is the website to be scraped.
OK. And where is the ASP.NET script you want to use? You either have to reference it in your code, or (wild guess) send it your URL as an HTTP parameter. Please provide more details about the script.
OK, now I don't really understand where the problem is. You've got the code to scrape, and you've got the desktop app. What's the issue?

You can do it either way, but you can also do it by adding a WebBrowser control to your desktop application. I don't know why, but the result is much faster.
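
A rough WinForms sketch of the WebBrowser approach: navigate, then read the rendered DOM once DocumentCompleted fires. The URL and class names are illustrative, and whether it is actually faster will depend on the page:

```csharp
using System;
using System.Windows.Forms;

public class ScrapeForm : Form
{
    private readonly WebBrowser browser = new WebBrowser();

    public ScrapeForm()
    {
        browser.ScriptErrorsSuppressed = true;
        browser.DocumentCompleted += OnDocumentCompleted;
        Controls.Add(browser);
        browser.Navigate("http://example.com/companies"); // placeholder URL
    }

    private void OnDocumentCompleted(object sender,
        WebBrowserDocumentCompletedEventArgs e)
    {
        // The rendered HTML (after any JavaScript ran) is available here;
        // it could be handed to HtmlAgilityPack exactly as in the other answers.
        string html = browser.DocumentText;
        MessageBox.Show(html.Length + " characters loaded");
    }
}
```

One advantage of this route over a raw HTTP request is that pages which build their content with JavaScript are rendered before you read them.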

