1

If I retrieve a website with this call, I get the whole page, but without the AJAX-loaded values.

htmlDoc.LoadHtml(new WebClient().DownloadString(url));

Is it possible to load the website the way Google Chrome does, with all values populated?

2 Answers

3

You can use a WebBrowser control to get and render the page. Unfortunately, the control uses Internet Explorer, and you have to change a registry value to force it to use the latest installed version. Even then, the implementation is very brittle.

Another option is to take a standalone browser engine like WebKit and make it work in .NET. I found a page explaining how to do this, but it's pretty dated: http://webkitdotnet.sourceforge.net/basics.php

I worked on a little demo app to get the content and this is what I came up with:

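    using System;
    using System.IO;
    using System.Text;
    using NReco.PhantomJS; // from the NReco.PhantomJS NuGet package

    class Program
    {
        static void Main(string[] args)
        {
            // Render the page, wait 5 seconds for AJAX content, then print and save the HTML.
            GetRenderedWebPage("https://siderite.dev", TimeSpan.FromSeconds(5), output =>
            {
                Console.Write(output);
                File.WriteAllText("output.txt", output);
            });
            Console.ReadKey();
        }

        private static void GetRenderedWebPage(string url, TimeSpan waitAfterPageLoad, Action<string> callBack)
        {
            // Marker line the script prints after it has written the page content.
            const string cEndLine = "All output received";

            var sb = new StringBuilder();
            var p = new PhantomJS();
            // Collect the script's console output line by line until the marker arrives.
            p.OutputReceived += (sender, e) =>
            {
                if (e.Data == cEndLine)
                {
                    callBack(sb.ToString());
                }
                else
                {
                    sb.AppendLine(e.Data);
                }
            };
            // Note: url is pasted into the script verbatim, so it must not contain a single quote.
            // The cast keeps the delay an integer regardless of the current culture.
            p.RunScript(@"
var page = require('webpage').create();
page.viewportSize = { width: 1920, height: 1080 };
page.onLoadFinished = function(status) {
    if (status=='success') {
        // Give AJAX calls time to finish before dumping the rendered DOM.
        setTimeout(function() {
            console.log(page.content);
            console.log('" + cEndLine + @"');
            phantom.exit();
        }," + (long)waitAfterPageLoad.TotalMilliseconds + @");
    }
};
var url = '" + url + @"';
page.open(url);", new string[0]);
        }
    }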

This uses the PhantomJS "headless" browser by way of the NReco.PhantomJS wrapper, which you can install as a NuGet package directly from Visual Studio. I am sure it can be done better, but this is what I did today. You might want to take a look at the PhantomJS callbacks so you can properly debug what is going on. For example, my code will wait forever if the URL doesn't load. Here is a useful link: https://newspaint.wordpress.com/2013/04/25/getting-to-the-bottom-of-why-a-phantomjs-page-load-fails/
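As a rough sketch of how those callbacks could be used to avoid the endless wait, the script passed to `RunScript` might look something like the one below. The helper name `describeResourceError`, the 10-second `resourceTimeout`, the 5-second delay and the hard-coded URL are illustrative assumptions, not part of the code above. The `typeof phantom` guard is only there so the helper can be loaded outside PhantomJS.

```javascript
// Hypothetical helper: turn a PhantomJS resourceError object into one log line.
function describeResourceError(err) {
    return 'Failed to load ' + err.url + ' (' + err.errorCode + '): ' + err.errorString;
}

// Only run the browser part when executing inside PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.viewportSize = { width: 1920, height: 1080 };
    page.settings.resourceTimeout = 10000; // ms, assumed value

    // Log every resource that fails or times out.
    page.onResourceError = function (resourceError) {
        console.log(describeResourceError(resourceError));
    };
    page.onResourceTimeout = function (request) {
        console.log('Timed out: ' + request.url);
    };

    page.onLoadFinished = function (status) {
        if (status !== 'success') {
            // Still print the end marker so the C# side stops waiting.
            console.log('All output received');
            phantom.exit(1);
            return;
        }
        setTimeout(function () {
            console.log(page.content);
            console.log('All output received');
            phantom.exit();
        }, 5000); // assumed delay for AJAX content
    };

    page.open('https://siderite.dev');
}
```

Because the diagnostics are written with `console.log`, they end up in the same output the C# callback receives, mixed in with the page HTML when the load succeeds. On a failed load the callback gets only the error lines.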

3 Comments

A browser engine looks like a good idea; the default IE8(?) browser in C# is not the best choice for my project. Before I try out the WebKit engine, do you know if I can block every image on the website? I need to load the website as fast as I can.
As for the blocking, take a look at the onResourceRequested PhantomJS event. Maybe it has some sort of cancellation mechanism. However, consider that based on the size of pictures the page might render differently.
I've tested a lot of jQuery web pages and it works awesome. Thanks a lot for your code example and the PhantomJS advice.
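For the image-blocking question above, here is a minimal sketch of what the PhantomJS script could do. It assumes PhantomJS 1.9 or later, where `onResourceRequested` receives a second `networkRequest` argument with an `abort()` method. The helper `looksLikeImage`, its list of extensions and the hard-coded URL are hypothetical. The `typeof phantom` guard is only there so the helper can be loaded outside PhantomJS.

```javascript
// Hypothetical helper: guess from the URL alone whether a request is for an image.
function looksLikeImage(url) {
    return /\.(png|jpe?g|gif|webp|svg|ico|bmp)([?#].*)?$/i.test(url);
}

// Only run the browser part when executing inside PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();

    // Built-in setting: do not load inline images. Must be set before open().
    page.settings.loadImages = false;

    // Additionally cancel any request whose URL looks like an image file.
    page.onResourceRequested = function (requestData, networkRequest) {
        if (looksLikeImage(requestData.url)) {
            networkRequest.abort();
        }
    };

    page.onLoadFinished = function (status) {
        console.log(status === 'success' ? page.content : 'Load failed');
        phantom.exit();
    };

    page.open('https://siderite.dev');
}
```

The extension check is a heuristic: images served from URLs without a file extension will not be caught by it, which is why the sketch also turns off `loadImages`. As noted above, a page may lay out differently when its images are missing.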
2

No, it's not possible with your example, since it only downloads the content as a string. You need to render that string in a browser engine, or find a component that does that for you.

I would suggest looking into AbotX. They just announced this feature, so it may be interesting for you, but it's not free.
