
I've written a small scraper that is meant to open up a connection to a PHP script on a remote server via HTTP and pump some XML it finds there into a local file.

Not exactly rocket science, I know.

The code below is the scraper in its entirety (cleaned up and anonymized).

This code works fine except for one small detail: no matter the size of the XML file (1 MB or 7 MB), the resulting file is always missing a small section at the end (600-800 characters).

Notes:

  • If I open the PHP page in Firefox, I get the whole document with no problem.

  • If I fire up Wireshark and run the program below, I see the whole document transferred across the wire, but it is not all written to the file.

```csharp
using System;
using System.IO;
using System.Collections.Generic;
using System.Text;

namespace myNameSpace
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.Write("BEGIN TRANSMISSION\n");
            writeXMLtoFile();
            Console.Write("END TRANSMISSION\n");
        }

        public static void writeXMLtoFile()
        {
            String url = "http://somevalidurl.com/dataPage.php?lotsofpars=true";
            TextWriter tw = new StreamWriter("xml\\myFile.xml");
            tw.Write(ScreenScrape(url));
            Console.Write(" ... DONE\n");
            tw.Close();
        }

        public static string ScreenScrape(string url)
        {
            System.Net.WebRequest request = System.Net.WebRequest.Create(url);
            using (System.Net.WebResponse response = request.GetResponse())
            {
                using (System.IO.StreamReader reader = new System.IO.StreamReader(response.GetResponseStream()))
                {
                    return reader.ReadToEnd();
                }
            }
        }
    }
}
```

Should I be using a different Writer? I've tried both TextWriter and StreamWriter to the same effect.

Kind regards from Iceland,

Gzur

3 Answers


Try:

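```csharp
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load(url);      // url: the PHP page's address; Load fetches and parses it
doc.Save(filename); // filename: local path to write the parsed XML to
```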

It really is that easy (with some error handling, obviously). The .NET Framework does all the work for you. I jumped through hoops a month or so ago trying to do the same thing and kicked myself when I read the help file on XmlDocument ;)
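Since the answer leaves the error handling as an exercise, here is a minimal sketch of what it might look like. The class name `XmlDownload` and the method `SaveXml` are hypothetical; only `XmlDocument.Load` and `XmlDocument.Save` come from the answer. One side benefit relevant to the question: a truncated response is not well-formed XML, so `Load` should throw an `XmlException` instead of silently saving a partial file.

```csharp
using System;
using System.IO;
using System.Xml;

static class XmlDownload
{
    // Hypothetical helper: loads XML from a URL (or local path) and saves it to destPath.
    // Returns true on success; failures are reported on stderr instead of being thrown.
    public static bool SaveXml(string source, string destPath)
    {
        try
        {
            var doc = new XmlDocument();
            // Fetches and parses; throws XmlException if the payload is truncated or malformed.
            doc.Load(source);

            string dir = Path.GetDirectoryName(destPath);
            if (!string.IsNullOrEmpty(dir))
                Directory.CreateDirectory(dir); // no-op if the folder already exists

            // Re-serializes the parsed tree, so formatting may differ from the server's bytes.
            doc.Save(destPath);
            return true;
        }
        catch (XmlException ex)
        {
            Console.Error.WriteLine("Response was not well-formed XML: " + ex.Message);
        }
        catch (Exception ex)
        {
            // Anything else, e.g. a network or file-system failure.
            Console.Error.WriteLine("Download or save failed: " + ex.Message);
        }
        return false;
    }
}
```

From the question's `Main`, a call such as `XmlDownload.SaveXml(url, "xml\\myFile.xml")` could stand in for `writeXMLtoFile()`. Because `Save` writes the parsed document rather than the raw response, the file may not be byte-for-byte identical to what the server sent (whitespace in particular can change).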


2 Comments

(facepalm) You are a genius. Works like a bloody charm. Thanks a bundle.
No worries. Glad to save someone some time. WebClient.DownloadFile is one I'll remember as well, for when I don't need to validate the XML at all.

Additionally, if all you really want to do is download a page to the file system, look into the WebClient.DownloadFile method :)
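A minimal sketch of the `WebClient.DownloadFile` route, assuming the endpoint from the question; `RawDownload` and `DownloadToFile` are hypothetical names. Unlike the `XmlDocument` approach, this stores the response bytes as received and does not check that they are well-formed. Note that `WebClient` is marked obsolete (warning SYSLIB0014) on .NET 6 and later, where `HttpClient` is the recommended replacement.

```csharp
using System;
using System.IO;
using System.Net;

static class RawDownload
{
    // Hypothetical helper: saves the response body to destPath as received,
    // with no decoding, no parsing, and no writer buffer to flush by hand.
    public static long DownloadToFile(string url, string destPath)
    {
        string dir = Path.GetDirectoryName(Path.GetFullPath(destPath));
        Directory.CreateDirectory(dir); // make sure the target folder exists

        // WebClient is IDisposable, so dispose it when the download is done.
        using (var client = new WebClient())
        {
            client.DownloadFile(url, destPath);
        }

        // Size on disk, handy for comparing against the response size seen in Wireshark.
        return new FileInfo(destPath).Length;
    }
}
```

For example, `RawDownload.DownloadToFile("http://somevalidurl.com/dataPage.php?lotsofpars=true", "xml\\myFile.xml")` would replace the question's `writeXMLtoFile()` when no validation is needed.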



It could be as simple as not calling Flush() on your StreamWriter, but why make life hard for yourself? Replace the whole writeXMLtoFile function with this:

```csharp
public static void writeXMLtoFile()
{
    string url = "http://somevalidurl.com/dataPage.php?lotsofpars=true";
    string xml = ScreenScrape(url);
    // WriteAllText creates the file, writes the string, and closes it in one call.
    File.WriteAllText("xml\\myFile.xml", xml);
}
```

This way, you can also use the debugger to see what's going on (inspect the xml variable).
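To see how a missing `Flush()` could chop off the end of the output, here is a small sketch that writes through a `StreamWriter` into a `MemoryStream` and measures the stream before and after the writer is disposed. `FlushDemo` and `Measure` are hypothetical names, and the explicit 1024-character buffer is an assumption chosen to make the effect easy to reproduce. The last partial buffer stays inside the writer until `Dispose` (or `Flush`/`Close`) pushes it out, and a leftover tail shorter than the buffer is consistent with the 600-800 missing characters in the question. The code as posted does call `Close()`, which also flushes, so this is a plausible cause rather than a confirmed one.

```csharp
using System;
using System.IO;
using System.Text;

static class FlushDemo
{
    // Writes `length` characters through a StreamWriter and reports how many
    // bytes reached the underlying stream before and after the writer is disposed.
    public static (long beforeDispose, long afterDispose, int expected) Measure(int length)
    {
        string text = new string('x', length); // ASCII, so 1 char == 1 byte in UTF-8
        var target = new MemoryStream();
        long before = 0;

        // UTF-8 without a BOM; leaveOpen keeps the MemoryStream usable after Dispose.
        using (var writer = new StreamWriter(target, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            writer.Write(text);
            before = target.Length; // only completely filled buffers have been pushed so far
        } // Dispose flushes the remaining buffered characters

        return (before, target.Length, text.Length);
    }
}
```

Applied to the question, `using (TextWriter tw = new StreamWriter("xml\\myFile.xml")) { tw.Write(ScreenScrape(url)); }` guarantees the final flush happens even if an exception is thrown partway through.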

1 Comment

Thanks for taking the time to reply :)
