
I found another question that asked for the same type of functionality, but that question is more than two years old, so I was wondering whether anybody has seen anything since then.

I've basically written my own asynchronous HTTP/socket client using the standard .NET sockets. I maintain a pool of 1024 sockets and 128 "service" threads that use the pool to download web pages from the internet at a rate of up to 371 pages per second (just tested today on a single Amazon EC2 server). I also wrote another asynchronous HTTP client that uses HttpWebRequest to download pages asynchronously, but it is SIGNIFICANTLY slower: throughput averages about 50 pages per second (also tested on Amazon EC2) with the same setup: 1024 pooled HttpWebRequests and 128 "service" threads.
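
For reference, the HttpWebRequest variant just follows the standard Begin/End async pattern, roughly like this (a simplified sketch; the real pooling and service threads are left out, and the URL and connection-limit value are only illustrative):

    using System;
    using System.IO;
    using System.Net;

    class AsyncFetchSketch
    {
        // Simplified sketch of one asynchronous HttpWebRequest download.
        // In the real client, many of these are pooled and driven by service threads.
        static void StartDownload(string url)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "GET";
            request.KeepAlive = true;

            request.BeginGetResponse(ar =>
            {
                var req = (HttpWebRequest)ar.AsyncState;
                using (var response = (HttpWebResponse)req.EndGetResponse(ar))
                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    string page = reader.ReadToEnd();
                    // hand the page off to a processing queue here
                }
            }, request);
        }

        static void Main()
        {
            // Raise the per-host connection limit, otherwise HttpWebRequest
            // throttles concurrent connections to the same endpoint.
            ServicePointManager.DefaultConnectionLimit = 1024;
            StartDownload("http://example.com/");
            Console.ReadLine(); // keep the process alive while the callback runs
        }
    }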

Naturally, providing full HTTP protocol support will take some additional processing power and memory. I'm hoping that on Amazon's Extra Large EC2 instance I will be limited not by processing power or memory but only by network bandwidth (which has been the case so far).

An example of the machine(s) that I'm using is Amazon's High-CPU Extra Large Instance:

  • 7 GB of memory
  • 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
  • 1690 GB of instance storage
  • 64-bit platform
  • I/O Performance: High
  • API name: c1.xlarge

I can write my own HTTP handling that complies with the HTTP protocol, but it would save me a TON of work, pain, and suffering if there is an off-the-shelf solution that is fast and robust.

I need the following functionality at the very minimum (a rough sketch of the kind of request building and response parsing I mean follows the list):

  • Building HTTP HEAD/GET (and maybe POST) requests
  • Parsing the HTTP response from a binary stream
  • Cookie support
  • An LGPL license
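
For concreteness, what I'd otherwise be writing by hand is roughly the following (a bare-bones sketch only: no cookies, chunked encoding, or error handling, and the host name is just a placeholder):

    using System;
    using System.Net.Sockets;
    using System.Text;

    class RawHttpSketch
    {
        static void Main()
        {
            const string host = "example.com"; // placeholder host

            using (var client = new TcpClient(host, 80))
            using (var stream = client.GetStream())
            {
                // Build a minimal HTTP/1.1 GET request by hand.
                string request =
                    "GET / HTTP/1.1\r\n" +
                    "Host: " + host + "\r\n" +
                    "Connection: close\r\n" +
                    "\r\n";
                byte[] requestBytes = Encoding.ASCII.GetBytes(request);
                stream.Write(requestBytes, 0, requestBytes.Length);

                // Read the raw response; a real client would parse the status line,
                // headers (including Set-Cookie), and body out of this binary stream.
                var buffer = new byte[8192];
                var response = new StringBuilder();
                int read;
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                    response.Append(Encoding.ASCII.GetString(buffer, 0, read));

                Console.WriteLine(response.ToString());
            }
        }
    }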

Does anybody know of any such solutions?

3 Comments
  • Did you try the performance of WebClient? Commented May 3, 2011 at 20:04
  • @jgauffin, I have not tried the WebClient yet. It seems like this may be a viable answer; please make sure you post it so I can accept it once I test it out. Commented May 3, 2011 at 20:07
  • I wrote a comment since I don't know if it's more performant. But I wrote it as an answer per your wish. ;) Commented May 3, 2011 at 20:09

1 Answer


I don't know how HttpWebRequest works with sockets internally. Opening and closing sockets might be a big performance hit. WebClient uses keep-alive and might work better.
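
If you want to benchmark it, the asynchronous WebClient pattern is roughly this (just a sketch; the URL and the connection-limit value are placeholders):

    using System;
    using System.Net;

    class WebClientSketch
    {
        static void Main()
        {
            // Allow more than the default two concurrent connections per host.
            ServicePointManager.DefaultConnectionLimit = 1024;

            var client = new WebClient();
            client.DownloadStringCompleted += (sender, e) =>
            {
                if (e.Error == null)
                    Console.WriteLine("Downloaded {0} chars", e.Result.Length);
            };
            client.DownloadStringAsync(new Uri("http://example.com/"));

            Console.ReadLine(); // keep the process alive for the async callback
        }
    }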

Edit: I did a bit of googling and I wouldn't accept this as an answer. WebClient seems to be a wrapper around HttpWebRequest/Response: http://www.codeproject.com/Articles/156610/WP7-WebClient-vs-HttpWebRequest.aspx?msg=3775084

Update

Since you have started with sockets, I would stick with them. Feel free to take stuff from my webserver project: http://webserver.codeplex.com

My parser:

http://webserver.codeplex.com/SourceControl/changeset/view/56552#671689
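
The basic idea is just to split the status line and headers out of the byte stream before the body, something like this (not the actual parser from the project, only a rough illustration; for raw throughput you'd probably want to avoid allocating a dictionary per response):

    using System;
    using System.Collections.Generic;
    using System.Text;

    class ResponseHeaderParserSketch
    {
        // Rough illustration: find the header/body boundary (\r\n\r\n),
        // then split out the status line and the header lines.
        public static Dictionary<string, string> ParseHeaders(byte[] raw, out int bodyOffset)
        {
            string text = Encoding.ASCII.GetString(raw);
            int end = text.IndexOf("\r\n\r\n", StringComparison.Ordinal);
            if (end < 0)
                throw new InvalidOperationException("Headers are not complete yet.");
            bodyOffset = end + 4;

            var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            string[] lines = text.Substring(0, end).Split(new[] { "\r\n" }, StringSplitOptions.None);

            // lines[0] is the status line, e.g. "HTTP/1.1 200 OK".
            for (int i = 1; i < lines.Length; i++)
            {
                int colon = lines[i].IndexOf(':');
                if (colon > 0)
                    headers[lines[i].Substring(0, colon)] = lines[i].Substring(colon + 1).Trim();
            }
            return headers;
        }
    }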


2 Comments

Does the WebClient allow the client to connect to multiple different endpoints? I have thousands of endpoints and the async sockets work very well with them so far (even after closing the connection)... I still can't figure out how to reuse the sockets though: stackoverflow.com/questions/5762276/…
I would stick to sockets and use a parser. Feel free to borrow ideas from the parser in my webserver project: webserver.codeplex.com. I wouldn't generate header objects though; it's not very performant to do so.
