
My requirement is to generate hundreds of HTTP POST requests per second. I am doing it using urllib2.

import datetime
import time
import urllib2
from multiprocessing import Process

def send():
    req = urllib2.Request(url)          # url: target endpoint, defined elsewhere
    req.add_data(data)                  # attaching data makes urlopen issue a POST
    response = urllib2.urlopen(req)

while datetime.datetime.now() <= ftime:  # ftime: end time of the run, defined elsewhere
    p = Process(target=send, args=[])
    p.start()
    time.sleep(0.001)

The problem is that, on some iterations, this code throws one of the following exceptions:

HTTP 503 Service Unavailable.
URLError: <urlopen error [Errno -2] Name or service not known>

I have tried using requests (HTTP for Humans) as well, but I am having proxy issues with that module: it seems to send HTTP packets to the proxy server even when the target machine is on the same LAN. I don't want the packets to go through a proxy.
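
For what it's worth, requests picks up proxies from the HTTP_PROXY/HTTPS_PROXY environment variables by default; a minimal sketch (assuming that is the cause, with url and data as above) that keeps LAN traffic off the proxy:

import requests

session = requests.Session()
session.trust_env = False  # ignore HTTP_PROXY/HTTPS_PROXY from the environment

def send():
    # url and data as in the question; the 5-second timeout is my addition
    response = session.post(url, data=data, timeout=5)
    return response.status_code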

  • If the goal is to generate a ton of requests I wouldn't start those in individual processes like that. Very inefficient. Use something like twisted and an epoll based reactor: twistedmatrix.com/documents/current/web/howto/client.html Commented Apr 23, 2014 at 6:32
  • @agrinh Thanks for the reply. Do you have any idea how to use Twisted in a loop? I could generate only a single request with it. Can you please provide a code snippet? (a sketch follows these comments) Commented Apr 23, 2014 at 11:27
  • Problem with multi threaded Python app and socket connections has code examples that show how to make multiple connections using various frameworks, e.g., asyncio, twisted, gevent. Commented Apr 25, 2014 at 4:29
  • You can use Pool workers for multiprocessing, check the "16.6.1.5. Using a pool of workers" title in docs.python.org/2/library/multiprocessing.html Commented Nov 29, 2019 at 8:40
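
Following up on the Twisted suggestion above, here is a minimal sketch (my illustration, assuming url and data from the question) that issues POSTs in a loop on an epoll-capable reactor:

from StringIO import StringIO

from twisted.internet import reactor, task
from twisted.web.client import Agent, FileBodyProducer
from twisted.web.http_headers import Headers

agent = Agent(reactor)

def send():
    # url and data as in the question; FileBodyProducer streams the request body
    body = FileBodyProducer(StringIO(data))
    d = agent.request('POST', url,
                      Headers({'Content-Type': ['application/x-www-form-urlencoded']}),
                      body)
    d.addErrback(lambda failure: None)  # swallow errors here; log them in real code
    # no return value: returning d would make LoopingCall wait for each response

task.LoopingCall(send).start(0.001)  # fire roughly once per millisecond
reactor.callLater(60, reactor.stop)  # stop the load test after 60 seconds
reactor.run()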

1 Answer


The simplest way to limit the number of concurrent connections is to use a thread pool:

#!/usr/bin/env python
from itertools import izip, repeat
from multiprocessing.dummy import Pool  # use threads for I/O bound tasks
from urllib2 import urlopen

def fetch(url_data):
    try:
        # urlopen(url, data) issues a POST when data is not None
        return url_data[0], urlopen(*url_data).read(), None
    except EnvironmentError as e:
        return url_data[0], None, str(e)

if __name__ == "__main__":
    pool = Pool(20)  # use 20 concurrent connections
    params = izip(urls, repeat(data))  # use the same data for all urls
    for url, content, error in pool.imap_unordered(fetch, params):
        if error is None:
            print("done: %s: %d" % (url, len(content)))
        else:
            print("error: %s: %s" % (url, error))

503 Service Unavailable is a server-side error: the server might be failing to handle the load.

Name or service not known is a DNS error. If you need to make many requests, install/enable a local caching DNS server.
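
As a lighter-weight alternative (my suggestion, not part of the original answer), resolve the hostname once up front and reuse the numeric address, keeping the original Host header so name-based virtual hosting still works:

import socket
import urllib2

host = 'target-host'                                # hypothetical hostname
ip = socket.gethostbyname(host)                     # resolve once, before the loop
req = urllib2.Request('http://%s/path' % ip, data)  # data makes this a POST
req.add_header('Host', host)                        # preserve the original Host header
response = urllib2.urlopen(req)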
