
My requirement is to generate hundreds of HTTP POST requests per second. I am doing it using urllib2.

import datetime
import time
import urllib2
from multiprocessing import Process

def send():
    req = urllib2.Request(url)          # url: target endpoint, defined elsewhere
    req.add_data(data)                  # attaching data makes urlopen issue a POST
    response = urllib2.urlopen(req)

while datetime.datetime.now() <= ftime:  # ftime: end time of the run, defined elsewhere
    p = Process(target=send, args=[])
    p.start()
    time.sleep(0.001)

The problem is that, on some iterations, this code throws one of the following exceptions:

HTTP 503 Service Unavailable.
URLError: <urlopen error [Errno -2] Name or service not known>

I have tried using requests (HTTP for Humans) as well, but I am having proxy issues with that module: it seems to send HTTP packets to the proxy server even when the target machine is on the same LAN. I don't want the packets to go through a proxy.
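
For what it's worth, requests picks up proxies from the HTTP_PROXY/HTTPS_PROXY environment variables by default; a minimal sketch (assuming that is the cause, with url and data as above) that keeps LAN traffic off the proxy:

import requests

session = requests.Session()
session.trust_env = False  # ignore HTTP_PROXY/HTTPS_PROXY from the environment

def send():
    # url and data as in the question; the 5-second timeout is my addition
    response = session.post(url, data=data, timeout=5)
    return response.status_code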

  • If the goal is to generate a ton of requests I wouldn't start those in individual processes like that. Very inefficient. Use something like twisted and an epoll based reactor: twistedmatrix.com/documents/current/web/howto/client.html Commented Apr 23, 2014 at 6:32
  • @agrinh Thanks for the reply. Do you have any idea how to use Twisted in a loop? I could generate only a single request with it. Can you please provide a code snippet? (a sketch follows these comments) Commented Apr 23, 2014 at 11:27
  • Problem with multi threaded Python app and socket connections has code examples that show how to make multiple connections using various frameworks, e.g., asyncio, twisted, gevent. Commented Apr 25, 2014 at 4:29
  • You can use Pool workers for multiprocessing, check the "16.6.1.5. Using a pool of workers" title in docs.python.org/2/library/multiprocessing.html Commented Nov 29, 2019 at 8:40
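
Following up on the Twisted suggestion above, here is a minimal sketch (my illustration, assuming url and data from the question) that issues POSTs in a loop on an epoll-capable reactor:

from StringIO import StringIO

from twisted.internet import reactor, task
from twisted.web.client import Agent, FileBodyProducer
from twisted.web.http_headers import Headers

agent = Agent(reactor)

def send():
    # url and data as in the question; FileBodyProducer streams the request body
    body = FileBodyProducer(StringIO(data))
    d = agent.request('POST', url,
                      Headers({'Content-Type': ['application/x-www-form-urlencoded']}),
                      body)
    d.addErrback(lambda failure: None)  # swallow errors here; log them in real code
    # no return value: returning d would make LoopingCall wait for each response

task.LoopingCall(send).start(0.001)  # fire roughly once per millisecond
reactor.callLater(60, reactor.stop)  # stop the load test after 60 seconds
reactor.run()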

1 Answer


The simplest way to limit the number of concurrent connections is to use a thread pool:

#!/usr/bin/env python
from itertools import izip, repeat
from multiprocessing.dummy import Pool  # use threads for I/O bound tasks
from urllib2 import urlopen

def fetch(url_data):
    try:
        # urlopen(url, data) issues a POST when data is not None
        return url_data[0], urlopen(*url_data).read(), None
    except EnvironmentError as e:
        return url_data[0], None, str(e)

if __name__ == "__main__":
    pool = Pool(20)  # use 20 concurrent connections
    params = izip(urls, repeat(data))  # use the same data for all urls
    for url, content, error in pool.imap_unordered(fetch, params):
        if error is None:
            print("done: %s: %d" % (url, len(content)))
        else:
            print("error: %s: %s" % (url, error))

503 Service Unavailable is a server-side error: the server might be failing to handle the load.

Name or service not known is a DNS error. If you need to make many requests, install/enable a local caching DNS server.
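
As a lighter-weight alternative (my suggestion, not part of the original answer), resolve the hostname once up front and reuse the numeric address, keeping the original Host header so name-based virtual hosting still works:

import socket
import urllib2

host = 'target-host'                                # hypothetical hostname
ip = socket.gethostbyname(host)                     # resolve once, before the loop
req = urllib2.Request('http://%s/path' % ip, data)  # data makes this a POST
req.add_header('Host', host)                        # preserve the original Host header
response = urllib2.urlopen(req)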
