
Usually I use the following code, and it works fine when it does not matter in which order process_func handles a given parameter:

from multiprocessing import Pool

params = [1, 2, 3, 4, 5, ...]

def process_func(param):
    ...

pool = Pool(40)
pool.map(process_func, params)
pool.close()
pool.join()

In the example above we have processes of a single type, with a maximum of 40 running simultaneously. But imagine we have processes (parameters) of different types which should be executed simultaneously. For example, in my Selenium grid I have 40 Firefox and 40 Chrome browsers, and I have 5000 test cases: some of them prefer Chrome, some prefer Firefox, and for some it does not matter.

For example, let's say we have the following types:

  • type Firefox: maximum simultaneous number: 40
  • type Chrome: maximum simultaneous number: 40

In this case our pool will have a maximum of 80 simultaneous processes, but there is a strict rule: 40 of them must be Firefox and 40 must be Chrome.

It means that params won't be taken one after another. The pool must select values from the params list in a way that keeps the maximum number of each process type running.

How it is possible to achieve that?

    Is there a reason not to simply use two pools and two lists of inputs? Commented Oct 23, 2014 at 18:22

1 Answer


I would modify your process_func() to take one more parameter that tells it which "type" to be, and use two separate pools. Adding functools.partial will allow us to still use the pool's map interface:

from functools import partial
from multiprocessing import Pool

params = [1, 2, 3, 4, 5, ...]

def process_func(browser, param):
    # named "browser" rather than "type", which shadows a builtin
    if browser == 'Firefox':
        ...  # do Firefox stuff
    else:
        ...  # do Chrome stuff

chrome_pool = Pool(40)
fox_pool = Pool(40)

chrome_func = partial(process_func, 'Chrome')
fox_func = partial(process_func, 'Firefox')

# map_async() lets both pools work at the same time;
# a plain map() would block until the first pool had finished
chrome_result = chrome_pool.map_async(chrome_func, params)
fox_result = fox_pool.map_async(fox_func, params)

chrome_pool.close()
fox_pool.close()
chrome_pool.join()
fox_pool.join()

The functools.partial() function binds an argument to a specific value by returning a new callable that always supplies that argument. This approach allows you to limit each "type" (for lack of a better term) to 40 worker processes.
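To illustrate the binding itself, here is a minimal sketch; the toy string body of process_func is just a placeholder, not the actual test logic:

```python
from functools import partial

def process_func(browser, param):
    # toy body just to show the binding; real code would run a test
    return f"{browser} runs test {param}"

chrome_func = partial(process_func, 'Chrome')  # 'Chrome' is now always the first argument
print(chrome_func(7))  # → Chrome runs test 7
```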


8 Comments

Hm, wow, and those two pools will work simultaneously in one master process?
Hm... and how can I control tests that do not have a preferred type (browser)? In fact, they should use the less loaded browser type.
I'm not sure what you mean by "one master process", but this is almost the same as calling Pool(80). You're just enforcing a limit on the number of workers available to each function that is using a specific browser.
When you use map, there isn't much variance in the "loading" of a pool. Your indicated number of processes are spun off and given a queue. All of the jobs are submitted to that queue all at once. As each process finishes one job, it grabs the next from the queue. There isn't really any down time until the number of items left in the queue is less than the number of workers. To have the control you're looking for, you would have to use something like apply_async() and implement your own manual control over when jobs are submitted, and find a way to adjust it dynamically.
Also, even though you can launch 20 pools at a time, it doesn't mean you should. The "need" for that many pools suggest (to me) that the overall structure of your script may need some work. Once you have your code working, consider heading over to Code Review to get some other people's input on it.
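A rough sketch of that manual approach with apply_async(), assuming the "pick the less loaded pool" rule from the comments; run_test, dispatch, and the submitted-job counter are illustrative assumptions, not code from the answer:

```python
from multiprocessing import Pool

def run_test(browser, param):
    # placeholder for real test logic
    return (browser, param)

def dispatch(params, size=2):
    pools = {'Firefox': Pool(size), 'Chrome': Pool(size)}
    submitted = {'Firefox': 0, 'Chrome': 0}  # crude load measure: jobs submitted so far
    async_results = []
    for param in params:
        # send the "don't care" job to the browser with fewer jobs so far
        browser = min(submitted, key=submitted.get)
        submitted[browser] += 1
        async_results.append(pools[browser].apply_async(run_test, (browser, param)))
    for pool in pools.values():
        pool.close()
    results = [r.get() for r in async_results]
    for pool in pools.values():
        pool.join()
    return results
```

A real version would decrement the counter as jobs finish (e.g. via apply_async callbacks) instead of counting submissions, and would still route the browser-specific tests to their fixed pools.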
