
I have two browser types on my Selenium grid: Firefox (40 instances) and Chrome (40 instances). I also have a bunch of tests: some need to be executed on Firefox, some on Chrome, and some don't care.

The first solution, advised by @skrrgwasme, was to split the tests that don't need a specific browser between two groups, ending up with two queues (one executed on Firefox, the other on Chrome): How to implement custom control over python multiprocessing.Pool?

This solution is nice, but it can still be improved: since the browsers handle their requests at different speeds, Chrome will finish its queue much faster, while the Firefox queue will keep working much longer. This could be solved with continuous custom control, where I decide which browser to use not before creating the pools, but while the pools are already running.

So, we should have one pool, where we control the dispatch of every task ourselves.

@skrrgwasme advised using apply_async for this. But I just can't understand how to achieve this in Python, since it isn't async in the way Node.js is.

Could you please share some examples? I have very little experience with Python, and I seem to be totally stuck on this :(

  • Do you care about the return value of each worker in the pool? Commented Oct 23, 2014 at 19:44
  • @dano in fact no, i do not need to return anything. I just need to execute some code, when all tasks are finished. Commented Oct 23, 2014 at 19:46
  • What platform are you targeting? Commented Oct 23, 2014 at 19:53
  • linux, only linux, debian x64 Commented Oct 23, 2014 at 19:54

2 Answers


I think the easiest approach here is to have each worker process consume from two Queue objects. One is a browser-specific Queue, and the other is a shared "generic" Queue. That way, you can have 40 processes consume from a Chrome Queue and then switch to the generic Queue once it's drained, and likewise have 40 processes consume from a Firefox Queue and then switch to the generic Queue once it's drained. Here's an example using 8 processes instead of 80:

from multiprocessing import Pool, Manager
from queue import Empty  # Python 3; on Python 2 this was "from Queue import Empty"
import time

ff_tests = [1, 2, 3, 4, 5]
chrome_tests = [10, 11, 12, 13, 14, 15]
general_tests = [20, 21, 22, 23, 24, 25]

def process_func(spec_queue, general_queue, browser):
    # Drain the browser-specific queue first...
    while True:
        try:
            test = spec_queue.get_nowait()
            print("Processing {} in {} process".format(test, browser))
            time.sleep(2)
        except Empty:
            break

    # ...then help drain the shared generic queue.
    while True:
        try:
            test = general_queue.get_nowait()
            print("Processing {} in {} process".format(test, browser))
            time.sleep(2)
        except Empty:
            break


if __name__ == "__main__":
    m = Manager()
    ff_queue = m.Queue()
    chrome_queue = m.Queue()
    general_queue = m.Queue()

    for queue, tests in [(ff_queue, ff_tests), (chrome_queue, chrome_tests),
                         (general_queue, general_tests)]:
        for test in tests:
            queue.put(test)


    pool = Pool(8)
    for _ in range(4):
        pool.apply_async(process_func, args=(ff_queue, general_queue, "firefox"))
        pool.apply_async(process_func, args=(chrome_queue, general_queue, "chrome"))
    pool.close()
    pool.join()

Output:

Processing 1 in firefox process
Processing 10 in chrome process
Processing 2 in firefox process
Processing 11 in chrome process
Processing 3 in firefox process
Processing 12 in chrome process
Processing 4 in firefox process
Processing 13 in chrome process
Processing 5 in firefox process
Processing 14 in chrome process
Processing 20 in firefox process
Processing 15 in chrome process
Processing 21 in firefox process
Processing 22 in chrome process
Processing 23 in firefox process
Processing 24 in chrome process
Processing 25 in chrome process

As you can see, the browser-specific queues get drained in their browser-specific processes, and then both types of processes work together to drain the generic queue.
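If, as mentioned in the question comments, you just need to run some code once every task has finished, you can keep the AsyncResult handles that apply_async returns. A minimal sketch of that idea (drain here is a simplified, hypothetical stand-in for process_func above; it returns what it consumed instead of printing and sleeping):

```python
from multiprocessing import Pool, Manager
from queue import Empty

def drain(spec_queue, general_queue, browser):
    # Simplified stand-in for process_func: empty the browser-specific
    # queue first, then help with the generic queue, recording each item.
    done = []
    for q in (spec_queue, general_queue):
        while True:
            try:
                done.append((browser, q.get_nowait()))
            except Empty:
                break
    return done

if __name__ == "__main__":
    m = Manager()
    ff_q, ch_q, gen_q = m.Queue(), m.Queue(), m.Queue()
    for q, tests in [(ff_q, [1, 2]), (ch_q, [10, 11]), (gen_q, [20, 21])]:
        for t in tests:
            q.put(t)

    pool = Pool(4)
    # Keep every AsyncResult: apply_async swallows worker exceptions
    # unless you call .get() on the result later.
    results = [pool.apply_async(drain, (ff_q, gen_q, "firefox")),
               pool.apply_async(drain, (ch_q, gen_q, "chrome"))]
    pool.close()
    pool.join()

    # Everything is finished here; .get() re-raises any worker exception.
    processed = [item for r in results for item in r.get()]
    print("all done, processed", len(processed), "tests")
```

After pool.join() returns, the "run some code when all tasks are finished" step is simply whatever comes next in the main process.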


4 Comments

Seems it is the best way to do this. Just need some time to realize how it works under the hood... So, we start 2 queues 4 times (so we have 8 queues). first of all those queues solve browser-specific requests, and then switch to the generic. Fantastic! Thank you!
@avasin For clarity: there are only three total queues here. We start four processes for FireFox and four processes for Chrome. Each of the FF processes has a handle on the FF queue; each Chrome process has a handle on the Chrome queue, and all eight processes have handles on the General queue for when their primary queue runs out.
@avasin Henry describes it well in his comment. 3 queues total - the general queue is passed to all 8 processes, and the browser-specific queues are passed to 4 processes each. The main process loads up the 3 queues with the appropriate tasks, and the 8 worker processes drain them.
All of you are sic. (Thanks for the discussion.) I learned a lot.

Using pool.apply_async can be thought of very much like manually setting up each of the calls made under the hood by the map in your previous question. You simply add all the tasks to the pool, and it blasts through them as new worker processes become available. You can use the same one-function-per-browser approach that skrrgwasme suggested. The code below borrows heavily from his answer to the previous question:

from multiprocessing import Pool

params = [1, 2, 3, 4, 5, ...]

def ff_func(param):
    pass  # Do FireFox stuff

def ch_func(param):
    pass  # Do Chrome stuff

if __name__ == "__main__":
    pool = Pool(80)

    # For each parameter, add two tasks to the pool--one FF, one Chrome.
    for param in params:
        pool.apply_async(ff_func, (param,))  # args must be a tuple
        pool.apply_async(ch_func, (param,))

    pool.close()
    pool.join()

What happens here is that you build a big asynchronous queue of tasks for the pool to handle. The pool then handles all the defined tasks in whatever order it sees fit.

Note that unlike the previous answer, this doesn't actually guarantee a maximum pool size of 40 for each browser, because you've requested that we make better use of our resources. The best way to use our maximum of 80 processes is to have them all working all the time, to the extent possible.

If you can't use more than forty processes at a time for either of the two "types", then you can't really improve on the two-pools approach from before. Your bottleneck in that case is simply the speed at which forty processes can complete one or the other queue. The free processes from the faster queue can't be utilized if you're not allowed to utilize them ;-)
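As an aside, apply_async also accepts a callback argument, which gives you a hook in the main process each time a task finishes; that hook is a natural place to decide what work to hand out next. A minimal sketch of the mechanism (run_test is a hypothetical stand-in for driving one browser test):

```python
from multiprocessing import Pool

def run_test(test_id):
    # Hypothetical stand-in for running one browser test.
    return test_id * 2

if __name__ == "__main__":
    finished = []

    def on_done(result):
        # Invoked in the main process as each task completes; you could
        # dispatch the next queued test from here instead of just recording.
        finished.append(result)

    pool = Pool(4)
    for t in [1, 2, 3]:
        pool.apply_async(run_test, (t,), callback=on_done)
    pool.close()
    pool.join()  # all callbacks have run once join() returns
    print(sorted(finished))  # -> [2, 4, 6]
```

Note that the callback runs in the pool's result-handler thread, so it should be quick and must not block.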

7 Comments

But couldn't it happen that I end up with 20 Firefox and 60 Chrome instances? I suppose Selenium will die if that happens :( I need to control the maximum of each browser type.
@avasin Yes, you could--but isn't that what you're asking for? If you're concerned about Chrome finishing its queue earlier than FireFox, then aren't you asking how to make use of the newly-available worker processes? If you need a hard limit on the simultaneous number of each kind of process, use the approach from the other answer with two pools.
The previous answer has a drawback: some browsers can run very fast and sit idle after everything is done, while others work slowly through a big queue. The idea is not to get rid of the limits, but to keep all browsers busy within those limits. This would be possible if I could catch an event every time a spawned process finishes, so I could start a new one manually.
In that case, I would be able to see that some browsers are idle and give them some work.
@avasin I don't understand what "additional work" you're trying to give them. If one queue finishes, what exactly do you expect those extra browsers to do, if you're not going to recycle the processes to help the other queue finish faster?
