
I'm trying to use the multiprocessing package in Python with a Pool.

I have a function f that is called by map_async:

from multiprocessing import Pool

def f(host, x):
    print host
    print x

hosts = ['1.1.1.1', '2.2.2.2']
pool = Pool(processes=5)
pool.map_async(f,hosts,"test")
pool.close()
pool.join()

This code raises the following error:

Traceback (most recent call last):
  File "pool-test.py", line 9, in <module>
    pool.map_async(f,hosts,"test")
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 290, in map_async
    result = MapResult(self._cache, chunksize, len(iterable), callback)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 557, in __init__
    self._number_left = length//chunksize + bool(length % chunksize)
TypeError: unsupported operand type(s) for //: 'int' and 'str'

I don't know how to pass more than one argument to f. Is there any way?

  • You can just use pool.map and drop the "test" dummy variable altogether. (commented Jul 13, 2017 at 14:57)

3 Answers

Answer (13 votes)

"test" is interpreted as map_async's chunksize argument, because chunksize is the third positional parameter after func and iterable (see the docs).

Your code should probably be (here copy-pasted from my IPython session):

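from multiprocessing import Pool

def f(arg):
    # map passes a single item, so unpack the (host, x) tuple by hand
    host, x = arg
    print(host)
    print(x)

if __name__ == '__main__':
    hosts = ['1.1.1.1', '2.2.2.2']
    args = ((host, "test") for host in hosts)
    pool = Pool(processes=5)
    result = pool.map_async(f, args)
    pool.close()
    pool.join()
    result.get()  # re-raises any exception raised in a worker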

1.1.1.1
test
2.2.2.2
test

Note: In Python 3.3+ you can use Pool.starmap, which unpacks each tuple into separate arguments, so you can avoid doing host, x = arg explicitly.
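A minimal sketch of the starmap version, assuming Python 3.3+ (where Pool.starmap and using Pool in a with statement are available). It pairs every host with the same x through itertools.repeat, so no list of identical x values has to be built; run is a hypothetical helper name used only for illustration.

```python
from itertools import repeat
from multiprocessing import Pool

def f(host, x):
    # starmap calls f(host, x) for every (host, x) pair
    return "%s %s" % (host, x)

def run(hosts, x):
    # zip + repeat pairs every host with the same x without
    # building a separate list of identical x values
    with Pool(processes=2) as pool:
        return pool.starmap(f, zip(hosts, repeat(x)))

if __name__ == "__main__":
    print(run(['1.1.1.1', '2.2.2.2'], "test"))  # ['1.1.1.1 test', '2.2.2.2 test']
```

starmap returns results in input order, just like map.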


Comments

I tested it but the result is not right: it prints both hosts but only the "t" and the "e" of the word "test".
Weird. It definitely does not do that on my computer. See the update for my results; I copy-pasted and checked them again.
With x=["test","test"] it works, but that makes no sense: imagine the hosts list has about 10000 entries and I only want one x to compare the results against. It's not viable to have an x list of 10000 identical entries. Anyway, thanks.
That was just a quick'n'dirty way of doing it in your simple example. Maybe see the updated version?
In the end I used a global variable. It really is a static value.
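One way to get a per-process "global" without passing x through map is Pool's initializer/initargs parameters, which run a setup function once in each worker process. This is a sketch assuming Python 3.3+; init_worker, _x and run are hypothetical names, and it is not necessarily what the commenter did.

```python
from multiprocessing import Pool

# hypothetical module-level name; each worker process holds its own copy
_x = None

def init_worker(x):
    # runs once in every worker process when the pool starts
    global _x
    _x = x

def f(host):
    # reads the per-worker global instead of taking x as an argument
    return (host, _x)

def run(hosts, x):
    with Pool(processes=2, initializer=init_worker, initargs=(x,)) as pool:
        return pool.map(f, hosts)

if __name__ == "__main__":
    print(run(['1.1.1.1', '2.2.2.2'], "test"))
    # [('1.1.1.1', 'test'), ('2.2.2.2', 'test')]
```

Unlike a global assigned in the parent after the pool is created, this works the same whether workers are forked or spawned.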
Answer (6 votes)

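In Python 3.3+, Pool supports the context-manager protocol, so a with statement can be used. Leaving the with block calls terminate() on the pool, even if an exception was raised, so there is no need to call close() and join() yourself, as long as you have collected the results before the block ends. In this case the function always receives the same constant for x, so that argument can be bound in advance with partial function application (functools.partial). map_async returns an AsyncResult immediately instead of waiting, so you would have to call .get() on it inside the with block before the pool is terminated; you might as well just use map, which blocks until all results are in. Thus: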

from multiprocessing import Pool
from functools import partial

def f(host, x):
    print(host)
    print(x)

if __name__ == '__main__':
    hosts = ('1.1.1.1', '2.2.2.2')
    with Pool(processes=5) as pool:
        # partial fixes x='test'; map supplies each host positionally
        pool.map(partial(f, x='test'), hosts)

results in:

1.1.1.1
test
2.2.2.2
test
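If you do want map_async (for example, to do other work while the pool runs), a sketch, assuming Python 3.3+, is to call get() on the returned AsyncResult before the with block exits, since leaving the block terminates the pool. The run helper and the 60-second timeout are arbitrary choices for illustration.

```python
from functools import partial
from multiprocessing import Pool

def f(host, x):
    return "%s %s" % (host, x)

def run(hosts):
    with Pool(processes=2) as pool:
        async_result = pool.map_async(partial(f, x="test"), hosts)
        # block here until every worker has finished; leaving the
        # with block afterwards calls terminate() on the pool
        return async_result.get(timeout=60)

if __name__ == "__main__":
    print(run(('1.1.1.1', '2.2.2.2')))  # ['1.1.1.1 test', '2.2.2.2 test']
```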

Answer (1 vote)

As I recall, Pool.map() and Pool.map_async() call the function with exactly one argument per item of the iterable. This limitation can be worked around by passing each item as a list, but then you need a custom function designed to take a list-like object as its argument.

One approach is to write that custom code once, as a general "function + args" wrapper. I worked up something like this (note: it is only partially tested):

import multiprocessing as mpp
import operator


def tmp_test():
    # a short test script: apply operator.eq to each pair in A
    A = [[1, 2], [2, 3], [4, 5], [6, 7]]
    P = mpp.Pool(mpp.cpu_count())
    X = P.map_async(map_helper, [[operator.eq] + a for a in A])
    P.close()
    P.join()
    return X.get()


def null_funct(*args, **kwargs):
    # a place-holder that accepts any arguments and does nothing
    pass


def map_helper(args_in=(null_funct, (), {})):
    # helper function for pool.map_async(). pass data as a list(-like object):
    # [function, [args], {kwargs}] (though we'll allow for some mistakes).
    funct = args_in[0]
    rest = list(args_in[1:])
    #
    # allow for different formatting options:
    if rest and not isinstance(rest[0], (list, tuple, dict)):
        # probably passed a flat list of parameters. just use them:
        return funct(*rest)
    #
    # if the args are "properly" formatted:
    args = []
    kwargs = {}
    for arg in rest:
        # assign list types to args, dict types to kwargs...
        if isinstance(arg, (list, tuple)):
            args += arg
        elif isinstance(arg, dict):
            kwargs.update(arg)
    return funct(*args, **kwargs)


if __name__ == "__main__":
    print(tmp_test())  # [False, False, False, False]
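A stricter variant of the same idea, as a sketch: if every item is always exactly a (function, args, kwargs) triple, the wrapper shrinks to a single unpacking step. call_packed and run are hypothetical names; the functions inside the items must be picklable (module-level functions or builtins such as pow).

```python
from multiprocessing import Pool

def call_packed(packed):
    # hypothetical helper: each item is exactly (function, args, kwargs)
    func, args, kwargs = packed
    return func(*args, **kwargs)

def run():
    jobs = [
        (pow, (2, 10), {}),
        (divmod, (17, 5), {}),
        (int, ("ff",), {"base": 16}),
    ]
    with Pool(processes=2) as pool:
        return pool.map(call_packed, jobs)

if __name__ == "__main__":
    print(run())  # [1024, (3, 2), 255]
```

Dropping the "allow for some mistakes" branch makes malformed items fail loudly with an unpacking error instead of being guessed at.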
