In the code below, I get an error: "can't get attribute 'f' on module main". I know how to fix it: move the pool line and the result line to just above result2.

My question is why the code in its current form doesn't work. I am working with more complicated code in which I have to use parallel processing inside two separate for loops. Right now, each iteration of each loop runs pool = mp.Pool(3). I read online that this is bad because every iteration creates more Pool workers. How can I put pool = mp.Pool(3) outside the loops and then reuse the same Pool workers everywhere in my code that needs them?

For the record, I am running my code on a Mac.

import numpy as np
import multiprocessing as mp

x = np.array([1,2,3,4,5,6])

pool = mp.Pool(3)

def f(x):
    return x**2

result = pool.map(f,x)

def g(x):
    return x + 1

result2 = pool.map(g,x)
print('result=',result,'and result2=',result2)
  • Creating the Pool creates the necessary subprocesses by forking (on Mac OS) at that point. This means the forked children haven't yet executed the definition of "f" but instead wait for tasks from the main process. Commented Oct 1, 2019 at 23:02
  • @MichaelButscher I am truly confused. Are you saying there is no way to do what I want to do above, where I define pool once and can then use it anywhere later in my code? Right now, I am defining pool in a for loop, so in each iteration, pool = mp.Pool(3) is run... Commented Oct 2, 2019 at 1:23
  • Never define a pool in a loop: it keeps spawning pools and pool workers in an uncontrolled manner, exhausting your RAM, leaving nothing for other programs, and eventually crashing the computer. After you have created a pool (see def main in Michael Butscher's answer), you can use a while loop for daemon-like activities and reuse the pool's workers. Commented Feb 10, 2020 at 10:50

1 Answer

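When the "fork" start method is used to create subprocesses (the default on macOS before Python 3.8), the processes are forked (basically copied) at the moment the Pool is created. In your code, this means the forked children never executed the definition of f; they just wait for tasks from the main process. Tasks refer to functions by name, so a worker that is asked to run f looks it up in its own copy of the main module and fails.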
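
Since this behaviour depends on the start method, it can help to check which one your interpreter actually uses. A minimal sketch using only the standard multiprocessing API; the helper name describe_start_method is made up for illustration:

```python
import multiprocessing as mp


def describe_start_method():
    """Return the start method in use and the ones this platform supports."""
    return {
        "current": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    info = describe_start_method()
    print("current start method:", info["current"])
    print("available start methods:", info["available"])
```

With "fork", the workers are a copy of the parent as it was when the Pool was created. With "spawn", each worker starts a fresh interpreter and re-imports the main module, which is why the if __name__ == "__main__": guard matters there.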

First of all, you should not execute "active" code (anything other than defining functions, classes, and constants) directly at the top level of the script; move it into functions. Your code can then look like this:

import numpy as np
import multiprocessing as mp


def f(x):
    return x**2

def g(x):
    return x + 1

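def main():
    x = np.array([1,2,3,4,5,6])

    # f and g are already defined, so the workers can find them;
    # the with block terminates the pool when it exits
    with mp.Pool(3) as pool:
        result = pool.map(f,x)
        result2 = pool.map(g,x)
    print('result=',result,'and result2=',result2)

# Should be nearly the only "active" statement; the guard keeps
# worker processes from re-running main() under the "spawn" start method
if __name__ == '__main__':
    main()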

Or maybe better in your case, I guess:

import numpy as np
import multiprocessing as mp


def f(x):
    return x**2

def g(x):
    return x + 1

def proc_f():
    global x, pool
    return pool.map(f,x)

def proc_g():
    global x, pool
    return pool.map(g,x)

def main():
    global x, pool
    x = np.array([1,2,3,4,5,6])

    pool = mp.Pool(3)

    result = proc_f()
    result2 = proc_g()
    print('result=',result,'and result2=',result2)

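# Should be nearly the only "active" statement; the guard keeps
# worker processes from re-running main() under the "spawn" start method
if __name__ == '__main__':
    main()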
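
For the two-loop case from the question, one pool can be created once and handed to whatever code needs it. A rough sketch, assuming the per-iteration work can be expressed as pool.map calls; run_both_loops and n_rounds are hypothetical names, not from the original code, and a plain list stands in for the NumPy array to keep the example self-contained:

```python
import multiprocessing as mp


def f(x):
    return x ** 2


def g(x):
    return x + 1


def run_both_loops(pool, data, n_rounds=2):
    """Run two separate loops that share one pool-like object with .map()."""
    squares = []
    for _ in range(n_rounds):      # first loop
        squares.append(pool.map(f, data))

    shifted = []
    for _ in range(n_rounds):      # second loop, same workers
        shifted.append(pool.map(g, data))

    return squares, shifted


if __name__ == "__main__":
    # The pool is created exactly once, after f and g are defined.
    with mp.Pool(3) as pool:
        squares, shifted = run_both_loops(pool, [1, 2, 3, 4, 5, 6])
    print("squares =", squares[0], "and shifted =", shifted[0])
```

Passing the pool as an argument avoids the globals and makes it easy to see that no loop iteration creates workers of its own.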

1 Comment

Thank you. It's been a few days since you posted your answer, but I have been able to rework my code. From your examples, I understand now that any functions I want the pool workers to use must be defined before the line mp.Pool(). I have restructured my code so that I am only calling the Pool() function once.
