
I have some sample code that uses a global variable, and it's giving me errors. The global variable x is set in the test3 function before test2 is called, but test2 doesn't appear to get the definition of the global variable x.

from multiprocessing import Pool
import numpy as np

global x    

def test1(w, y):
    return w+y    

def test2(v):
    global x        # x is assigned value in test3 before test2 is called
    return test1(x, v)    

def test3():
    global x
    x = 2
    y = np.random.random(10)
    with Pool(processes=6) as p:
        z = p.map(test2, y)
    print(z)

if __name__ == '__main__':
    test3()

The error is:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "...\my_global_variable_testcode.py", line 23, in test2
    return test1(x, v)
NameError: name 'x' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "...\my_global_variable_testcode.py", line 35, in <module>
    test3()
  File "...\my_global_variable_testcode.py", line 31, in test3
    z = p.map(test2, y)
  File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
NameError: name 'x' is not defined

I have looked at a lot of questions and answers on SO, but still haven't been able to figure out how to fix this code. I would be grateful if someone could point out the issue with the code.

Can anyone show me how to rewrite the code above without changing its basic structure (i.e. retaining test1, test2 and test3 as three separate functions, since in my original code these functions are quite long and complex), so that I can achieve my goal of multiprocessing?

P.S. This sample code is just a simplified version of my actual code. I am posting it to figure out how to make global variables work, not to find a complicated way of computing 2+np.random.random(10).

* EDIT * - BOUNTY DESCRIPTION

This bounty is for someone who can help me re-write this code, preserving the basic structure of functions in the code:

(i) test3 does the multiprocessing call to test2, and test2 in turn calls test1

(ii) it uses global variables, the Manager class of the multiprocessing module, or anything else, so that test3 does not have to pass common variables to test2

(iii) test3 also assigns values to, or changes, the global variables / common data before calling the multiprocessing code

(iv) the code must work on Windows (as I am using Windows). I am not looking for a solution that works on Linux / OS X at this time.

To help with the bounty, let me give two different test cases.

* case 1 - non-multiprocessing version *

import numpy as np

x = 3

def test1(w, y):
    return w+y

def test2(v):
    global x
    print('x in test2 = ', x)
    return test1(x, v)

def test3():
    global x
    x = 2
    print('x in test3 = ', x)
    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    z = test2(y)
    print(z)

if __name__ == '__main__':
    test3()

The output (correct) is:

x in test3 =  2
x in test2 =  2
[ 3  4  5  6  7  8  9 10 11 12]

* case 2 - multi-processing version *

from multiprocessing import Pool
import numpy as np

x = 3

def test1(w, y):
    return w+y

def test2(v):
    global x
    print('x in test2 = ', x)
    return test1(x, v)

def test3():
    global x
    x = 2
    print('x in test3 = ', x)
    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    with Pool(processes=6) as p:
        z = p.map(test2, y)
    print(z)

if __name__ == '__main__':
    test3()

The output (incorrect) is:

x in test3 =  2
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
[4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
  • You're using Windows, that's why. Why not just pass x as an argument to the function using args? Commented Oct 11, 2017 at 1:30
  • @COLDSPEED, in my actual function x is very large, and y has around 30,000 elements instead of just 10 as in this dummy exercise. Since I am only computing 6 of the 30,000 y's at a time using multiprocessing, I am trying to avoid passing something like zip(y, x_replicated_30k_times) to the map function. Why is Windows an issue here? Commented Oct 11, 2017 at 1:34
  • Because Unix-like OSes rely on fork, whereas Windows works differently: it starts a fresh Python interpreter for each worker process. It's linked to the reason you need if __name__ == '__main__': to prevent infinite recursion. Commented Oct 11, 2017 at 1:36
  • I'm not sure what you mean. I have used if __name__ == '__main__' on both Ubuntu 14.04 and OS X Yosemite, and ran code using the multiprocessing module exactly the same way without making any changes. Commented Oct 11, 2017 at 1:38
  • global doesn’t do anything at file scope (why does the parser even allow it?), nor in your test2, which doesn’t assign to it. Commented Oct 13, 2017 at 20:10
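The last comment's point about global can be illustrated with a minimal sketch (the read_x and set_x helpers are hypothetical, not from the question): a global statement at module level binds nothing, so x does not exist until some function that declares it global actually assigns to it. Reading a module-level name inside a function needs no global statement at all.

```python
global x  # allowed at module level, but it does not create x

def read_x():
    # No global statement is needed just to read a module-level name
    return x

def set_x(value):
    global x  # required here, because this function assigns to x
    x = value

try:
    read_x()
except NameError as exc:
    # x has not been bound by anything yet
    print('before set_x:', exc)

set_x(2)
print('after set_x:', read_x())
```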

2 Answers

Your problem is that each worker in a multiprocessing Pool is a separate process with its own copy of the module's globals. A global x works within an individual process, but a value assigned to it in one process is not seen by the others. In that case you need to use Value from multiprocessing. Below is updated code that works with multiprocessing:

from multiprocessing import Pool, Value
import numpy as np

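xVal = Value('i', 0)  # created at import time: with the 'spawn' start method (the only one on Windows), each worker re-imports the module and gets its own Value of 0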

def test1(w, y):
    return w+y

def test2(v):
    x = xVal.value
    print('x in test2 = ', x)
    return test1(x, v)

def test3():
    xVal.value = 2

    print('x in test3 = ', xVal.value)
    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    with Pool(processes=6) as p:
        z = p.map(test2, y)
    print(z)

if __name__ == '__main__':
    test3()

The output of the program (tested on OS X) is:

x in test3 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
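The point that a global works within one process but not across processes can be checked with a small sketch (the counter, bump and demo names are illustrative only): each worker increments its own copy of a module-level global, and the parent's copy never changes.

```python
from multiprocessing import Pool

counter = 0  # each process ends up with its own copy of this global

def bump(_):
    # Runs inside a worker: increments only that worker's copy of counter
    global counter
    counter += 1
    return counter

def demo():
    with Pool(processes=2) as p:
        seen_in_workers = p.map(bump, range(4))
    # The workers never touched the parent's copy, so it is still 0 here
    return counter, seen_in_workers

if __name__ == '__main__':
    in_parent, in_workers = demo()
    print('counter in parent  =', in_parent)
    print('counter in workers =', in_workers)
```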

Edit-2

The program below should also work on Windows:

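from multiprocessing import Pool, Value
import numpy as np

xVal = None  # set under __main__ in the parent, and by sharedata() in each worker

def sharedata(sharedData):
    # Pool initializer: runs once in every worker process and stores the
    # shared Value (passed through initargs) in that worker's globals
    global xVal
    xVal = sharedData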

def test1(w, y):
    return w+y

def test2(v):
    global xVal
    x = xVal.value
    print('x in test2 = ', x)
    return test1(x, v)


def test3():
    xVal.value = 2
    print('x in test3 = ', xVal.value)
    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    with Pool(processes=6, initializer=sharedata, initargs=(xVal,)) as p:
        z = p.map(test2, y)
    print('x in test3 = ', xVal.value)
    print(z)

if __name__ == '__main__':
    xVal = Value('i', 0)
    test3()

3 Comments

Didn't work on my machine. Are you using Windows, or Linux / OSX? When I run your code (in Windows) I get x in test2 = 0.
Unfortunately my Windows VM has no space left, so I could only test this on OS X, hoping it works on Windows. I will see if I can extend the space in the Windows VM and test it there.
@uday, please check the latest answer

You have to define the variable x outside the functions: for instance, instead of global x, write x = 0 (or anything you like) at module level, and keep the global declarations inside the functions just as you're doing now. Hope that helps.

2 Comments

That's exactly the solution and it works. I will award you the bounty as soon as SO allows me. I am not sure why someone downvoted the question; I couldn't find the fix despite searching for it for many days (and wasting days on alternatives like COLDSPEED's suggestion to use Manager and similar solutions from other blogs). Many thanks for pointing out the fix.
Actually, it is not yet the fix. If I replace the global x outside the functions with x = 0 and run the code, test2 never gets the x = 2 assignment made in test3 and returns y assuming x = 0. Can you post complete code?
