I have the following code which I am trying to parallelize over multiple GPUs in PyTorch:

import numpy as np
import torch
from torch.multiprocessing import Pool

X = np.array([[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]])
X = torch.DoubleTensor(X).cuda()

def X_power_func(j):
    X_power = X**j
    return X_power

if __name__ == '__main__':
  with Pool(processes = 2) as p:   # Parallelizing over 2 GPUs
    results = p.map(X_power_func, range(4))

results

But when I run the code, I get this error:

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "<ipython-input-35-6529ab6dac60>", line 11, in X_power_func
    X_power = X**j
RuntimeError: CUDA error: initialization error
"""

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-35-6529ab6dac60> in <module>()
     14 if __name__ == '__main__':
     15   with Pool(processes = 1) as p:
---> 16     results = p.map(X_power_func, range(8))
     17 
     18 results

1 frames
/usr/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

RuntimeError: CUDA error: initialization error

Where have I gone wrong? Any help would really be appreciated.

1 Answer

I think the usual approach is to call model.share_memory() once before multiprocessing, assuming you have a model that subclasses nn.Module. For tensors, the equivalent is X.share_memory_(). Unfortunately, I could not get that to work with your code: it hangs (without errors) if X.share_memory_() is called before pool.map. I'm not sure whether that is because X is a global variable rather than an argument passed through map.

What does work is this:

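# Keep X on the CPU in the parent process (no .cuda() here).
X = torch.DoubleTensor(X)

def X_power_func(j):
    # Move X to the GPU inside the worker, so CUDA is first used after the fork.
    X_power = X.cuda()**j
    return X_power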

By the way, https://github.com/pytorch/pytorch/issues/15734 mentions that "CUDA API must not be initialized before you fork", which is likely the issue you were seeing.
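One way to check for this condition is to ask PyTorch whether the current process has already initialized CUDA before the pool is created. The sketch below is only illustrative: assert_cuda_untouched is a hypothetical helper name, not part of PyTorch, and it relies on torch.cuda.is_initialized().

```python
import torch


def assert_cuda_untouched(where="before creating the pool"):
    """Raise if this process has already initialized CUDA (hypothetical helper)."""
    if torch.cuda.is_initialized():
        raise RuntimeError(
            f"CUDA was already initialized {where}; "
            "forked workers will not be able to use it."
        )


# Build the data on the CPU so the parent process never touches the GPU.
X = torch.tensor([[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]], dtype=torch.float64)
assert_cuda_untouched()
```

Calling X.cuda() at module level, as in the question, would make this check raise, which is the situation the issue describes.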

Also, https://github.com/pytorch/pytorch/issues/17680 notes that when using spawn in Jupyter notebooks, "the spawn method will run everything in your notebook top-level" (likely the issue I was seeing when my code hung in a notebook). In short, I couldn't get either fork or spawn to work except with the sequence above, which doesn't use CUDA until it is inside the forked process.
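For completeness, here is a minimal sketch of how the spawn route might look when run as a standalone script rather than in a notebook cell. I have not verified it on Colab or Kaggle. The names x_power_on_device and main are my own, the round-robin device choice (j modulo the GPU count) is an assumption about how you want to spread the work, and the worker falls back to the CPU when no GPU is visible. Results are moved back to the CPU before being returned, which avoids sending CUDA tensors between processes.

```python
import torch
import torch.multiprocessing as mp


def x_power_on_device(args):
    """Raise a CPU tensor to the power j on a device chosen inside the worker."""
    j, x_cpu = args
    if torch.cuda.is_available():
        # Assumed policy: round-robin over the visible GPUs.
        device = torch.device(f"cuda:{j % torch.cuda.device_count()}")
    else:
        device = torch.device("cpu")
    # CUDA is first used here, in the worker; the result goes back on the CPU.
    return (x_cpu.to(device) ** j).cpu()


def main():
    # The parent only ever holds a CPU tensor.
    x = torch.tensor([[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]], dtype=torch.float64)
    ctx = mp.get_context("spawn")  # fresh interpreters instead of fork
    with ctx.Pool(processes=2) as pool:
        results = pool.map(x_power_on_device, [(j, x) for j in range(4)])
    for j, r in enumerate(results):
        print(j, r.tolist())


if __name__ == "__main__":
    main()
```

The tensor is passed to the worker as an argument instead of being read from a global, so each spawned process receives it explicitly rather than relying on module-level state.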

3 Comments

Many thanks @Alex I. I tried calling X.share_memory_() after if __name__ == '__main__': but I keep getting the error AttributeError: 'Tensor' object has no attribute 'share_memory'. I can confirm that the sequence you suggested above worked. The funny thing is that it only works the first time I run the code. If I run it again, I get the error RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.
I am using Google Colab and a Kaggle Kernel, and in both, when I include set_start_method('spawn', force=True) after if __name__ == '__main__':, the code just hangs or keeps running forever without any errors.
If you have any further insight into these errors, please let me know. Many thanks once again.
