PyTorch: How to parallelize over multiple GPUs using multiprocessing.pool

Question

I have the following code which I am trying to parallelize over multiple GPUs in PyTorch:

import numpy as np
import torch
from torch.multiprocessing import Pool

X = np.array([[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]])
X = torch.DoubleTensor(X).cuda()

def X_power_func(j):
    X_power = X**j
    return X_power

if __name__ == '__main__':
  with Pool(processes = 2) as p:   # Parallelizing over 2 GPUs
    results = p.map(X_power_func, range(4))

results

But when I run the code, I get this error:

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "<ipython-input-35-6529ab6dac60>", line 11, in X_power_func
    X_power = X**j
RuntimeError: CUDA error: initialization error
"""

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-35-6529ab6dac60> in <module>()
     14 if __name__ == '__main__':
     15   with Pool(processes = 1) as p:
---> 16     results = p.map(X_power_func, range(8))
     17 
     18 results

1 frames
/usr/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

RuntimeError: CUDA error: initialization error

Where have I gone wrong? Any help would really be appreciated.

Alex I · Accepted Answer · 2020-07-25 03:19:53Z

4

I think the usual approach is to call model.share_memory() once before multiprocessing, assuming you have a model that subclasses nn.Module. For tensors, the equivalent is X.share_memory_(). Unfortunately, I had trouble getting that to work with your code: it hangs (without errors) if X.share_memory_() is called before pool.map. I'm not sure whether that is because X is a global variable rather than an argument passed through map.

What does work is this:

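# Keep X on the CPU in the parent process (no .cuda() call here)
X = torch.DoubleTensor(X)

def X_power_func(j):
    # CUDA is first used here, inside the worker process
    X_power = X.cuda()**j
    return X_power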

By the way, https://github.com/pytorch/pytorch/issues/15734 mentions that "CUDA API must not be initialized before you fork" (this is likely the issue you were seeing).
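As a rough way to check for this condition, the sketch below asks PyTorch whether the current process has already initialized CUDA. torch.cuda.is_initialized() is a real PyTorch call; the helper name safe_to_fork is made up for illustration, and whether this check is sufficient in every setup is an assumption.

```python
import torch


def safe_to_fork() -> bool:
    """Return True if this process has not initialized CUDA yet.

    Hypothetical helper: a fork-based Pool created while this returns
    False is likely to hit the initialization error described above.
    """
    return not torch.cuda.is_initialized()


# Call this in the parent process right before creating a fork-based Pool.
print("safe to fork:", safe_to_fork())
```

In the question's code, X is moved to the GPU at import time, so a check like this would report that CUDA is already initialized before the pool is created.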

Also, https://github.com/pytorch/pytorch/issues/17680 notes that when using spawn in Jupyter notebooks, "the spawn method will run everything in your notebook top-level" (likely the issue I was seeing when my code hung in a notebook). In short, I couldn't get either fork or spawn to work, except with the sequence above, which doesn't touch CUDA until it is inside the forked process.
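For completeness, here is a minimal sketch of the same idea as a standalone script that uses the spawn start method. It is not something I got working in a notebook. The assumptions: the file is saved and run as a regular .py script, the names power_on_device and main are invented for this example, workers pick a GPU by j modulo the device count (an arbitrary choice), and the code falls back to the CPU when no GPU is visible.

```python
import torch
import torch.multiprocessing as mp


def power_on_device(args):
    """Raise a CPU tensor to the power j on a device chosen inside the worker."""
    x_cpu, j = args
    if torch.cuda.is_available():
        # CUDA is first touched here, inside the worker process.
        device = torch.device(f"cuda:{j % torch.cuda.device_count()}")
    else:
        # Fallback so the sketch also runs on a machine without a GPU.
        device = torch.device("cpu")
    result = x_cpu.to(device) ** j
    # Return a NumPy array so no CUDA tensor crosses the process boundary.
    return result.cpu().numpy()


def main():
    # The parent process only ever holds a CPU tensor.
    x = torch.tensor(
        [[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]], dtype=torch.float64
    )
    ctx = mp.get_context("spawn")  # start workers with spawn instead of fork
    with ctx.Pool(processes=2) as pool:
        results = pool.map(power_on_device, [(x, j) for j in range(4)])
    for j, r in enumerate(results):
        print(j, r.tolist())


if __name__ == "__main__":
    main()
```

Passing X as an argument instead of reading a global keeps the worker independent of whatever the parent ran at module level, and the __main__ guard matters with spawn because each worker re-imports the script.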

edited Jul 25, 2020 at 3:19

answered Jul 25, 2020 at 3:01

3 Comments

Leockl Over a year ago

Many thanks @Alex I. I tried calling X.share_memory_() after if __name__ == '__main__': but I keep getting the error AttributeError: 'Tensor' object has no attribute 'share_memory'. I can confirm that the sequence you suggested above works. The funny thing is that it only works the first time I run the code. If I run it again, I get the error:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Leockl Over a year ago

I am using Google Colab and Kaggle Kernels, and I can confirm that on both, when I include set_start_method('spawn', force=True) after if __name__ == '__main__':, the code just hangs or keeps running forever without any errors.

Leockl Over a year ago

If you have any further insight on all these errors, please let me know. Many thanks once again.
