There is a job I want to run on Slurm that needs parallelization because it is otherwise too slow. Even though the computations are independent of each other, they don't happen at the beginning of the job. An MWE of the job would be something like:
class Foo:
    def __init__(self, n):  # note the double trailing underscores; __init is never called
        self.n = n
        self.nList = list(range(n))  # Bonus question: a more elegant way of doing this?

    def cubicRoot(self, x):
        return x ** (1 / 3)

    def cubicRootParallel(self):
        from pathos.multiprocessing import ProcessingPool as Pool
        p = Pool(4)

        def _cubicRoot(x):
            return self.cubicRoot(x)

        self.cubicRootList = p.map(_cubicRoot, self.nList)
foo = Foo(10)
foo.cubicRootParallel()  # parentheses needed, otherwise the method is never called
print(foo.cubicRootList)  # should be the cubic roots [0.0, 1.0, 2**(1/3), ...]
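For comparison, here is a minimal sketch of the same MWE using only the stdlib multiprocessing module; moving the worker function to module level sidesteps the bound-method pickling problem that pathos solves with dill (names here are illustrative):

```python
from multiprocessing import Pool

def cubic_root(x):
    # module-level function, so the standard pickler can send it to the workers
    return x ** (1 / 3)

if __name__ == "__main__":
    nList = list(range(10))
    with Pool(4) as p:  # 4 worker processes on the same node
        cubicRootList = p.map(cubic_root, nList)
    print(cubicRootList)
```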
However, when I try to submit this job to Slurm with more than one node (since I am running Python), it raises
sbatch: Warning: can't run 1 processes on 4 nodes, setting nnodes to 1.
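The warning reflects that a multiprocessing-based Python script cannot span nodes, so the job is typically submitted to a single node with several CPUs. A sketch of such a submission script (the job name, CPU count, and script name are assumptions):

```shell
#!/bin/bash
#SBATCH --job-name=cubic-roots
#SBATCH --nodes=1            # multiprocessing cannot cross node boundaries
#SBATCH --ntasks=1           # one Python process...
#SBATCH --cpus-per-task=4    # ...with one CPU per worker in the Pool

python myscript.py
```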
All other examples of using Slurm in this way do the parallelization from the very beginning, or run the same calculation many times with different parameters. My code, by contrast, is serial apart from the cubic-root calculation: as far as I understand, that calculation is the only point where parallelization makes sense.
How should I do this?
Edit: As I point out in a comment, the foo.cubicRoot function is only a placeholder. I am actually studying the solution of a wave equation split into n modes, so foo.cubicRootList is no longer a list of floats but a list of arrays, and n is of the order of 200.
The bcast and scatter operations can broadcast/scatter data from one process to all the others, and gather at the end collects the results in the master process.
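A minimal mpi4py sketch of that scatter/gather pattern applied to the cubic-root MWE (assuming mpi4py is installed and the script is launched under Slurm with something like srun -n 4 python script.py; the split helper is my own addition):

```python
def split(seq, nparts):
    # round-robin split of the work into one chunk per process
    return [seq[i::nparts] for i in range(nparts)]

def main():
    try:
        from mpi4py import MPI  # requires an MPI installation
    except ImportError:
        print("mpi4py not installed; install it and launch with srun/mpirun")
        return

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        chunks = split(list(range(10)), comm.Get_size())
    else:
        chunks = None

    local = comm.scatter(chunks, root=0)         # each rank receives its chunk
    localResult = [x ** (1 / 3) for x in local]  # independent work per rank
    results = comm.gather(localResult, root=0)   # collect on the master

    if rank == 0:
        # flatten the per-rank lists back into one result list
        print([r for chunk in results for r in chunk])

if __name__ == "__main__":
    main()
```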