There is a job I want to run on Slurm that needs parallelization because it is otherwise too slow. Even though the computations are independent of each other, they don't happen at the beginning of the job. An MWE of the job would be something like:
class Foo:
    def __init__(self, n):  # note the double trailing underscores; __init is never called
        self.n = n
        self.nList = list(range(n))  # Bonus question: a more elegant way of doing this?

    def cubicRoot(self, x):
        return x ** (1 / 3)

    def cubicRootParallel(self):
        from pathos.multiprocessing import ProcessingPool as Pool
        p = Pool(4)

        def _cubicRoot(x):
            return self.cubicRoot(x)

        self.cubicRootList = p.map(_cubicRoot, self.nList)
foo = Foo(10)
foo.cubicRootParallel()  # parentheses needed, otherwise the method is never called
print(foo.cubicRootList)  # should be the cubic roots [0.0, 1.0, 2**(1/3), ...]
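For comparison, here is a minimal sketch of the same MWE using only the stdlib multiprocessing module; moving the worker function to module level sidesteps the bound-method pickling problem that pathos solves with dill (names here are illustrative):

```python
from multiprocessing import Pool

def cubic_root(x):
    # module-level function, so the standard pickler can send it to the workers
    return x ** (1 / 3)

if __name__ == "__main__":
    nList = list(range(10))
    with Pool(4) as p:  # 4 worker processes on the same node
        cubicRootList = p.map(cubic_root, nList)
    print(cubicRootList)
```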
However, when I try to submit this job to Slurm with more than one node (since I am running Python), it raises
sbatch: Warning: can't run 1 processes on 4 nodes, setting nnodes to 1.
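The warning reflects that a multiprocessing-based Python script cannot span nodes, so the job is typically submitted to a single node with several CPUs. A sketch of such a submission script (the job name, CPU count, and script name are assumptions):

```shell
#!/bin/bash
#SBATCH --job-name=cubic-roots
#SBATCH --nodes=1            # multiprocessing cannot cross node boundaries
#SBATCH --ntasks=1           # one Python process...
#SBATCH --cpus-per-task=4    # ...with one CPU per worker in the Pool

python myscript.py
```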
All other examples of using Slurm in this way do the parallelization from the very beginning, or run the same calculation many times with different parameters. My code, by contrast, is serial apart from the cubic-root calculation: as far as I understand, that calculation is the only point where parallelization makes sense.
How should I do this?
Edit: As I point out in a comment, the foo.cubicRoot function is only a placeholder. I am actually studying the solution of a wave equation split into n modes, so foo.cubicRootList is no longer a list of floats but a list of arrays, and n is of the order of 200.
The bcast and scatter operations can broadcast/scatter data from one process to all the others, and gather at the end collects the results in the master process.
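A minimal mpi4py sketch of that scatter/gather pattern applied to the cubic-root MWE (assuming mpi4py is installed and the script is launched under Slurm with something like srun -n 4 python script.py; the split helper is my own addition):

```python
def split(seq, nparts):
    # round-robin split of the work into one chunk per process
    return [seq[i::nparts] for i in range(nparts)]

def main():
    try:
        from mpi4py import MPI  # requires an MPI installation
    except ImportError:
        print("mpi4py not installed; install it and launch with srun/mpirun")
        return

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        chunks = split(list(range(10)), comm.Get_size())
    else:
        chunks = None

    local = comm.scatter(chunks, root=0)         # each rank receives its chunk
    localResult = [x ** (1 / 3) for x in local]  # independent work per rank
    results = comm.gather(localResult, root=0)   # collect on the master

    if rank == 0:
        # flatten the per-rank lists back into one result list
        print([r for chunk in results for r in chunk])

if __name__ == "__main__":
    main()
```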