1

I need to run some numpy computation on 5000 files in parallel using python. I have the sequential single machine version implemented already. What would be the easiest way to run the code in parallel (say using an ec2 cluster)? Should I write my own task scheduler and job distribution code?

1 Answer 1

1

You can have a look pscheduler Python module. It will allow you to queue up your jobs and run them sequentially. The number of concurrent processes will depend upon the available CPU cores. This program can easily scale up and submit your jobs to remote machines but then would require all your remote machines to use NFS.

I'll be happy to help you further.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.