
I have a process built to run in parallel using multiprocessing.Process in Python 2.7. In theory it should be very fast on an EC2 instance with a large number of vCPUs, but it isn't scaling as I expected. I'm running the code on a 96-vCPU machine (an m5.24xlarge instance). The parallelized function finishes in ~45 minutes on a 4-vCPU machine when run on its own, but when I run 90 copies in parallel it takes 5+ hours for all of the sub-processes to finish.

I've considered using Pool to get away from the batching that occurs, but the function being called runs ~200 models that can run for a long time (and sometimes get stuck in odd optimization loops). To keep any individual run from going on too long, I have an additional process running in the background that starts sending a soft Ctrl+C (SIGINT) to a sub-process every 10 minutes once that sub-process has accumulated 3 hours of processor time.
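The watchdog described above can be sketched roughly as below. This is a hypothetical reconstruction, not the asker's actual code: it assumes Linux (it reads per-process CPU time from /proc/<pid>/stat), and the names `cpu_seconds` and `watchdog` are invented for illustration.

```python
import os
import signal
import time

CLK_TCK = os.sysconf('SC_CLK_TCK')  # kernel clock ticks per second

def cpu_seconds(pid):
    """Total CPU time (user + system) used by `pid`, in seconds,
    parsed from /proc/<pid>/stat (Linux-only)."""
    with open('/proc/%d/stat' % pid) as f:
        # comm (field 2) can contain spaces, so split on the closing paren
        fields = f.read().rsplit(')', 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15 of stat
    return (utime + stime) / float(CLK_TCK)

def watchdog(pids, cpu_limit=3 * 3600, interval=600):
    """Once a worker exceeds `cpu_limit` seconds of CPU time, send it
    SIGINT (a soft Ctrl+C) every `interval` seconds until it exits."""
    live = set(pids)
    while live:
        for pid in list(live):
            try:
                if cpu_seconds(pid) > cpu_limit:
                    os.kill(pid, signal.SIGINT)
            except (OSError, IOError):  # process already exited
                live.discard(pid)
        time.sleep(interval)
```

A SIGINT raises KeyboardInterrupt in the child's Python interpreter, so the model code can catch it and bail out cleanly rather than being killed outright.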

Each vCPU running a sub-process sits between 40% and 99% utilized until the sub-process completes. My question: why does multiprocessing not scale linearly when moving to a bigger instance? I keep 5 vCPUs free to run any background processes, so the machine isn't being bogged down there.

from multiprocessing import Process
import datetime
import Prod_Modeling_Pipeline as PMP
import boto3
import pandas
import time
import numpy
import os

#Define locations
bucketName = 'bucketgoeshere'
output_location = '/home/ec2-user/'

#Pull ATM Setter Over
client = boto3.client('s3')
transfer = boto3.s3.transfer.S3Transfer(client=client)
transfer.download_file(bucketName,'Root_Folder/Control_Files/'+'execution_file.csv', output_location+'execution_file.csv')

#Read-in id list
execution_data = pandas.read_csv(output_location+'execution_file.csv')
ids = execution_data['id']

ni = 90  # sub-processes per batch
# Pad the last batch with placeholder ids, then fill in the real ids.
id_row = [['AAA']*ni for _ in xrange(int(numpy.ceil(len(ids)/float(ni))))]

for i in xrange(len(ids)):
    id_row[i/ni][i%ni] = ids[i]

Date = datetime.date.today().strftime('%Y-%m-%d')

totalstart = time.time()
for q in xrange(len(id_row)):
    processes = []
    batch = id_row[q]
    for m in xrange(len(batch)):
        try:
            p = Process(target=PMP.Model_Function, args=(batch[m],Date,'VALIDATION'))
            p.start()
            processes.append(p)
            time.sleep(1)
            # p.pid is the child's pid; os.getpid() would report the parent
            print("Started "+batch[m]+" as "+str(p.pid))
        except Exception:
            print("Invalid Run")

    for p in processes:
        p.join()

    print(processes)

print (time.time() - totalstart)
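One structural cost of the loop above is the barrier at the end of each batch: all 90 processes must finish before the next 90 start, so one slow model idles up to 89 cores. A Pool with `imap_unordered` hands each worker a new id the moment it finishes, which removes that barrier. The sketch below is illustrative only: `run_model` is a stand-in for `PMP.Model_Function`, and the per-task timeout logic would still need to live inside the worker or a separate watchdog.

```python
from multiprocessing import Pool
from functools import partial

def run_model(model_id, date, mode):
    # Stand-in for PMP.Model_Function(model_id, date, mode);
    # here it just echoes the id so the sketch is runnable.
    return model_id

def run_all(ids, date, mode='VALIDATION', workers=90):
    """Run one task per id. Each worker grabs the next id as soon as it
    finishes, so a single slow model never stalls a whole batch of 90."""
    work = partial(run_model, date=date, mode=mode)
    pool = Pool(processes=workers)
    try:
        results = []
        for r in pool.imap_unordered(work, ids):
            results.append(r)
        return results
    finally:
        pool.close()
        pool.join()
```

`maxtasksperchild` can also be set on the Pool so that long-lived workers are periodically recycled, which helps if the models leak memory.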

1 Answer
I think I understand now why it isn't scaling linearly. It comes down to the difference in clock speed between a t2 EC2 instance and an m-type EC2 instance. Max clock speed is much higher on the smaller instances: up to 3.3 GHz for small t2s versus 2.5 GHz for m-type instances.

(https://aws.amazon.com/ec2/instance-types/)

That limits scalability when you change to the larger instance type, because each core now runs at a slower max clock speed.

This isn't all of my problem above, but it explains a portion of the time increase.
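A quick back-of-envelope check makes clear how small that portion is, using the clock speeds cited above and the runtimes from the question:

```python
# How much slowdown can the clock difference alone explain?
t2_ghz, m5_ghz = 3.3, 2.5            # peak clock speeds cited above
clock_factor = t2_ghz / m5_ghz       # at most ~1.32x slower per core
observed = (5 * 60) / 45.0           # 45 min -> 5+ hours is ~6.7x slower
print(clock_factor, observed)
```

So the clock difference accounts for roughly a 1.3x slowdown at most, while the observed slowdown is ~6.7x or more; something else (contention, memory bandwidth, the shared host mentioned below) has to explain the rest.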

Another portion appears to be due to running on a shared host: even though the EC2 instance should take less time, someone else in my org is hogging processing power. I'm unsure how to fix that under corporate constraints.
