I want to facilitate URI-based interaction with Python programs on the login node of a cfncluster. The programs will perform various time-intensive (minutes to hours) operations, mostly linked through S3 objects produced and consumed.

I have prototyped smaller-scale scenarios using Lambda functions and API GW to provide asynchronous endpoints. The client sends JSON to the URI endpoint, which passes it to a Lambda function; the function generates a unique S3 object name and immediately returns it (in JSON) to the client, then goes to work producing that S3 object (within the time and space limitations of Lambda functions). The client polls until the object is available, often calling another Lambda function to consume it.
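The pattern above can be sketched as a minimal Lambda handler. The bucket name is a placeholder, and the long-running work itself is elided — it would be handed off (e.g. to a second, asynchronous invocation) rather than done before returning:

```python
import json
import uuid

# Placeholder bucket name -- substitute your own.
RESULT_BUCKET = "my-results-bucket"

def make_result_key(prefix="results"):
    """Generate a unique S3 key for the output object the client will poll."""
    return f"{prefix}/{uuid.uuid4().hex}.json"

def handler(event, context):
    """Return the future object's location immediately; the
    long-running work that produces it happens elsewhere."""
    key = make_result_key()
    # ... hand the work off here (e.g. a second, asynchronous invocation) ...
    return {
        "statusCode": 202,
        "body": json.dumps({"bucket": RESULT_BUCKET, "key": key}),
    }
```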

Now, I want to facilitate the same thing where the work being done is huge (large-scale atmospheric models) and requires launching a cfncluster (for parallel computing) with lots of big software packages. The best approach I can come up with so far is to launch Lambda functions via API GW, then have them start asynchronous operations (launching, status-checking, killing) on the cfncluster. Although I could have the Lambda functions interact with the cfncluster processes via ssh, I would prefer to avoid that. These processes will generally run Python code.

I've read about the AWS "EC2 Run Command" (now part of AWS Systems Manager) for performing admin tasks, and my vague understanding is that

  • I can invoke this "EC2 Run Command" from a Lambda function, meaning I could pass JSON through API GW to the Lambda function and have it issue the command on the EC2 instance.
  • The "EC2 Run Command" can write stdout to an S3 object, so "maybe" I can have my EC2 processes write out the JSON I would want in a response, and have other Lambda functions fetch those objects in status requests.
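For reference, boto3 exposes Run Command through its SSM client as `send_command` / `get_command_invocation`. A sketch of the two Lambda-side calls, with the instance ID, command, and output bucket as placeholders:

```python
def build_run_command_request(instance_id, command, output_bucket):
    """Parameters for ssm.send_command (kept pure so it's easy to test).
    stdout/stderr from the command land under the given S3 prefix."""
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": [command]},
        "OutputS3BucketName": output_bucket,
        "OutputS3KeyPrefix": "run-command-output",
    }

def launch_model_run(instance_id, command, output_bucket):
    """Issue a shell command on the cluster's login node; returns a
    command ID that later status checks can use."""
    import boto3  # imported here so the helper above has no dependencies
    ssm = boto3.client("ssm")
    resp = ssm.send_command(
        **build_run_command_request(instance_id, command, output_bucket)
    )
    return resp["Command"]["CommandId"]

def check_run_status(command_id, instance_id):
    """Poll a previously issued command: Pending, InProgress,
    Success, Failed, etc."""
    import boto3
    ssm = boto3.client("ssm")
    inv = ssm.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
    return inv["Status"]
```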

This, of course, doesn't seem too easy, but it seems doable, and to the best of my knowledge it just might represent the "state of the art" for accomplishing web-based control of large-scale scientific models in the cloud. Am I correct in thinking so? AWS constantly adds new services; have I missed something?

I have also contemplated (and prototyped on a small scale) the use of a CherryPy server on the EC2 instance, but that has its own complexities and drawbacks.

1 Answer

I would have the Lambda functions insert the task into an SQS queue. Have the EC2 instances poll the queue for tasks to complete. You could write the task status to a DynamoDB table, so anything polling to check if the operation is complete would have a place to look. This would keep your Lambda functions decoupled from the back-end EC2 instances performing the long-running tasks, while also giving you retry capabilities, and the possibility of scaling your EC2 instance pool based on the queue depth.
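A sketch of both sides of that queue, assuming hypothetical queue URL and table names. The message body is built by a pure helper; the Lambda side enqueues and records a PENDING row, and the EC2 side long-polls, marks the task RUNNING, does the work (elided), then deletes the message:

```python
import json

def build_task_message(task_id, payload):
    """SQS message body carrying one task (pure, testable)."""
    return json.dumps({"task_id": task_id, "payload": payload})

def enqueue_task(queue_url, table_name, task_id, payload):
    """Lambda side: enqueue the task and record a PENDING status row."""
    import boto3
    boto3.client("sqs").send_message(
        QueueUrl=queue_url, MessageBody=build_task_message(task_id, payload)
    )
    boto3.resource("dynamodb").Table(table_name).put_item(
        Item={"task_id": task_id, "status": "PENDING"}
    )

def worker_poll_once(queue_url, table_name):
    """EC2 side: long-poll for one task, mark it RUNNING, run the
    job (elided), then delete the message so it isn't redelivered."""
    import boto3
    sqs = boto3.client("sqs")
    table = boto3.resource("dynamodb").Table(table_name)
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        task = json.loads(msg["Body"])
        table.update_item(
            Key={"task_id": task["task_id"]},
            UpdateExpression="SET #s = :s",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={":s": "RUNNING"},
        )
        # ... run the long-lived model job here, then mark SUCCEEDED ...
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```

Anything polling for completion reads the DynamoDB row rather than touching the queue, which is what keeps the front end decoupled from the workers.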

You could also look into using AWS Step Functions to orchestrate all of this.
