
A computational scientist where I work wrote a program that scores inputs using a machine learning model built with scikit-learn. My task is to make this ML scorer available as a microservice.

So I wrote a few lines of code using Flask to accomplish this. Mission achieved!
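Roughly, the sketch below is the shape of it (call it scorer_service.py; the model file name and the way it's loaded are just placeholders for the real scikit-learn scorer):

import sys

import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("scorer_model.joblib")  # placeholder for the real scikit-learn model

@app.route("/scores", methods=["POST"])
def scores():
    # Expects a JSON list of inputs and returns one score per input.
    inputs = request.get_json()
    return jsonify(model.predict(inputs).tolist())

if __name__ == "__main__":
    # Take the port from the command line so several copies can run side by side.
    app.run(port=int(sys.argv[1]) if len(sys.argv) > 1 else 5000)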

Well, not quite. Because this service is going to be beaten on pretty heavily at times, it needs to be able to crunch on several requests in parallel. (I.e., on multiple cores. We have about 20 on our server.) A solution that I can achieve with about ten minutes of effort is to just spin up ten or twenty of these little REST servers on different ports and round-robin to them using nginx as a reverse proxy.
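To make that concrete, a tiny launcher along these lines (purely illustrative; the file name, ports, and worker count are made up) could spawn the copies, with nginx configured separately to round-robin across the ports:

import subprocess
import sys

BASE_PORT = 5000   # hypothetical starting port
NUM_WORKERS = 10   # roughly one per core

# Assumes the Flask sketch above lives in scorer_service.py and takes its port as argv[1].
procs = [
    subprocess.Popen([sys.executable, "scorer_service.py", str(BASE_PORT + i)])
    for i in range(NUM_WORKERS)
]

for p in procs:
    p.wait()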

Although this will work fine, I am sure, I think it would be more elegant to have a single Python server handling all the requests, rather than having twenty Python servers. So I started reading up on WSGI and uWSGI and a bunch of other things. But all this reading and web surfing has accomplished is to leave me very confused.

So I'll ask here instead of trying to unravel this on my own: Should I just stick with the brute force approach I described above? Or is there something better that I might be doing?

But if doing something "better" is going to require days of effort wading through incomprehensible documentation, doing frustrating experimentation, and pulling out all of my hair, then I'd rather just stick with the dumb brute force approach that I already understand and that I know for sure will work.

Thanks.

2 Comments
  • Maybe something like Cortex? Azure ML Deploy? MLflow? Commented Feb 13, 2020 at 19:49
  • @kichik It seems from the Cortex readme that it's designed specifically for running services on AWS. Likewise Azure ML Deploy, only with Azure instead. And I'm not exactly sure about MLflow, but it also seems to be for cloud deployment. We don't want or need cloud deployment. We have our own server with 20 cores that will do the job just fine for much less money, especially since we already own the machine. Commented Feb 13, 2020 at 20:32

2 Answers


I'd suggest migrating to FastAPI for this. It is significantly faster, really easy to use (especially if you're migrating from Flask), and is used by a lot of people for ML inference.

FastAPI uses the newer async functionality in Python, which allows it to handle significantly more requests with the same amount of resources.

You can also use existing Docker containers for either Flask or FastAPI rather than configuring everything yourself.


5 Comments

FastAPI looks great for implementing REST servers. But that's very easy to do in Flask too. What I'm not seeing is any obvious way to make FastAPI parallel. I.e., make good use of 20 cores. Maybe if I were to combine it with Python's multiprocessing library, but I don't want to start up and tear down a process for each request. And managing some sort of process pool would be way too much effort. (Though I haven't ever used Python's multiprocessing library, so maybe it's easier to use than I am imagining.)
You're looking at the wrong layer: what you're asking for isn't something that should be done at the Python level, but one level up on the hosting side. Typically gunicorn/uvicorn takes care of the multiprocessing for you. If you use the tiangolo Docker containers, it will automatically use all the cores on the machine without you needing to make any code changes.
Thanks, but sigh! Okay, I'll give it a look. I love Python, but we do most of our work in Scala (a JVM language), and parallelizing a web service is dead simple in a language that doesn't have a GIL. You just run each request handler in its own thread, and you automatically get as much parallelization for CPU-bound requests as you have cores, all without having to jump through funky hoops or implement a different "layer".
FastAPI plus uvicorn turned out to work great. Thanks!
Glad to hear it! If you wouldn't mind marking this as the accepted answer it might help others who end up in the same situation :-)

As suggested by tedivm, I used FastAPI and uvicorn to implement a working solution.

Here's a small sample server program named test_fast_api.py. It responds to both GET and POST requests (POST request bodies should be JSON), and all responses are in JSON:

from typing import List
from fastapi import FastAPI

app = FastAPI()

# GET /score/<seq>: score a single sequence (the toy "score" here is just its length).
@app.get("/score/{seq}")
async def score(seq: str):
    return len(seq)

# GET /scores/<seq1>,<seq2>,...: score a comma-separated list of sequences.
@app.get("/scores/{seqs}")
async def scores(seqs: str):
    return [len(seq) for seq in seqs.split(",")]

# POST /scores with a JSON array body: score each sequence in the array.
@app.post("/scores")
async def scores_post(seqs: List[str]):
    return [len(seq) for seq in seqs]

This service can then be served by 10 processes like so:

$ uvicorn --workers 10 --port 12345 test_fast_api:app

If this service were actually CPU-bound, running it with 10 worker processes would allow it to make good use of 10 CPU cores.
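For completeness, here's how a client might exercise the POST endpoint (the example inputs are just placeholders for whatever the real scorer expects):

import requests

resp = requests.post("http://localhost:12345/scores", json=["AC", "GATTACA", "T"])
print(resp.json())  # -> [2, 7, 1] with the toy length-based scorer above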

