I have some Python 3 code that processes a JSON file and includes some neural networks and random forests. I put the code into a Docker container, but noticed that these ML tasks run faster without Docker than with it. Inside Docker, I'm using Flask to load the JSON file and run the code. I use identical versions of the Python modules locally and inside Docker:
- theano 0.8.2
- keras 2.0.5
- scikit-learn 0.19.0
- Flask 0.12
At first, I thought theano might be using different resources with and without Docker, but in both cases it runs on a single CPU with a single thread, and it is not using my GPU. It's probably not theano anyway, because my random forest also runs slower in Docker. Here are the tests I performed (I ran each one several times and report the mean timings, which were stable):
Without Docker, without Flask:
- Task 1 (theano + keras code) : 1.0s
- Task 2 (theano + keras code) : 0.7s
- Task 3 (scikit-learn code) : 0.25s
Docker (cpus=1) + Flask (debug mode = True):
- T1: 6.5s
- T2: 2.2s
- T3: 0.58s
Docker (cpus=2) + Flask (debug mode = True):
- T1: 5.5s
- T2: 1.4s
- T3: 0.55s
Docker (cpus=2) + Flask (debug mode = False):
- T1: 4.5s
- T2: 1.2s
- T3: 0.5s
Docker (cpus=2) (No Flask, just calling the json file as done locally):
- T1: 2.8s
- T2: 1.1s
- T3: 0.5s
Flask (debug mode = True) (no Docker container):
- T1: 2.8s
- T2: 1.5s
- T3: 0.2s
I guess going from cpus=1 to cpus=2 just allocates more of one CPU to the code, with the second CPU taking over some other work. Clearly, the time drops when either Flask or Docker is removed, but I still can't reach the speed I get without both Docker and Flask. Does anyone have an idea why this is happening?
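To compare what each environment actually exposes to the process, something like the sketch below could be run both locally and inside the container. It uses only the standard library. The function name `describe_runtime` and the list of environment variables are assumptions chosen for illustration, not part of the original code. Note that Docker's `--cpus` option limits CPU time through a scheduler quota, so the CPU counts printed here may well be identical for cpus=1 and cpus=2.

```python
import os
import platform
import sys

# Environment variables that commonly influence threading in numeric libraries.
# This list is an assumption for illustration; adjust it to the libraries in use.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "THEANO_FLAGS",
)


def describe_runtime():
    """Collect the CPU and threading settings visible to this process."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "cpu_count": os.cpu_count(),
    }
    # sched_getaffinity only exists on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        info["usable_cpus"] = len(os.sched_getaffinity(0))
    for name in THREAD_ENV_VARS:
        info[name] = os.environ.get(name, "<unset>")
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_runtime().items()):
        print("{}: {}".format(key, value))
```

Diffing the two outputs shows whether the Python version, platform, or thread-related settings differ between the local run and the container.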
This is a minimal chunk of code showing how we use Flask to run the app:
from flask import Flask, request, jsonify

api = Flask(__name__)
pipeline = Pipeline()  # private class calling multiple tasks

@api.route("/", methods=['POST'])
def entry():
    data = request.get_json(force=True)
    # This calls the different tasks, which are timed
    data = pipeline.process(data)
    # A Flask view must return a response; assumes the result is JSON-serializable
    return jsonify(data)

if __name__ == "__main__":
    api.run(debug=True, host='0.0.0.0', threaded=False)
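
For reference, per-task timings like the ones reported above can be collected with a small helper along these lines. This is only a sketch: the names `timed`, `mean_timings`, and `timings` are hypothetical and not taken from the actual pipeline, and the real measurements may have been taken differently.

```python
import time
from contextlib import contextmanager

# Maps a task label to the list of durations (in seconds) measured for it.
timings = {}


@contextmanager
def timed(label, store=timings):
    """Measure the wall-clock time of the enclosed block and record it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the block raises.
        store.setdefault(label, []).append(time.perf_counter() - start)


def mean_timings(store=timings):
    """Return the mean duration per label over all recorded runs."""
    return {label: sum(values) / len(values) for label, values in store.items()}


if __name__ == "__main__":
    for _ in range(3):
        with timed("T1"):
            sum(i * i for i in range(100000))  # stand-in for a real task
    print(mean_timings())
```

Wrapping only the task call itself (for example, the body of each task inside `pipeline.process`) keeps request parsing and response serialization out of the measurement, so the same numbers can be compared with and without Flask.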
PS. Pardon me if the question is lacking anything; this is my first StackOverflow question.