
I am using TensorFlow Serving to deploy my TensorFlow models. I have multiple GPUs available on the server, but as of now only one GPU is utilized during inference.

My idea for parallelizing classification of a large number of images is to spawn one tensorflow-serving instance per available GPU and run parallel "workers" that each grab an image from a generator, make a request, wait for the answer, then grab the next image from the generator, and so on (see the sketch below). This would mean implementing my own data handler, but it seems achievable.
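Roughly, this is what I have in mind. It is a minimal sketch only: the endpoint URLs and ports, the model name my_model, and the dummy generator are placeholders for my actual setup, and it talks to the servers over the REST predict API using the requests library.

# A minimal sketch of the "worker per serving endpoint" idea, assuming one
# tensorflow-serving container per GPU, each exposing the REST API on its
# own port. Endpoints, the model name "my_model", and the dummy generator
# are placeholders.
import itertools
import queue
import threading

import numpy as np
import requests

# One REST endpoint per tensorflow-serving container (one container per GPU).
ENDPOINTS = [
    "http://localhost:8501/v1/models/my_model:predict",
    "http://localhost:8601/v1/models/my_model:predict",
]


def image_generator():
    """Stand-in for a real data handler; yields preprocessed images."""
    for _ in range(100):
        yield np.random.rand(224, 224, 3).astype(np.float32)


def worker(images, endpoint, results):
    """Grab an image, send a predict request, wait for the answer, repeat."""
    while True:
        try:
            img = images.get_nowait()
        except queue.Empty:
            return
        resp = requests.post(endpoint, json={"instances": [img.tolist()]})
        resp.raise_for_status()
        results.append(resp.json()["predictions"][0])


def run(workers_per_gpu=4):
    images = queue.Queue()
    for img in image_generator():
        images.put(img)

    results = []
    threads = []
    # Round-robin the worker threads over the available serving endpoints.
    num_workers = workers_per_gpu * len(ENDPOINTS)
    for endpoint in itertools.islice(itertools.cycle(ENDPOINTS), num_workers):
        t = threading.Thread(target=worker, args=(images, endpoint, results))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(len(run()))

The same structure should also work with gRPC requests (via the tensorflow-serving-api package) instead of REST.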

I have also read about the SharedBatchScheduler in the TensorFlow Serving batching documentation, but I do not know whether it would be worth exploring further.
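From what I can tell from the batching guide, the model server's request batching (which, as far as I understand, is built on top of those batch schedulers) is enabled with the --enable_batching flag plus a --batching_parameters_file. A rough example of such a file, with purely illustrative values:

# batching_parameters.txt -- text-format protobuf, values are illustrative
max_batch_size { value: 32 }
batch_timeout_micros { value: 5000 }
num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }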

I am fairly new to tensorflow-serving in general and I am wondering if this is the most straightforward way to accomplish what I want.

Thanks in advance for any help/suggestions!


Edit: Thanks for the clarification question. I am aware of issue 311, github.com/tensorflow/serving/issues/311. Does anyone have a workaround for it?


1 Answer


It is totally doable with Docker and nvidia-docker 2.0 (the docker run --runtime=nvidia in the issue suggests nvidia-docker 2 is already being used). I did experiment with multiple GPUs and Serving; however, I didn't end up running Serving across multiple GPUs myself.

Nevertheless, I have a host with 4 GPUs and currently schedule one GPU per custom image running TensorFlow for training, so that each user gets a GPU in an isolated environment. Previously I used Kubernetes for device provisioning and container management, but it was overkill for what I needed. Currently I use docker-compose to do all the magic. Here is an example:

version: '3'
services:
    lab:
        build: ./tensorlab
        image: centroida/tensorlab:v1.1
        ports:
            - "30166:8888"
            - "30167:6006"
        environment:
            NVIDIA_VISIBLE_DEVICES: 0,1,2
        ...

The key part here is the NVIDIA_VISIBLE_DEVICES variable, where the GPU indices correspond to the output of nvidia-smi.
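For your use case the same pattern should carry over to Serving itself: one tensorflow/serving GPU container per device, each pinned with NVIDIA_VISIBLE_DEVICES and exposed on its own host ports. A rough, untested sketch; it assumes nvidia is the default Docker runtime (as in the compose file above), the official tensorflow/serving:latest-gpu image, and a model exported under ./models/my_model. Service names, ports, and paths are placeholders.

version: '3'
services:
    serving_gpu0:
        image: tensorflow/serving:latest-gpu
        ports:
            - "8500:8500"   # gRPC
            - "8501:8501"   # REST
        volumes:
            - ./models/my_model:/models/my_model
        environment:
            MODEL_NAME: my_model
            NVIDIA_VISIBLE_DEVICES: "0"
    serving_gpu1:
        image: tensorflow/serving:latest-gpu
        ports:
            - "8600:8500"
            - "8601:8501"
        volumes:
            - ./models/my_model:/models/my_model
        environment:
            MODEL_NAME: my_model
            NVIDIA_VISIBLE_DEVICES: "1"

Your worker pool would then simply point at the per-container ports (8501 and 8601 for REST, or 8500 and 8600 for gRPC).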


2 Comments

Alright, thanks for your answer! Have you heard of the NVIDIA TensorRT Inference Server (developer.nvidia.com/tensorrt)? I just stumbled upon it, and they write "TensorRT Inference Server: Maximizes utilization by enabling inference for multiple models on one or more GPUs". It seemed promising for a multi-GPU situation.
You are welcome, hopefully that helps. And yes, I have heard of and used TensorRT extensively. Note, though, that the product is quite raw if your models are not linear, have branches, or use some of the newer TensorFlow ops. They don't currently support all operations, but I see more being added with each release. It is generally hard to set up, and it makes more sense on architectures Pascal or later with good float16 and int8 support. Ultimately, it does give you a good boost, 2-3x in my experience (even though half of the model was not converted).
