Do you know any elegant way to do inference on 2 python processes with 1 GPU tensorflow?
Suppose I have 2 processes, first one is classifying cats/dogs, 2nd one is classifying birds/planes, each process is running different tensorflow model and run on GPU. These 2 models will be given images from different cameras continuously. Usually, tensorflow will occupy all memory of the entire GPU. So when you start another process, it will crash saying OUT OF MEMORY or failed convolution CUDA or something along that line. Is there a tutorial/article/sample code that shows how to load 2 models in different processes and both run in parallel? This is very useful also in case you are running a model inference while you are doing some heavy graphics e.g. playing games. I also want to know how running the model affects the game.
I've tried using python Thread and it works but each model predicts 2 times slower (and you know that python thread is not utilizing multiple CPU cores). I want to use python Process but it's not working. If you have sample few lines of code that work I would appreciate that very much.
