I need to train a very large number of neural nets using TensorFlow with Python. My neural nets (MLPs) range from very small ones (~2 hidden layers with ~30 neurons each) to large ones (3-4 layers with >500 neurons each).
I am able to run all of them sequentially on my GPU, which is fine, but my CPU is almost idling. Additionally, I found out that my CPU is quicker than the GPU for the very small nets (I assume because of the GPU overhead etc.). That's why I want to use both my CPU and my GPU in parallel to train my nets: the CPU should work from the smallest networks up, and the GPU from the largest networks down, until they meet somewhere in the middle... I thought this was a good idea :-)
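Roughly, the scheduling idea looks like this (a simplified sketch; `net_configs` and the `'num_params'` size key are just placeholders for my real code):

    import multiprocessing as mp

    def make_shared_work(net_configs):
        # Sort by size so the CPU consumer can walk up from the small end
        # and the GPU consumer can walk down from the large end.
        net_configs = sorted(net_configs, key=lambda c: c['num_params'])
        lock = mp.Lock()
        low = mp.Value('i', 0)                      # next index for the CPU consumer
        high = mp.Value('i', len(net_configs) - 1)  # next index for the GPU consumer
        return net_configs, lock, low, high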
So I simply start my consumer twice, in two different processes: one with device = CPU, the other with device = GPU. Both start and consume the first two nets as expected. But then the GPU consumer throws an exception saying that its tensor is accessed/violated by another process on the CPU(!), which I find weird, because it is supposed to run on the GPU...
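This is a condensed sketch of how I start the two consumers (it builds on the work-sharing helper above; `build_and_train` and `all_net_configs` stand in for my actual TensorFlow training code and my list of net configurations):

    import multiprocessing as mp
    import tensorflow as tf

    def consumer(device_str, net_configs, lock, low, high, ascending):
        while True:
            with lock:
                if low.value > high.value:
                    return                     # the two consumers have met in the middle
                if ascending:                  # CPU: take the next-smallest net
                    idx, low.value = low.value, low.value + 1
                else:                          # GPU: take the next-largest net
                    idx, high.value = high.value, high.value - 1
            with tf.device(device_str):        # '/cpu:0' or '/gpu:0'
                build_and_train(net_configs[idx])   # placeholder for my training code

    if __name__ == '__main__':
        net_configs, lock, low, high = make_shared_work(all_net_configs)
        cpu = mp.Process(target=consumer,
                         args=('/cpu:0', net_configs, lock, low, high, True))
        gpu = mp.Process(target=consumer,
                         args=('/gpu:0', net_configs, lock, low, high, False))
        cpu.start(); gpu.start()
        cpu.join(); gpu.join()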
Can anybody help me fully segregate my two processes?