This might be a somewhat simple question, but I'm new to distributed training in PyTorch.
I have two models (net1 and net2), two dataloaders for training and testing (train_dl and test_dl), and two GPUs ("cuda:0" and "cuda:1"). I also have a function called "training" that takes a model and the two dataloaders and returns the test accuracy. I need to train both models at the same time; not sequentially, but simultaneously:
1 net1.to("cuda:0")
2 net2.to("cuda:1")
3 accuracy1 = training(net1, train_dl, test_dl)
4 accuracy2 = training(net2, train_dl, test_dl)
What do I need to do to make lines 3 and 4 execute at the same time? I've read something about "spawn", but I really have no clue how to implement it.
Thank you very much!
You can use the multiprocessing library. Here's a relevant Stack Overflow thread.