
This might seem like a simple question, but I'm new to distributed training in PyTorch.

I have two models (net1 and net2), two dataloaders for training and testing (train_dl, test_dl), and access to two GPUs ("cuda:0" and "cuda:1"). I also have a function called "training" that takes a model and the two dataloaders and returns the test accuracy. I need to train both models at the same time; not sequentially, but simultaneously:

1 net1.to("cuda:0")
2 net2.to("cuda:1")
3 accuracy1 = training(net1, train_dl, test_dl)
4 accuracy2 = training(net2, train_dl, test_dl)

What do I need to do in order to make lines 3 and 4 execute at the same time? I read something about "spawn", but I really have no clue how to implement it.

Thank you very much!

  • If the two models are to train independently, you can always just launch a separate Python script for each of them. That said, if you really want to do it in the same script, look into the multiprocessing library. Here's a relevant Stack Overflow thread. Commented Nov 10, 2020 at 9:32
  • @Gustavo Vargas Hakim Did you solve your problem? Commented Feb 5, 2021 at 9:47
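For reference, here is a minimal sketch of the multiprocessing approach the first comment suggests, using torch.multiprocessing with the "spawn" start method (the one the question mentions, and the one required for CUDA in child processes). It reuses the question's names (net1, net2, train_dl, test_dl, training) and assumes they are all defined at module level and picklable, that training moves each batch to the model's device, and that returning results through an mp.Queue is just one illustrative way to collect the accuracies:

import torch.multiprocessing as mp

def worker(result_queue, net, device, train_dl, test_dl):
    # Each process trains one model on its own GPU.
    net.to(device)
    accuracy = training(net, train_dl, test_dl)
    result_queue.put((device, accuracy))

if __name__ == "__main__":
    mp.set_start_method("spawn")  # required when child processes use CUDA
    queue = mp.Queue()
    procs = [
        mp.Process(target=worker, args=(queue, net1, "cuda:0", train_dl, test_dl)),
        mp.Process(target=worker, args=(queue, net2, "cuda:1", train_dl, test_dl)),
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining so queued results cannot block the children.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    accuracy1, accuracy2 = results["cuda:0"], results["cuda:1"]

If the two runs really are independent, launching two separate scripts (as the first comment notes) is often simpler and avoids the pickling constraints entirely.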
