
I want to train my model on two GPUs (ids 5 and 6), so I run my code with CUDA_VISIBLE_DEVICES=5,6 python train.py. However, when I print torch.cuda.current_device() I still get id 0 rather than 5 or 6, even though torch.cuda.device_count() is 2, which seems right. How can I use GPUs 5 and 6 correctly?

2 Answers


This is most likely correct behavior. PyTorch only sees two GPUs (therefore indexed 0 and 1), which are actually your physical GPUs 5 and 6.

Check the actual usage with nvidia-smi. If the numbering still looks inconsistent, you might need to set an environment variable so that CUDA orders devices the same way nvidia-smi does:

export CUDA_DEVICE_ORDER=PCI_BUS_ID

(See Inconsistency of IDs between 'nvidia-smi -L' and cuDeviceGetName())
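A minimal sketch of verifying the remapping from inside Python. The environment variables are set in-process here purely for illustration; in practice you would export them before launching train.py. The printed values assume a machine that actually has GPUs 5 and 6:

```python
import os

# Restrict the process to physical GPUs 5 and 6 BEFORE importing torch.
# Inside the process they are renumbered as cuda:0 and cuda:1.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"    # match nvidia-smi's ordering
os.environ["CUDA_VISIBLE_DEVICES"] = "5,6"

import torch

if torch.cuda.is_available():
    # device_count() should be 2: the two visible GPUs.
    print(torch.cuda.device_count())
    # current_device() is 0 -- that index refers to physical GPU 5.
    print(torch.cuda.current_device())
    # The device name lets you confirm which physical card index 0 maps to.
    print(torch.cuda.get_device_name(0))
```

So index 0 inside the process is not an error; it is simply the first of the two devices you made visible.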




You can check the device name to verify that it is the correct GPU. When you set CUDA_VISIBLE_DEVICES outside the script, you force torch to see only those two GPUs, so torch re-indexes them as 0 and 1. That is why current_device() returns 0.

3 Comments

Can I set both of them as the current device? I want to use both of them.
Yes, of course. You can use this example as a reference: pytorch.org/tutorials/beginner/former_torchies/….
Multiple GPUs are only used for parallel processing. If you don't declare your model for multi-GPU use (e.g. with DataParallel), it will run on a single GPU with index 0.
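The comment above can be sketched with a toy model. This is an assumption-laden illustration, not the asker's actual train.py: run it with CUDA_VISIBLE_DEVICES=5,6 so that device_ids=[0, 1] below refer to physical GPUs 5 and 6; on a CPU-only machine it falls back to a single device:

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for the real one.
model = nn.Linear(10, 2)

if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch across the visible GPUs
    # (indices 0 and 1 here, i.e. physical GPUs 5 and 6).
    model = nn.DataParallel(model, device_ids=[0, 1])

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

x = torch.randn(8, 10, device=device)   # batch of 8 samples
out = model(x)
print(out.shape)                        # torch.Size([8, 2])
```

Without the DataParallel wrapper, the model lives entirely on cuda:0 and the second visible GPU stays idle.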
