Why does the Hugging Face Trainer still report different devices for my encoder and classifier head even after I manually map them to the same device?

I encountered this error while trying to run the Hugging Face Trainer on multiple GPUs:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

I use a T5 model, from which I extract only the encoder. I shard the encoder across two devices, wrap it with LoRA, and attach a classifier head.

This is the model code:

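import torch
import torch.nn as nn
from accelerate import dispatch_model
from transformers.modeling_outputs import SequenceClassifierOutput


class ProtT5ForClassification(nn.Module):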
    def __init__(self, encoder, device_map):
        super().__init__()
        self.encoder = encoder  # already sharded
        hidden = self.encoder.config.d_model

        # create classifier but don’t push it to a device yet
        self.classifier = nn.Linear(hidden, 1, bias=True).to(torch.float16)

        # dispatch classifier to follow the encoder device map
        # simplest: put it entirely on the last shard (cuda:1 here)
        dispatch_model(self.classifier, device_map={"": "cuda:1"})

        self.loss_fn = nn.BCEWithLogitsLoss()

    def masked_mean_pool(self, hidden_states, attention_mask):
        mask = attention_mask.unsqueeze(-1).type_as(hidden_states)
        summed = (hidden_states * mask).sum(dim=1)
        denom = mask.sum(dim=1).clamp(min=1e-9)
        return summed / denom

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        # IMPORTANT: do not pass anything other than encoder-expected args to encoder
        enc_out = self.encoder(input_ids=input_ids, attention_mask=attention_mask, return_dict=True)
        last_hidden = enc_out.last_hidden_state
        pooled = self.masked_mean_pool(last_hidden, attention_mask)
        logits = self.classifier.to(pooled.device)(pooled).squeeze(-1)
        
        loss = None
        if labels is not None:
            labels = labels.float().view(-1)
            loss = self.loss_fn(logits, labels)

        return SequenceClassifierOutput(loss=loss, logits=logits)

I assumed the problem was that the classifier head and the final layer of the encoder are not on the same device, so I mapped the classifier head and the encoder's last layer to the same device, but the error persists.
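One way to test this assumption is to list which devices each submodule actually occupies, and to check whether the mismatch comes from the loss rather than from the classifier. The sketch below is a minimal, CPU-runnable illustration: `devices_by_submodule` and `loss_on_logits_device` are hypothetical helper names, and the small `nn.Sequential` model is a stand-in for the sharded encoder plus head, not the real ProtT5 setup. It rests on the assumption (not confirmed for this setup) that the Trainer places the batch, including `labels`, on a single device, so `labels` could sit on cuda:0 while `logits` come out on cuda:1.

```python
import torch
import torch.nn as nn


def devices_by_submodule(model: nn.Module) -> dict:
    """Map each top-level child name to the sorted device strings of its tensors."""
    report = {}
    for name, child in model.named_children():
        tensors = list(child.parameters()) + list(child.buffers())
        report[name] = sorted({str(t.device) for t in tensors})
    return report


def loss_on_logits_device(loss_fn, logits, labels):
    """Move labels to the device of logits before computing the loss."""
    labels = labels.float().view(-1).to(logits.device)
    return loss_fn(logits, labels)


# Stand-in model for illustration only (hypothetical, runs on CPU).
demo = nn.Sequential()
demo.add_module("encoder", nn.Linear(8, 4))
demo.add_module("classifier", nn.Linear(4, 1))

report = devices_by_submodule(demo)
print(report)

logits = demo(torch.randn(3, 8)).squeeze(-1)
loss = loss_on_logits_device(nn.BCEWithLogitsLoss(), logits, torch.tensor([0, 1, 1]))
print(loss.device)
```

On the real model, a submodule that lists more than one device is sharded. If the encoder and classifier look consistent, comparing `logits.device` with `labels.device` just before the loss call would show whether the labels are the tensor left on the other GPU.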

Could anyone figure out what's wrong?

Thanks

edited Sep 16 at 4:58 by Ajeet Verma

asked Sep 15 at 3:45 by Dwi Rezky Fahlan
