I have a GPT model:
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt").to(device)
When I send my batch to it I can get the logits and the hidden states:
out = model(batch["input_ids"].to(device), output_hidden_states=True, return_dict=True)
print(out.keys())
>>> odict_keys(['logits', 'past_key_values', 'hidden_states'])
The logits have shape
torch.Size([2, 1024, 42386])
corresponding to (batch, seq_length, vocab_size).
How can I get the vector embedding of the first token (i.e., position 0 along the sequence dimension) from the last layer (i.e., just before the final fully connected projection to the vocabulary)? I believe the full last-layer hidden state should be of size [2, 1024, 1024], so the first-token vectors would have shape [2, 1024].
From here it seems like it should be under last_hidden_state, but I can't seem to generate it. out.hidden_states is a tuple of length 25, where each element has shape [2, 1024, 1024]. I'm wondering whether the last element is the one I'm looking for, but I'm not sure.
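If I understand the Hugging Face docs correctly, hidden_states[0] is the embedding-layer output and each later entry is one transformer layer, so hidden_states[-1] should be the final layer. Here is a minimal sketch of the slicing I have in mind, using dummy tensors with the same tuple layout (smaller seq/hidden sizes so it runs quickly; shapes are the only thing being demonstrated):

```python
import torch

# Simulated out.hidden_states: embedding output + 24 layers,
# each of shape (batch, seq_length, hidden_size).
# Real BioGPT values would be (2, 1024, 1024); shrunk here for speed.
batch, seq_len, hidden = 2, 8, 16
hidden_states = tuple(torch.randn(batch, seq_len, hidden) for _ in range(25))

last_hidden = hidden_states[-1]       # final layer output: (batch, seq_length, hidden_size)
first_token = last_hidden[:, 0, :]    # first token's vector per example: (batch, hidden_size)

print(last_hidden.shape)   # torch.Size([2, 8, 16])
print(first_token.shape)   # torch.Size([2, 8])... or is it [2, 16]? It prints torch.Size([2, 16])
```

With the real model, the same indexing would be out.hidden_states[-1][:, 0, :], giving [2, 1024]. Is that the correct way to read last_hidden_state?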