
I'm confused about how cross-entropy works in BERT's masked language model. To calculate the loss we need the truth labels for the masked tokens, but we don't have a vector representation of the truth labels, while the predictions are vector representations. So how is the loss calculated?

  • This is not how BERT works, and you are asking on the wrong site; this is not a machine-learning site. Commented Jun 16, 2022 at 7:05

1 Answer


We already know which words we masked before passing the sequence to BERT, so the masked word's one-hot encoding over the vocabulary is the truth label. The model's final hidden vector at the masked position is projected to vocabulary-size logits and passed through a softmax layer, which produces a probability distribution over the vocabulary (the same size as the one-hot label). We can then calculate the cross-entropy loss between that predicted distribution and the one-hot truth label. Hope this clarifies. For a fuller walkthrough watch this https://www.youtube.com/watch?v=xI0HHN5XKDo
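As a minimal sketch of this loss computation (a toy NumPy example, not BERT's actual code; the projection matrix, vocabulary size, and hidden size here are made up for illustration):

```python
import numpy as np

# Toy sizes, chosen only for illustration.
vocab_size = 5
hidden_size = 8
rng = np.random.default_rng(0)

# Stand-in for BERT's final hidden vector at the [MASK] position.
hidden = rng.normal(size=hidden_size)

# Hypothetical output projection: hidden state -> vocabulary logits.
W = rng.normal(size=(vocab_size, hidden_size))
logits = W @ hidden

# Softmax turns logits into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The masked word's known token id gives the one-hot truth label.
true_id = 3
one_hot = np.zeros(vocab_size)
one_hot[true_id] = 1.0

# Cross-entropy with a one-hot label reduces to -log(prob of the true token).
loss = -np.sum(one_hot * np.log(probs))
```

Note that because the label is one-hot, the sum collapses to a single term, which is why frameworks usually take the true token *id* rather than an explicit one-hot vector.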
