
I'm confused about how cross-entropy works in BERT's masked language model. To calculate the loss we need the truth labels for the masked tokens, but we don't have a vector representation of the truth labels, while the predictions are vector representations. So how is the loss calculated?

  • This is not how BERT works, and you are asking on the wrong site; this is not a machine-learning site. Commented Jun 16, 2022 at 7:05

1 Answer


We already know which words we masked before passing the sequence to BERT, so the masked word's one-hot encoding over the vocabulary is the truth label. The model's final hidden vector at the masked position is projected to vocabulary-size logits and passed through a softmax layer, which produces a probability distribution over the vocabulary (the same size as the one-hot label). We can then calculate the cross-entropy loss between that predicted distribution and the one-hot truth label. Hope this clarifies. For a fuller walkthrough watch this https://www.youtube.com/watch?v=xI0HHN5XKDo
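As a minimal sketch of this loss computation (a toy NumPy example, not BERT's actual code; the projection matrix, vocabulary size, and hidden size here are made up for illustration):

```python
import numpy as np

# Toy sizes, chosen only for illustration.
vocab_size = 5
hidden_size = 8
rng = np.random.default_rng(0)

# Stand-in for BERT's final hidden vector at the [MASK] position.
hidden = rng.normal(size=hidden_size)

# Hypothetical output projection: hidden state -> vocabulary logits.
W = rng.normal(size=(vocab_size, hidden_size))
logits = W @ hidden

# Softmax turns logits into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The masked word's known token id gives the one-hot truth label.
true_id = 3
one_hot = np.zeros(vocab_size)
one_hot[true_id] = 1.0

# Cross-entropy with a one-hot label reduces to -log(prob of the true token).
loss = -np.sum(one_hot * np.log(probs))
```

Note that because the label is one-hot, the sum collapses to a single term, which is why frameworks usually take the true token *id* rather than an explicit one-hot vector.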
