
I know what embeddings are and how they are trained. Specifically, while going through TensorFlow's documentation, I came across two different articles, and I wish to know what exactly the difference between them is.

link 1: Tensorflow | Vector Representations of words

In the first tutorial, they explicitly train embeddings on a specific dataset; there is a distinct session run to train those embeddings. I can then later save the learnt embeddings as a NumPy array and use the tf.nn.embedding_lookup() function while training an LSTM network.
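
For reference, a minimal sketch of that workflow as I understand it (the file name word2vec_embeddings.npy, the placeholder shapes, and the trainable flag are my own assumptions, not from the tutorial):

import numpy as np
import tensorflow as tf

# Pretrained embeddings saved earlier as a NumPy array of shape
# [vocabulary_size, embedding_size].
pretrained = np.load("word2vec_embeddings.npy")

# Initialise a variable from the pretrained matrix; set trainable=True
# to fine-tune it while the LSTM is trained.
embeddings = tf.get_variable(
    "embeddings",
    initializer=tf.constant(pretrained, dtype=tf.float32),
    trainable=False)

word_ids = tf.placeholder(tf.int32, shape=[None, None])      # [batch, time]
lstm_inputs = tf.nn.embedding_lookup(embeddings, word_ids)   # [batch, time, embedding_size]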

link 2: Tensorflow | Embeddings

In this second article, however, I couldn't understand what is happening.

import tensorflow as tf

# vocabulary_size, embedding_size and word_ids are assumed to be defined elsewhere
word_embeddings = tf.get_variable("word_embeddings",
                                  [vocabulary_size, embedding_size])
embedded_word_ids = tf.gather(word_embeddings, word_ids)

This is given under the training embeddings section. My question is: does the gather function train the embeddings automatically? I am not sure, since this op ran very fast on my PC.

More generally: what is the right way to convert words into vectors in TensorFlow (link 1 or link 2) for training a seq2seq model? Also, how do I train the embeddings for a seq2seq dataset, given that my data consists of separate sequences, unlike the continuous sequence of words in the link 1 dataset?

  • tf.gather doesn't do anything else beyond giving you the "row" of the word_embeddings variable corresponding to each word id in word_ids. But it will backpropagate the gradients correctly if you use it in a graph during a training session, updating word_embeddings appropriately (see the sketch after these comments). Commented Sep 19, 2017 at 11:08
  • The second snippet does not train the embeddings; it just creates the necessary variables. That link says afterwards: "The variable word_embeddings will be learned and at the end of the training it will contain the embeddings for all words in the vocabulary. The embeddings can be trained in many ways, ..." Commented Sep 19, 2017 at 11:10
  • So, am I correct in saying that the second approach is more general than the first link's, where you first extract all the words from your dataset and explicitly train embeddings on its sequence? Whereas in the tf.gather approach, the embedding matrix is more like a layer that gets trained during the actual training of the LSTM? So, how do you propose I approach a seq2seq model: the first link or the second? Commented Sep 19, 2017 at 11:12
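
To make the comments concrete, here is a toy sketch (my own, not from either tutorial) showing that tf.gather on its own only slices rows, while the same variable does get updated once it appears in a loss minimized during a training session:

import numpy as np
import tensorflow as tf

vocabulary_size, embedding_size = 100, 8

word_embeddings = tf.get_variable("word_embeddings",
                                  [vocabulary_size, embedding_size])
word_ids = tf.constant([3, 7, 42])
embedded = tf.gather(word_embeddings, word_ids)   # just a row lookup, trains nothing by itself

# A toy loss; gradients flow back through tf.gather into word_embeddings.
loss = tf.reduce_sum(tf.square(embedded))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    before = sess.run(word_embeddings)
    sess.run(train_op)
    after = sess.run(word_embeddings)
    # Only rows 3, 7 and 42 change; all other rows are untouched.
    print(np.where(np.any(before != after, axis=1))[0])   # -> [ 3  7 42]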

1 Answer


Alright! Anyway, I have found the answer to this question, and I am posting it so that others might benefit from it.

The first link is more of a tutorial that steps you through exactly how the embeddings are learnt.

In practical cases, such as training seq2seq models or any other encoder-decoder models, we use the second approach, where the embedding matrix gets tuned appropriately while the model is trained.
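
For illustration, a hedged sketch of that second approach in an encoder-decoder setting; the hyper-parameters and the single-LSTM encoder are placeholders of my own, not a full seq2seq implementation:

import tensorflow as tf

vocabulary_size, embedding_size, hidden_size = 10000, 128, 256

# Trainable embedding matrix, tuned jointly with the rest of the model.
word_embeddings = tf.get_variable("word_embeddings",
                                  [vocabulary_size, embedding_size])

encoder_inputs = tf.placeholder(tf.int32, [None, None])   # [batch, time] of word ids
encoder_emb = tf.nn.embedding_lookup(word_embeddings, encoder_inputs)

# A plain LSTM encoder; a decoder would consume encoder_state analogously.
cell = tf.nn.rnn_cell.LSTMCell(hidden_size)
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(cell, encoder_emb,
                                                   dtype=tf.float32)

# Whatever seq2seq loss is defined on top of the decoder, minimizing it
# sends gradients back through embedding_lookup into word_embeddings,
# so the embeddings are learnt as part of training the model.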
