I’m just starting to explore the Hugging Face library and have a question related to Text2Text models.
Suppose I have `model1` (a Text2Text model, e.g. BART) pre-trained on a masked language modeling task, so it has learned syntactic structure under the tokenization strategy of `tokenizer1`.
Now I want to fine-tune `model1` on the same style of masked-language-modeling input, but decode the outputs into a different format using a separate tokenizer (`tokenizer2`).
Is this possible? The approach I had in mind involves sequential text generation:
- The original `model1` generates text.
- A fine-tuned `model2` continues the generation based on `model1`'s output.
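To make the question concrete, here is a minimal sketch of the two-stage idea I have in mind: `model1` generates intermediate text, and `model2` (fine-tuned, paired with `tokenizer2`) re-encodes that text and continues generation. The checkpoint names in the commented wiring are placeholders, not a confirmed recipe.

```python
def generate_with(model, tokenizer, text, **gen_kwargs):
    """Encode `text`, run `model.generate`, decode with the same tokenizer."""
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, **gen_kwargs)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]

def chain(text, model1, tokenizer1, model2, tokenizer2):
    # Stage 1: pre-trained model1 produces intermediate text via tokenizer1.
    intermediate = generate_with(model1, tokenizer1, text)
    # Stage 2: fine-tuned model2 re-encodes that text with tokenizer2 and
    # generates the final output in the target format.
    return generate_with(model2, tokenizer2, intermediate)

# Example wiring (hypothetical checkpoints/paths):
# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# model1 = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
# tokenizer1 = AutoTokenizer.from_pretrained("facebook/bart-base")
# model2 = AutoModelForSeq2SeqLM.from_pretrained("path/to/fine-tuned-model2")
# tokenizer2 = AutoTokenizer.from_pretrained("path/to/tokenizer2")
# print(chain("some masked input ...", model1, tokenizer1, model2, tokenizer2))
```

Does this kind of chaining make sense, or is there a more idiomatic way to decode into a different format with a second tokenizer?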
Apologies if this is something trivial. Any comment or suggestion on specific tutorials is really appreciated!