0
$\begingroup$

I have a collection of poems in text data. I'd like to generate a new similar poem using LLM and fine tuning. How should I format the data for it?

If it's questions and answers, the data format should be:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...

But in my case, there is no prompt. only completions. The raw format I have now is:

[
  "poem, poem, poem,\npoem, poem, poem,\npoem, poem, poem,\n",
  "poem, poem, poem,\npoem, poem, poem,\npoem, poem, poem,\n",
  "poem, poem, poem,\npoem, poem, poem,\npoem, poem, poem,\n",
...
]

I google "llm fine tuning text generation" but unfortunately no useful results were found.

Does anybody know how to solve this?

$\endgroup$

1 Answer 1

1
$\begingroup$

Apply the Few-Shot Prompting technique: Give the LLM a few (you need to experiment with the exact number) of poems and use the same instruction for all of them:

{"prompt": "Write a poem", "completion": "<poem 1>"}
{"prompt": "Write a poem", "completion": "<poem 2>"}
{"prompt": "Write a poem", "completion": "<poem 3"}
...

To further improve the answers, it can be helpful to start the conversation with a system message specific to your needs.

$\endgroup$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.