The format of data for text generation using LLM

Question

I have a collection of poems in text data. I'd like to generate a new similar poem using LLM and fine tuning. How should I format the data for it?

If it's questions and answers, the data format should be:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...

But in my case, there is no prompt. only completions. The raw format I have now is:

[
  "poem, poem, poem,\npoem, poem, poem,\npoem, poem, poem,\n",
  "poem, poem, poem,\npoem, poem, poem,\npoem, poem, poem,\n",
  "poem, poem, poem,\npoem, poem, poem,\npoem, poem, poem,\n",
...
]

I google "llm fine tuning text generation" but unfortunately no useful results were found.

Does anybody know how to solve this?

Basil · Accepted Answer · 2024-06-02 19:41:41Z

1

Apply the Few-Shot Prompting technique: Give the LLM a few (you need to experiment with the exact number) of poems and use the same instruction for all of them:

{"prompt": "Write a poem", "completion": "<poem 1>"}
{"prompt": "Write a poem", "completion": "<poem 2>"}
{"prompt": "Write a poem", "completion": "<poem 3"}
...

To further improve the answers, it can be helpful to start the conversation with a system message specific to your needs.

answered Jun 2, 2024 at 19:41

Basil

1262 bronze badges

Add a comment |

Stack Exchange Network

The format of data for text generation using LLM

1 Answer 1

You must log in to answer this question.

Hot Network Questions

The format of data for text generation using LLM

1 Answer 1

You must log in to answer this question.

Related

Hot Network Questions