I know that BPE (byte pair encoding) is applied to sentences to produce a tokenization by merging frequent pairs, with spare byte values used to represent the merged pairs. Why doesn't it work on binary data?

What would be the best way to try BPE as a form of compression for binary data?

1 Answer

I haven't worked with byte pair encoding, so I may be confused, but it looks very similar to Huffman coding.

The dictionary is built in a similar way in Huffman coding. But rather than using a spare character to stand in for a dictionary entry, Huffman coding assigns each symbol a variable-length string of bits (0s and 1s) based on how often that symbol occurs. The most frequent symbols get the shortest codes.
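To make the frequency idea concrete, here is a minimal sketch of building a Huffman code over the bytes of an input, using only the Python standard library. The function name `huffman_code` and the sample input are my own choices for illustration, and real formats such as DEFLATE add extra rules (for example canonical codes and length limits) that this sketch ignores.

```python
import heapq
from collections import Counter


def huffman_code(data: bytes) -> dict[int, str]:
    """Return a prefix code mapping each byte value in data to a bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}

    # Heap entries are (weight, tiebreak, {symbol: code_so_far}).
    # The unique tiebreak keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)

    while len(heap) > 1:
        # Merge the two lightest subtrees; their symbols gain one leading bit.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1

    return heap[0][2]


codes = huffman_code(b"aaaaaaaabbbc")
for sym, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
    print(chr(sym), code)
```

In this sample, `a` occurs far more often than `b` or `c`, so it ends up with a shorter code than either of them.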

Huffman coding is already used in many compression formats, such as DEFLATE, JPEG, MP3, and Brotli.
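For comparison, here is a rough sketch of what byte-level BPE compression, as I understand it from the question, could look like: repeatedly replace the most frequent adjacent pair of bytes with a byte value that does not occur in the input. The names `bpe_compress` and `bpe_decompress` are hypothetical, and the sketch does not serialize the pair table, which a real compressor would also have to store. It also shows one likely reason the approach struggles on arbitrary binary data: if the input already uses all 256 byte values, there are no spare bytes left and nothing gets replaced.

```python
from collections import Counter


def bpe_compress(data: bytes):
    """Replace frequent byte pairs with unused byte values.

    Returns the packed bytes and the merge table needed to undo it.
    """
    seq = list(data)
    present = set(seq)
    unused = [b for b in range(256) if b not in present]  # spare byte values
    table = []  # (new_byte, (left, right)) in merge order

    while unused and len(seq) > 1:
        pair, count = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so a merge cannot save anything
        new = unused.pop()
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        table.append((new, pair))

    return bytes(seq), table


def bpe_decompress(packed: bytes, table) -> bytes:
    """Undo the merges in reverse order."""
    seq = list(packed)
    for new, (left, right) in reversed(table):
        out = []
        for b in seq:
            if b == new:
                out.extend((left, right))
            else:
                out.append(b)
        seq = out
    return bytes(seq)


text_like = b"abababab" * 10 + b"xyz"
packed, table = bpe_compress(text_like)
print(len(text_like), len(packed), len(table))

# Input that already uses every byte value leaves no spare bytes to merge into.
full_range = bytes(range(256)) * 2
packed_full, table_full = bpe_compress(full_range)
print(len(full_range), len(packed_full), len(table_full))
```

One workaround would be to let merged symbols take values above 255, as tokenizers do, but then the output is no longer a plain byte stream and needs a separate encoding step such as Huffman coding.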
