I know that BPE is used on sentences that can generate some sort of tokenization and pairings, with spare bytes being used to create such mappings. Why doesn't it work on BPE?
What would be the best guess/way to actually try to perform BPE as a form of compression of binary data?