I'm currently working on a small program that compresses text by replacing repeated words or phrases with a reference to their next occurrence. It turns a string into a shorter string without the metadata, arrays, or other techniques that real compression uses. Each reference is stored as a pair of chars, like this:
"" + (char)7 + (char)((length << 4) | offset)
where (char)7 is an arbitrarily chosen char that signals a compressed reference. length is the number of words being substituted, and offset is the distance, in words, to the next occurrence. Both are held in byte variables. With the 4-bit shift, length can use the full byte range because the packed value still fits in a 16-bit char, but offset has to stay below 16 or it spills into the length bits. (It's not relevant to the question, but I treat them as unsigned bytes through manual unsigned<->signed conversion.)
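A minimal sketch of packing and unpacking the second char of a reference, assuming length fits in 8 bits and offset in 4 bits, matching the (char)18 example. `RefCodec`, `pack`, `length` and `offset` are hypothetical names. It also shows why the parentheses around `length << 4` matter: in Java, `+` binds tighter than `<<`.

```java
public class RefCodec {
    // Marker char signalling a compressed reference, as in the question.
    static final char MARKER = 7;

    // Packs length into bits 4-11 and offset into bits 0-3 of one char.
    // Without parentheses, length << 4 + offset means length << (4 + offset).
    static char pack(int length, int offset) {
        if (length < 0 || length > 0xFF || offset < 0 || offset > 0xF) {
            throw new IllegalArgumentException("length must be 0-255, offset 0-15");
        }
        return (char) ((length << 4) | offset);
    }

    // Recovers the number of substituted words.
    static int length(char packed) {
        return packed >>> 4;
    }

    // Recovers the distance to the next occurrence.
    static int offset(char packed) {
        return packed & 0xF;
    }

    public static void main(String[] args) {
        char packed = pack(1, 2);
        System.out.println((int) packed);                          // 18
        System.out.println(length(packed) + " " + offset(packed)); // 1 2
        System.out.println(1 << 4 + 2);                            // 64, not 18
    }
}
```

Since the largest packed value is 4095 (length 255, offset 15), it never reaches the surrogate range starting at 0xD800.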
// Example compression:
String input = "compression and compression";
String output = "" + (char)7 + (char)18 + " and compression"; // leading "" forces String concatenation
// (char)18 = binary 0001 0010: repeat 1 word, found 2 words ahead.
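The example above can be reproduced with a rough single-word sketch. It assumes words are separated by single spaces, every reference has length 1, offsets stay within 15 words, and the input never contains (char)7 itself. `WordRefCompressor`, `compress` and `decompress` are hypothetical names, not the program from the question, and multi-word phrases are not handled.

```java
import java.util.Arrays;

public class WordRefCompressor {
    // Marker char from the question; assumed never to occur in the input text.
    static final char MARKER = 7;

    // Replaces each word with a 2-char reference to its next occurrence.
    // Sketch only: single words (length 1), offsets 1-15, single-space separators.
    static String compress(String text) {
        String[] words = text.split(" ", -1);
        String[] out = words.clone();
        for (int i = 0; i < words.length; i++) {
            // Only replace words longer than the 2-char reference, or nothing is saved.
            if (words[i].length() <= 2) continue;
            for (int j = i + 1; j < words.length && j - i <= 15; j++) {
                if (words[j].equals(words[i])) {
                    out[i] = "" + MARKER + (char) ((1 << 4) | (j - i));
                    break;
                }
            }
        }
        return String.join(" ", out);
    }

    // Restores references by copying the word they point to.
    static String decompress(String text) {
        String[] words = text.split(" ", -1);
        // Right to left, so every referenced word has already been restored.
        for (int i = words.length - 1; i >= 0; i--) {
            String w = words[i];
            if (w.length() == 2 && w.charAt(0) == MARKER) {
                int offset = w.charAt(1) & 0xF;
                words[i] = words[i + offset];
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String input = "compression and compression";
        String packed = compress(input);
        System.out.println(Arrays.toString(packed.substring(0, 2).chars().toArray())); // [7, 18]
        System.out.println(packed.length());                  // 18
        System.out.println(decompress(packed).equals(input)); // true
        // Without a leading String, char + char is int addition:
        System.out.println((char) 7 + (char) 18 + " and compression"); // 25 and compression
    }
}
```

Decompressing right to left means a chain of references (for example, three identical words in a row) always points at a word that has already been restored. With length fixed at 1, the packed char is always between 17 and 31, so it can never be mistaken for the space separator.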
TL;DR: I'm afraid there may be situations in which my custom chars get interpreted as special ASCII characters ((char)7, for instance, is the BEL control character). I'm aware that \0 characters need special care in Java (due to this question). But are there any other Java methods or classes that could cause problems, say if I were to send or convert the compressed string with streams, buffers, readers, char arrays and so on?
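One way to probe the streams/readers concern is to round-trip the compressed string through the byte encodings that `String.getBytes` or a writer/reader pair would apply. This is a small sketch that assumes the reference format above. `CharsetRoundTrip` and `survives` are hypothetical names. Control chars such as (char)7 survive every charset tried here. Packed chars above 127 are replaced with `?` in US-ASCII, and chars above 255 are replaced in ISO-8859-1. If these results hold, the charset in use is a likelier risk than the control characters themselves.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Encodes to bytes and decodes back, roughly as an
    // OutputStreamWriter/InputStreamReader pair using cs would.
    static boolean survives(String s, Charset cs) {
        return new String(s.getBytes(cs), cs).equals(s);
    }

    public static void main(String[] args) {
        // Packed value 18 (length 1, offset 2): every char is below 128.
        String small = "" + (char) 7 + (char) ((1 << 4) | 2) + " and compression";
        // Packed value 323 (length 20, offset 3): above both 127 and 255.
        String large = "" + (char) 7 + (char) ((20 << 4) | 3) + " x";

        Charset[] charsets = {
            StandardCharsets.UTF_8, StandardCharsets.UTF_16,
            StandardCharsets.ISO_8859_1, StandardCharsets.US_ASCII
        };
        for (Charset cs : charsets) {
            // UTF-8 and UTF-16 print "true true"; ISO-8859-1 and US-ASCII print "true false".
            System.out.println(cs.name() + ": " + survives(small, cs) + " " + survives(large, cs));
        }
    }
}
```

Line-based reading is a separate risk. `BufferedReader.readLine()` treats (char)10 and (char)13 as line terminators. With the packing above, those values only appear as the second char when length is 0.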