I would like to take input from the user as a UTF-8 string, detect the language of the string, and store it as a compressed byte[]. If the characters are not all from the same language, the input is not valid. Once I have a valid input, I would like to store it as a byte array.
If the user enters a string with non-English characters, each character occupies more than one byte in UTF-8. So I would like to store the language of the string once and then store each character in a single byte. I guess this is possible by storing only each character's difference from the start code point of that language: since all characters are from the same language, the differences may (!?) fall in a range small enough to fit in a single byte. This is how I would compress each character into one byte.
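A minimal sketch of this idea, assuming the "start code point" is the first code point of a Unicode block (for example 0x3040, where the Hiragana block begins). The class name `BlockOffsetCodec`, the method names, and the 4-byte header layout are hypothetical choices made for illustration only:

```java
final class BlockOffsetCodec {

    // Hypothetical layout: 4 bytes holding the base code point (big-endian),
    // followed by one byte per character holding (codePoint - base).
    static byte[] encode(String s, int base) {
        int[] codePoints = s.codePoints().toArray();
        byte[] out = new byte[4 + codePoints.length];
        out[0] = (byte) (base >>> 24);
        out[1] = (byte) (base >>> 16);
        out[2] = (byte) (base >>> 8);
        out[3] = (byte) base;
        for (int i = 0; i < codePoints.length; i++) {
            int offset = codePoints[i] - base;
            // Reject anything that does not fit in one unsigned byte.
            if (offset < 0 || offset > 0xFF) {
                throw new IllegalArgumentException(
                        "Code point out of single-byte range: U+"
                        + Integer.toHexString(codePoints[i]).toUpperCase());
            }
            out[4 + i] = (byte) offset;
        }
        return out;
    }

    static String decode(byte[] data) {
        // Rebuild the base code point from the 4-byte header.
        int base = ((data[0] & 0xFF) << 24) | ((data[1] & 0xFF) << 16)
                 | ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
        StringBuilder sb = new StringBuilder();
        for (int i = 4; i < data.length; i++) {
            sb.appendCodePoint(base + (data[i] & 0xFF));
        }
        return sb.toString();
    }
}
```

For instance, `BlockOffsetCodec.encode("どうしようま", 0x3040)` gives 10 bytes (4 header bytes plus 6 offsets), while the same string takes 18 bytes in UTF-8. The scheme only works when every character lies within 256 code points of the base. That holds for a small block such as Hiragana but not for CJK ideographs, which span many thousands of code points.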
Is this a correct approach? If yes, how can I detect the language of the characters in the string?
For example, the characters (どうしようま) are from the Japanese language. I would store the start code point for that language as defined by Unicode, and then compress the byte[] by storing each character's difference from that start code point instead of the entire code point, which wouldn't fit in a single byte.
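One possible way to check the "all characters from the same language" condition, assuming that the Unicode script (rather than the natural language) is close enough for this purpose. Java's `Character.UnicodeScript.of(int)` reports the script of a code point. The class name `ScriptCheck` and the method `singleScript` are hypothetical names used only for this sketch:

```java
import java.lang.Character.UnicodeScript;

final class ScriptCheck {

    /**
     * Returns the one script shared by every code point in s,
     * or null if the string is empty or mixes several scripts.
     */
    static UnicodeScript singleScript(String s) {
        UnicodeScript found = null;
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(codePoint);
            if (found == null) {
                found = script;          // first character sets the expected script
            } else if (found != script) {
                return null;             // mixed scripts: treat as invalid input
            }
            i += Character.charCount(codePoint); // step over surrogate pairs correctly
        }
        return found;
    }
}
```

With this sketch, `ScriptCheck.singleScript("どうしようま")` returns `HIRAGANA`, and a string mixing Hiragana with Latin letters returns null. Two limitations are worth keeping in mind: a script is not a language (Latin is shared by English, French, and many others, and Japanese text normally mixes Hiragana, Katakana, and Han), and spaces and most punctuation belong to the `COMMON` script, so they would need special handling before a check like this is usable on real input.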