You can use sed:
echo hello world你好 世界! | sed -E "s/([^a-zA-Z]) ([^a-zA-Z])/\1\2/g"
([^a-zA-Z]) ([^a-zA-Z]) is a regular expression matching a space between two non-Latin characters (inside the brackets, ^ negates the set, so digits and punctuation count as non-Latin too). The preceding and following characters are captured in groups 1 and 2.
\1\2 is the replacement string: the two captured groups, without the space between them.
Output:
hello world你好世界!
Note: to also handle a space at the start or end of the line, your expression should be:
(^|[^a-zA-Z]) ([^a-zA-Z]|$)
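As a minimal sketch, the anchored expression can be dropped into the same sed call as before. The function name trim_cjk_spaces is made up for illustration, and the sample input is my own:

```shell
# Hypothetical helper: same sed call as above, with the anchored expression.
trim_cjk_spaces() {
  sed -E 's/(^|[^a-zA-Z]) ([^a-zA-Z]|$)/\1\2/g'
}

printf '%s\n' ' 你好 世界 ' | trim_cjk_spaces
# prints: 你好世界
```

Note that a leading or trailing space is only removed when the character next to it is non-Latin; in " hello " both spaces stay.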
Edit: One thing I didn't take into account is that this kind of expression consumes the characters before and after the space, so they cannot take part in the next match. So in the case 你 好 世 界 hello world a space was still left over. You then have to use a regex engine that supports lookarounds:
echo " 你 好 世 界 hello world, !" | perl -pe "s/(?<=^|[^[:ascii:]]) | (?=[^[:ascii:]]|$)//g"
Output:
你好世界hello world, !
To also remove the space between Latin characters and kanji, I split the expression into two alternatives. I also replaced the condition on Latin characters with one on ASCII, which should give more appropriate matches.
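If Perl is not an option, the consumed-character problem described in the edit can probably also be worked around in sed itself by repeating the substitution until nothing matches any more. This is only a sketch of that alternative, not the lookaround approach above: the function name and the label "again" are made up, and unlike the Perl version it keeps the space between kanji and Latin text.

```shell
# Hypothetical helper: loop the original substitution until it stops matching.
remove_inner_spaces() {
  # ':again' defines a label; 't again' jumps back to it whenever the
  # preceding s/// made a replacement, so every space gets its own pass.
  sed -E -e ':again' -e 's/([^a-zA-Z]) ([^a-zA-Z])/\1\2/' -e 't again'
}

printf '%s\n' '你 好 世 界 hello world' | remove_inner_spaces
# prints: 你好世界 hello world
```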
[[ $var =~ [[:unicode:]] ]], and based on this, you could build up an iterative solution. However, I found that in my bash at least, this match does not work (although I have set LANG to a Unicode locale). I don't know why it does not work. Maybe you could factor this out into a separate question on Stack Overflow, i.e. how to do a regex match in bash on characters with a Unicode code point above 255.
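As far as I know, [:unicode:] is not one of the POSIX bracket classes, which may be why the match fails. A rough way to test for such characters without relying on it is to work at the byte level: delete every ASCII byte and see whether anything is left. This is only a sketch under that assumption; the function name has_non_ascii is made up, and it detects any non-ASCII byte rather than a specific code point range.

```shell
# Hypothetical helper: succeeds if the argument contains a non-ASCII byte.
has_non_ascii() {
  # In the C locale tr works on single bytes; octal 000-177 is the ASCII range.
  rest=$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177')
  [ -n "$rest" ]
}

if has_non_ascii 'hello 你好'; then
  echo 'contains non-ASCII characters'
fi
```

A test like this could serve as the condition in the iterative solution mentioned above.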