I'm using the following regular expression with preg_replace to strip the string of any punctuation:
$string = preg_replace("#((?!-|')\pP)+#", '', $string);
But I realized that it corrupts some Unicode characters. For a string like "höpöttää?!...", I get back "h�p�ttää": the punctuation is gone, but some of the characters are ruined.
I read the PHP documentation and found some advice to use the `...`u modifier, so I tried this:
$string = preg_replace("`#((?!-|')\pP)+#`u", '', $string);
That did fix the problem with the characters, but now the punctuation is no longer removed. For the string "höpöttää?!...", I get back the same "höpöttää?!...".
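
For reference, the second attempt and one possible fix can be reproduced with the short script below. It is only a sketch and assumes the script file is saved as UTF-8; the variable names `$input`, `$second`, `$fixed`, and `$kept` are made up for illustration. The likely cause is that in the second pattern the backticks act as the delimiters, which turns the two `#` characters into literals that never occur in the input.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal below is valid UTF-8.
$input = "höpöttää?!...";

// Second attempt from above: the backticks are taken as the delimiters,
// so both "#" characters become literal parts of the pattern.
// The input contains no "#", so nothing matches and nothing is removed.
$second = preg_replace("`#((?!-|')\pP)+#`u", '', $input);

// Possible fix: keep "#" as the delimiters and put the u modifier
// directly after the closing "#".
$fixed = preg_replace("#((?!-|')\pP)+#u", '', $input);

// The negative lookahead still protects hyphens and apostrophes.
$kept = preg_replace("#((?!-|')\pP)+#u", '', "don't stop-now!!");

var_dump($second === $input);        // bool(true): punctuation untouched
var_dump($fixed === "höpöttää");     // bool(true): punctuation removed, letters intact
var_dump($kept === "don't stop-now"); // bool(true)
```

With the `u` modifier the pattern and subject are treated as UTF-8, so `\pP` is tested against whole characters rather than single bytes.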