
I'm trying to strip hidden control chars (especially \x{89} and \x{88}) with preg_replace() from a string. It is "ˆText" (it starts with an "\x{88}" char), mb_detect_encoding says it is UTF-8.

The code used is $result = preg_replace('/\x{88}/u','',$string); but the result is null.

If I use the code without /u modifier I get "�Text", the control char is replaced with a replacement char (U+FFFD).

I'm using PHP 7.1 on Windows. The same search in BBEdit and Notepad++ replaces the chars correctly.

Any ideas?

Thanks, A.

  • try reading this Commented Dec 25, 2020 at 9:23
  • Thanks, I tried all the solutions but they don't work for me. Commented Dec 25, 2020 at 9:29
  • If preg_replace() returns null, an error occurred. Try calling preg_last_error() right after your preg_replace() call, then compare the error code with the errors listed in the docs (php.net/manual/en/function.preg-last-error.php). Commented Dec 25, 2020 at 9:34
  • ˆ is not \x{88}, it is \x{2C6}. Also, why not just use str_replace("\u{02C6}", "", $string)? Commented Dec 25, 2020 at 11:24
  • preg_last_error() returns code "4", which is PREG_BAD_UTF8_ERROR. Thanks. Commented Dec 25, 2020 at 12:13

1 Answer

preg_replace() returns "null" only on error. Run preg_last_error() right after preg_replace() and check the returned error code.
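
A minimal sketch of that check. The input below is an assumption: a single raw 0x88 byte followed by ASCII text, which is not a valid UTF-8 sequence and so reproduces the error code reported in the comments.

```php
<?php
// Hypothetical input: a lone 0x88 byte followed by ASCII text.
// A byte in the range 0x80-0xBF cannot start a UTF-8 sequence.
$string = "\x88Text";

// With the /u modifier, PCRE validates the subject as UTF-8 first.
$result = preg_replace('/\p{Cc}/u', '', $string);

// Read the error code immediately after the failing call.
$code = preg_last_error();

if ($result === null) {
    if ($code === PREG_BAD_UTF8_ERROR) {
        echo "Subject is not valid UTF-8\n";
    } else {
        echo "PCRE error code: $code\n";
    }
}
```

The strict comparison against null matters, because an empty string is a legitimate result of a replacement.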


As a side note: your wording suggests that you want to strip all control characters, not just the two explicitly mentioned. In that case you would be better off matching against "\p{Cc}":

preg_replace('/\p{Cc}/u', '', $string);

4 Comments

You are right, the function returns the error code 4/PREG_BAD_UTF8_ERROR. The files come from a shell_exec pre-processor that should strip all control chars but sometimes fails on some characters. I suppose these files have a character encoding problem, so I'm trying to fix them, without success so far. Thanks.
It also fails with preg_replace('/\p{Cc}/u', '', $string); with the same error "4". It seems it is not possible to parse this string.
The string you are trying to parse is simply not valid UTF-8. If you get the string from shell_exec(), the called executable is not delivering its output in UTF-8.
If the script is running on a Linux machine, check the output of the "locale" command. If the default locale is not "xx_XX.UTF-8" but something like xx_XX.ISO8859-1, then that might be the charset your executable is using. You can still convert the string with mb_convert_encoding() before running preg_replace().
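
A rough sketch of that conversion step, assuming the mbstring extension is loaded and that the bytes really are ISO-8859-1 (that charset is only the guess made above; substitute whatever the executable actually emits). The sample input is hypothetical.

```php
<?php
// Hypothetical input: raw 0x88 byte followed by ASCII text.
$raw = "\x88Text";

// Convert only when the input is not already valid UTF-8.
// 'ISO-8859-1' is an assumed source charset, not a known fact.
if (!mb_check_encoding($raw, 'UTF-8')) {
    $raw = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
}

// In ISO-8859-1, 0x88 maps to U+0088, a C1 control character,
// so \p{Cc} now matches it and the /u modifier no longer fails.
$clean = preg_replace('/\p{Cc}/u', '', $raw);

echo $clean, "\n";
```

Two caveats: \p{Cc} also removes tabs and line breaks, and if the source is actually Windows-1252, byte 0x88 converts to U+02C6 (ˆ), which is a letter rather than a control character and would need to be removed explicitly.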
