I have the following string, in which I want to collapse runs of multiple whitespace characters:

$string = "This is   a test string";

Not a big deal, right? However, the string is not 'cleaned' after using:

$string = preg_replace('/\s+/', ' ', $string);

When I output it as ISO-8859-1, the string looks like this:

$test = "This is a  test string";

So, how can I remove these characters?

  • Try preg_replace('/\s+/u', ' ', $string) Commented Mar 2, 2017 at 21:04
  • Lol heroes are fast, right? Thanks. Commented Mar 2, 2017 at 21:05
  • @WiktorStribiżew I'm not a Unicode guy, so for my info, how is  a whitespace character? Commented Mar 2, 2017 at 21:30
  • @AbraCadaver: It is not a whitespace character, it is a char that some Unicode whitespace char turned into after converting into a different encoding. Commented Mar 2, 2017 at 21:32
  • @WiktorStribiżew: Great thanks! Commented Mar 2, 2017 at 21:36

1 Answer

You may use the /u UNICODE modifier:

$string = preg_replace('/\s+/u', ' ', $string);

The /u modifier makes the PCRE engine treat both the pattern and the subject as UTF-8 strings (it sets the PCRE_UTF8 option) and makes the shorthand character classes in the pattern Unicode-aware (it sets the PCRE_UCP option).

The main point is that \s now matches any Unicode whitespace character, not only the ASCII ones, and the input string is interpreted as UTF-8 rather than as a sequence of single bytes.
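To see the difference, here is a minimal sketch. It assumes the stray character is U+00A0 NO-BREAK SPACE (UTF-8 bytes C2 A0), which the question does not confirm, and that PHP runs with the default "C" locale. The variable names are made up for the illustration.

```php
<?php
// Assumption: the offending character is U+00A0 NO-BREAK SPACE,
// which UTF-8 encodes as the two bytes C2 A0.
$nbsp   = "\xC2\xA0";
$string = "This is" . $nbsp . " a test string";

// Without /u the subject is matched byte by byte. In the default "C" locale
// neither 0xC2 nor 0xA0 counts as whitespace, so the NBSP bytes are left alone.
$withoutU = preg_replace('/\s+/', ' ', $string);

// With /u the subject is decoded as UTF-8 and \s also matches U+00A0,
// so the NBSP and the following space collapse into one ASCII space.
$withU = preg_replace('/\s+/u', ' ', $string);

var_dump(strpos($withoutU, $nbsp) !== false); // is the NBSP still there?
var_dump($withU);

// Caveat: with /u the subject must be valid UTF-8. If it is not,
// preg_replace() returns null and preg_last_error() reports the reason.
$invalid = preg_replace('/\s+/u', ' ', "bad \xFF byte");
var_dump($invalid, preg_last_error() === PREG_BAD_UTF8_ERROR);
```

Because an invalid subject yields null rather than the original string, it may be worth checking the return value before assigning it back to $string.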

1 Comment

Thanks so much! I was trying to fix this for hours.
