How to replace invisible characters (which are not actually spaces) with regex

Question

I have the following string, from which I want to 'clean' runs of multiple whitespace characters:

$string = "This is   a test string";

Not a big deal, right? However, the string is not 'cleaned' after using:

$string = preg_replace('/\s+/', ' ', $string);

That is because, when I output it as ISO-8859-1, the string looks like this:

$test = "This is a Â test string";

So, how can I remove these characters?

@WiktorStribiżew I'm not a Unicode guy, so for my info: how is Â a whitespace character?
– AbraCadaver, Commented Mar 2, 2017 at 21:30
@AbraCadaver: It is not a whitespace character; it is the character that some Unicode whitespace character turned into after being converted to a different encoding.
– Wiktor Stribiżew, Commented Mar 2, 2017 at 21:32
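
To make the comment above concrete, here is a minimal sketch. It assumes the invisible character is U+00A0 (NO-BREAK SPACE); the question does not say which character it actually is, and the variable names are only illustrative.

```php
<?php
// Assumption: the invisible character is U+00A0 (NO-BREAK SPACE).
// In UTF-8 it is stored as two bytes: 0xC2 0xA0.
$nbsp  = "\u{00A0}";          // Unicode escape, available since PHP 7
$bytes = bin2hex($nbsp);

echo $bytes, "\n";            // c2a0

// Read as ISO-8859-1, every byte is its own character:
// 0xC2 is "Â" and 0xA0 is a no-break space, hence the stray "Â".
var_dump($nbsp[0] === chr(0xC2)); // bool(true)
var_dump($nbsp[1] === chr(0xA0)); // bool(true)
```

Dumping the suspicious part of the real string with bin2hex() is a quick way to check which character is really there.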

Wiktor Stribiżew · Accepted Answer · 2017-09-25 06:27:42Z

You may use the /u UNICODE modifier:

$string = preg_replace('/\s+/u', ' ', $string);

The /u modifier makes the PCRE engine treat both the pattern and the subject as UTF-8 strings (by turning on the PCRE_UTF8 option) and makes the shorthand character classes in the pattern Unicode-aware (by enabling the PCRE_UCP option).

The main point is that \s will now match all Unicode whitespace and the input string is treated as a Unicode string.
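
A small sketch of the difference, again assuming the stray character is U+00A0 (NO-BREAK SPACE) inside a UTF-8 string; the sample string and variable names are made up for illustration:

```php
<?php
// Assumption: the string is UTF-8 and contains U+00A0 next to a normal space.
$string = "This is a\u{00A0} test string";

// Without /u the subject is matched byte by byte, so \s normally only
// sees ASCII whitespace and leaves the 0xC2 0xA0 bytes in place.
$withoutU = preg_replace('/\s+/', ' ', $string);

// With /u the no-break space and the space after it form a single
// \s+ run and collapse into one ordinary space.
$withU = preg_replace('/\s+/u', ' ', $string);

var_dump($withU === 'This is a test string');      // bool(true)

// Shows whether the no-break space survived the pattern without /u.
var_dump(strpos($withoutU, "\u{00A0}") !== false);
```

Note that with /u, preg_replace() returns null if the subject is not valid UTF-8, so it is worth checking the result before using it.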

edited Sep 25, 2017 at 6:27

answered Mar 2, 2017 at 21:07

Wiktor Stribiżew

Anthony Over a year ago

Thanks so much! I was trying to fix this for hours.
