preg_replace is not removing all whitespace characters from string

Question

I've got the following code, which should compare two strings after stripping all whitespace. Here is a simplified version of the function:

function not_same($type, $org_str1, $str2) {

    $str1 = preg_replace('/\s+/', '', $org_str1);
    $str2 = preg_replace('/\s+/', '', $str2);

    $tries = [];
    $tries[] = ["str1" => $str1, "str2" => $str2, "encoded1" => urlencode($str1), "encoded2" => urlencode($str2)];        

    if($str1 == $str2) {
        return true;
    } else {
        return false;
    }

}

I'm using this to check whether a computer's processor matches the model recorded in my database: $org_str1 is the CPU that my client reports for the computer it is running on, and $str2 is the CPU my database says that model should have.

Sometimes these strings have unneeded spaces, so I remove all of the whitespace before comparing, so that only the text itself is compared.

Now some computers are being reported as having the wrong CPU: the match fails because some whitespace is not removed.

In this specific case, I'm trying to compare Client: Celeron® N3050 vs Server: Celeron® N3050. I log what is actually being compared on my server each time, and for this client the log says it is comparing Client: Celeron® N3050 vs Server: Celeron®N3050. In other words, the whitespace was stripped from the server string but not from the client string.

I tried copying and pasting this whitespace into a str_replace() call, but that did not solve the issue. I then had the idea of logging the strings with urlencode(), which lets me see exactly what this mysterious whitespace character is, but I'm still at a loss as to how to fix the issue.

The strings after urlencode() are Client: Celeron%C2%AE%C2%A0N3050 vs Server: Celeron%C2%AEN3050

As you can see, there is still a whitespace character in my client string, encoded to %C2%A0. Why does preg_replace not get rid of this whitespace, and how can I programmatically remove it?
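As a complement to the urlencode() log, here is a minimal sketch of another way to see which bytes each character is made of. The helper name char_hex is hypothetical (it is not part of the original code), and the sketch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper, not from the original post: returns the hex bytes
// of each character in a UTF-8 string.
function char_hex(string $s): array
{
    // '//u' splits between characters rather than bytes.
    // Assumption: $s is valid UTF-8 (otherwise preg_split() returns false).
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    return array_map('bin2hex', $chars);
}

// Rebuild the client string from the urlencoded log entry above.
$client = urldecode('Celeron%C2%AE%C2%A0N3050');
echo implode(' ', char_hex($client)), "\n";
// prints: 43 65 6c 65 72 6f 6e c2ae c2a0 4e 33 30 35 30
```

The c2a0 entry between the ® (c2ae) and the N (4e) is the two-byte character that survives the \s+ replacement; an ordinary space would show up as 20.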

Community · Accepted Answer · 2023-11-17 20:42:18Z

5

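\xC2\xA0 is the UTF-8 encoding of U+00A0, the Unicode non-breaking space. Without the u modifier the string is processed byte by byte, and \s does not match it. Add the u modifier to your regex so that the pattern and subject are treated as UTF-8.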

$raw = urldecode('Celeron%C2%AE%C2%A0N3050');

var_dump(
    preg_replace('/\s+/', '', $raw),
    preg_replace('/\s+/u', '', $raw),
    urlencode($raw),
    urlencode(preg_replace('/\s+/u', '', $raw))
);

Output:

string(16) "Celeron® N3050"
string(14) "Celeron®N3050"
string(24) "Celeron%C2%AE%C2%A0N3050"
string(18) "Celeron%C2%AEN3050"
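Applied to the comparison in the question, the fix might look like the following minimal sketch. The function name same_ignoring_whitespace is hypothetical, and returning false for input that is not valid UTF-8 is an assumption about the desired behaviour, not something stated in the question.

```php
<?php
// Hypothetical helper, not from the original post: strips all whitespace,
// including U+00A0, and compares what is left.
function same_ignoring_whitespace(string $a, string $b): bool
{
    // The u modifier makes the pattern and subject be treated as UTF-8.
    $a = preg_replace('/\s+/u', '', $a);
    $b = preg_replace('/\s+/u', '', $b);

    // With the u modifier, preg_replace() returns null when the subject
    // is not valid UTF-8. Assumption: treat that case as "not the same".
    if ($a === null || $b === null) {
        return false;
    }

    return $a === $b;
}

// Client string with a non-breaking space, server string with a plain space.
$client = "Celeron\xC2\xAE\xC2\xA0N3050";
$server = "Celeron\xC2\xAE N3050";
var_dump(same_ignoring_whitespace($client, $server)); // bool(true)
```

The null check matters because a plain == comparison would treat two failed replacements as equal (null == null), which would report a match for two unreadable strings.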

answered Sep 24, 2018 at 19:34 by Sammitch

edited Nov 17, 2023 at 20:42 by CommunityBot


1 Comment

GrumpyCrouton Over a year ago

Thank you! While waiting for an answer I was looking around for more information about this and also stumbled across something stating it was a non-breaking space character, I was trying to find a way to remove non-breaking spaces but couldn't find anything as straightforward as your answer. I really appreciate it! Need some croutons for the side-dish of your sammitch?
