I've got the following code, which should compare two strings after stripping all whitespace. Here is a simplified version of the function:

function not_same($type, $org_str1, $str2) {

    $str1 = preg_replace('/\s+/', '', $org_str1);
    $str2 = preg_replace('/\s+/', '', $str2);

    $tries = [];
    $tries[] = ["str1" => $str1, "str2" => $str2, "encoded1" => urlencode($str1), "encoded2" => urlencode($str2)];        

    if($str1 == $str2) {
        return true;
    } else {
        return false;
    }

}

I'm using this to check whether a computer's processor matches the model recorded in my database: $org_str1 is the CPU my client reports for the computer it is running on, and $str2 is the CPU that the model should have according to my database.

Sometimes these strings contain unneeded spaces, so before comparing I remove all of the whitespace so that only the text itself is compared.

Now some computers are being reported as having the wrong CPU: the match fails because some whitespace is not removed.

In this specific case, I'm trying to compare the strings Client: Celeron® N3050 vs Server: Celeron® N3050. I log what is actually being compared each time, and the log says it is comparing Client: Celeron® N3050 vs Server: Celeron®N3050

I tried copying and pasting this whitespace into a str_replace() call, but that did not solve the issue. After that, I got the idea of logging the strings with urlencode(). This lets me see exactly what the mysterious whitespace character is, but I am still at a loss as to how to fix the issue.

The strings after urlencode() are Client: Celeron%C2%AE%C2%A0N3050 vs Server: Celeron%C2%AEN3050

As you can see, there is still a whitespace character in my client string, encoded to %C2%A0. Why does preg_replace not get rid of this whitespace, and how can I programmatically remove it?

1 Answer

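\xC2\xA0 is the UTF-8 encoding of U+00A0, the Unicode non-breaking space. Without the u modifier, the pattern is matched byte by byte and \s typically covers only ASCII whitespace, so the two-byte sequence is left alone. Add the u modifier to your regex so that the pattern and subject are treated as UTF-8: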

$raw = urldecode('Celeron%C2%AE%C2%A0N3050');

var_dump(
    preg_replace('/\s+/', '', $raw),
    preg_replace('/\s+/u', '', $raw),
    urlencode($raw),
    urlencode(preg_replace('/\s+/u', '', $raw))
);

Output:

string(16) "Celeron® N3050"
string(14) "Celeron®N3050"
string(24) "Celeron%C2%AE%C2%A0N3050"
string(18) "Celeron%C2%AEN3050"
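
Applied to the comparison in the question, the same fix could look roughly like the sketch below. The function names strip_all_whitespace and same_cpu are made up for illustration, and the fallback for invalid UTF-8 input is only an assumption about what you would want there. It needs PHP 7.0 or later for the "\u{...}" escapes and the ?? operator.

```php
<?php
// Hypothetical helper: remove every whitespace character, including
// Unicode ones such as U+00A0 (non-breaking space).
function strip_all_whitespace(string $s): string
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // so \s also matches Unicode whitespace.
    $stripped = preg_replace('/\s+/u', '', $s);

    // With the u modifier, preg_replace() returns null when $s is not
    // valid UTF-8. Falling back to a byte-wise strip is an assumption
    // about the desired behaviour, not the only reasonable choice.
    return $stripped ?? (string) preg_replace('/\s+/', '', $s);
}

// Hypothetical comparison: true when both strings are equal once all
// whitespace has been removed.
function same_cpu(string $client, string $server): bool
{
    return strip_all_whitespace($client) === strip_all_whitespace($server);
}

// Client string contains U+00A0, server string contains a plain space.
var_dump(same_cpu("Celeron\u{00AE}\u{00A0}N3050", "Celeron\u{00AE} N3050")); // bool(true)
```

If the non-breaking space is the only extra character you expect, replacing the byte sequence "\xC2\xA0" with str_replace() should work as well, but the u modifier also covers other Unicode whitespace.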
1 Comment

Thank you! While waiting for an answer I was looking around for more information about this and also stumbled across something stating it was a non-breaking space character, I was trying to find a way to remove non-breaking spaces but couldn't find anything as straightforward as your answer. I really appreciate it! Need some croutons for the side-dish of your sammitch?
