
Situation

I'm importing huge JSON files into a database. They contain fields that users filled in with an online WYSIWYG editor, which also let them paste in special characters, typically copied from an MS Word document.

Problem

After decoding the JSON file, some special characters seem to disappear. It turns out most of them are Unicode control characters: for example, † arrives escaped as \u0086, which is character U+0086.

Example

<?php
$json = '{"test": "start \u0086 end"}';
$decoded = json_decode($json);
echo $decoded->test . PHP_EOL;

Output:

start  end

Expected output:

start † end
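The character is not actually dropped. json_decode() turns \u0086 into U+0086, an invisible C1 control character, while the expected † is U+2020, which is what the single byte 0x86 means in Windows-1252. A quick check along these lines makes that visible (the second line assumes the mbstring extension is loaded):

```php
<?php
// Decode the same JSON as in the question.
$json = '{"test": "start \u0086 end"}';
$decoded = json_decode($json);

// Dump the raw bytes: U+0086 is encoded in UTF-8 as c2 86, between the two spaces.
echo bin2hex($decoded->test), PHP_EOL; // 737461727420c28620656e64

// Read as a single Windows-1252 byte, 0x86 is the dagger, returned here as UTF-8.
echo mb_convert_encoding("\x86", 'UTF-8', 'Windows-1252'), PHP_EOL; // †
```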

Temporary fix

For now I've applied this dirty fix, but I'm still looking for a more elegant way to replace all of these Unicode characters.

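// Must run on the raw JSON text, before json_decode(): PHP only recognizes the
// \u{...} form as an escape in double-quoted strings, so each key below is the
// literal six-character sequence \u0086 etc., matched case-insensitively.
protected static function replaceUnicodeCharacters(&$string)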
{
    $replace = [
        "\u0086" => "†",
        "\u00b0" => "°",
        "\u0093" => "“",
        "\u0094" => "”",
        "\u0091" => "‘",
        "\u0092" => "’",
        "\u009c" => "œ",
        "\u00f6" => "ö",
        "\u00f9" => "ù",
        "\u00ad" => "­",
        "\u0096" => "–",
        "\u00fb" => "û",
        "\u00a0" => " ",
        "\u0085" => "…",
        "\u00ab" => "«",
        "\u00bb" => "»",
        "\u008c" => "Œ",
        "\u00c0" => "À",
        "\u00ff" => "ÿ",
        "\u00fc" => "ü",
    ];

    $string = str_ireplace(array_keys($replace), array_values($replace), $string);
}
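A less manual alternative, sketched under the assumption that the stray characters are Windows-1252 bytes that were read as ISO-8859-1 somewhere upstream (so \u0093 was meant to be “). The entries for U+00A0 to U+00FF in the list above already decode to the intended characters, because Windows-1252 and ISO-8859-1 agree in that range, so only the C1 range U+0080 to U+009F needs mapping. The function name fixWindows1252ControlChars is hypothetical; it runs after json_decode() rather than on the raw JSON, expects valid UTF-8 input, and needs the mbstring extension (mb_ord() requires PHP 7.2+).

```php
<?php
// Hypothetical helper: maps C1 control characters to the Windows-1252
// characters that share their byte value, e.g. U+0086 -> † (U+2020).
function fixWindows1252ControlChars(string $text): string
{
    // Only C1 code points that Windows-1252 defines; 0x81, 0x8D, 0x8F, 0x90
    // and 0x9D have no Windows-1252 character and are left untouched.
    $pattern = '/[\x{80}\x{82}-\x{8C}\x{8E}\x{91}-\x{9C}\x{9E}\x{9F}]/u';

    return preg_replace_callback($pattern, static function (array $m): string {
        // Turn the code point (0x80 to 0x9F) back into a single byte...
        $byte = chr(mb_ord($m[0], 'UTF-8'));
        // ...and re-read that byte as Windows-1252, returning UTF-8.
        return mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');
    }, $text);
}

$decoded = json_decode('{"test": "start \u0086 end \u0093quoted\u0094"}');
echo fixWindows1252ControlChars($decoded->test), PHP_EOL; // start † end “quoted”
```

Unlike a hand-written list, this covers every defined Windows-1252 character in 0x80 to 0x9F (€, ‚, „, …, ™ and so on) and leaves genuine characters above U+00FF as they are.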

2 Answers

0

The byte 0x86, when interpreted as Windows-1252, is †. You're just missing an encoding step:

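// Re-encode the decoded UTF-8 string as Windows-1252
$decoded->test = mb_convert_encoding($decoded->test, "Windows-1252", "UTF-8");
// Tell the browser to interpret the output as Windows-1252
echo '<html><meta charset="Windows-1252">';
echo $decoded->test . PHP_EOL;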
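For a console application whose terminal expects UTF-8, a variant of the same idea is to convert the repaired text back to UTF-8 instead of printing Windows-1252 bytes. A minimal sketch, assuming every character in the decoded string lies in U+0000 to U+00FF (Windows-1252 bytes that were read as ISO-8859-1) and that mbstring is available:

```php
<?php
$decoded = json_decode('{"test": "start \u0086 end"}');

// Step 1: UTF-8 -> ISO-8859-1 turns each code point U+00xx back into the byte 0xxx.
$bytes = mb_convert_encoding($decoded->test, 'ISO-8859-1', 'UTF-8');

// Step 2: read those bytes as Windows-1252 and emit UTF-8 for the console.
$fixed = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');

echo $fixed, PHP_EOL; // start † end
```

Any character above U+00FF cannot be represented in ISO-8859-1 and is replaced by mbstring's substitute character in step 1, so this whole-string round trip only suits text that is affected throughout.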

2 Comments

For me this results in start � end. Echoing the charset is not possible as I'm using a console application.
Then set the console's encoding appropriately. It looks like you have configured it to display UTF-8, but what I'm trying to get across here is that your data is in Windows-1252.
-1

EDIT: PHP Unicode in JSON

I hope that at least helps...

2 Comments

This yields NULL on PHP 7.3, and Uncaught JsonException: Syntax error in php shell code:1 with JSON_THROW_ON_ERROR enabled.
Okay, I'm sorry, I was just trying to help. :( Maybe this is because most browsers don't support this character, so PHP can't display the cross properly, I guess. D:
