This issue seems specific to microsofttranslator.com so please ... any answers, if you can test against it ...

Using the following URL for translation: http://api.microsofttranslator.com/V2/Ajax.svc/TranslateArray, I send some arguments via cURL and get back the following result:

 [
      {
           "From":"en",
           "OriginalTextSentenceLengths":[13],
           "TranslatedText":"我是最好的",
           "TranslatedTextSentenceLengths":[5]
      },
      {
           "From":"en",
           "OriginalTextSentenceLengths":[16],
           "TranslatedText":"你是最好的",
           "TranslatedTextSentenceLengths":[5]
      }
 ]

When I call json_decode($output, true); on the output from cURL, json_decode fails and reports a syntax error in the returned JSON:

 json_last_error() == JSON_ERROR_SYNTAX

The headers being returned with the JSON:

Response Headers

 Cache-Control:no-cache
 Content-Length:244
 Content-Type:application/x-javascript; charset=utf-8
 Date:Sat, 06 Aug 2011 13:35:08 GMT
 Expires:-1
 Pragma:no-cache
 X-MS-Trans-Info:s=63644

Raw content:

 [{"From":"en","OriginalTextSentenceLengths":[13],"TranslatedText":"我是最好的","TranslatedTextSentenceLengths":[5]},{"From":"en","OriginalTextSentenceLengths":[16],"TranslatedText":"你是最好的","TranslatedTextSentenceLengths":[5]}]

cURL code:

    $texts = array("i am the best" => 0, "you are the best" => 0);
    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $data = array(
        'appId' => $bing_appId,
        'from' => 'en',
        'to' => 'zh-CHS',
        'texts' => json_encode(array_keys($texts))
    );
    curl_setopt($ch, CURLOPT_URL, $bingArrayUrl . '?' . http_build_query($data)); 
    $output = curl_exec($ch); 
  • I'd say that it has something to do with the character encoding somewhere along the way. Could you post more of the cURL code that you are using? Commented Aug 6, 2011 at 14:57
  • added. i personally don't think it's the cURL code but hope to be proven wrong! Commented Aug 6, 2011 at 15:08
  • At least for me, the given JSON goes through json_decode without any problems... How did you echo your JSON? Commented Aug 6, 2011 at 15:26
  • And you are just taking $output and passing it straight to json_decode? If you print mb_detect_encoding($output), do you get 'UTF-8'? I'm the same as @Waltsu: I pasted that stuff into a file and decoded the file contents, and it worked fine. Commented Aug 6, 2011 at 15:35
  • encoding: UTF-8 when I run: $output = curl_exec($ch); echo mb_detect_encoding($output); Commented Aug 6, 2011 at 22:58

2 Answers

The API is returning a byte order mark (BOM) that json_decode does not accept.
The string data itself is UTF-8 but is prepended with U+FEFF, the BOM character, which is encoded in UTF-8 as the three bytes EF BB BF. Just strip out the first three bytes and then json_decode.

...
$output = curl_exec($ch);
// Insert some sanity checks here... then,
$output = substr($output, 3);
...
$decoded = json_decode($output, true);
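
The unconditional substr above assumes the BOM is always there. As a minimal sketch of the "sanity checks" mentioned in the snippet, the following strips the three bytes only when they are actually present. The helper name stripUtf8Bom and the stand-in response string are assumptions made for illustration; they are not part of the API or of the original test code.

```php
<?php
// Hypothetical helper: removes a leading UTF-8 encoded BOM (bytes EF BB BF)
// if, and only if, the string starts with it.
function stripUtf8Bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($text, $bom, 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

// Stand-in for the cURL response body (assumed shape): a BOM followed by JSON.
$output = "\xEF\xBB\xBF" . '[{"From":"en","TranslatedText":"ok"}]';

// Decoding the cleaned string succeeds; decoding $output directly would not.
$decoded = json_decode(stripUtf8Bom($output), true);
```

With this in place, a response that arrives without a BOM is passed through untouched instead of losing its first three characters.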

Here's the entirety of my test code.

$texts = array("i am the best" => 0, "you are the best" => 0);
$ch = curl_init(); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = array(
    'appId' => $bing_appId,
    'from' => 'en',
    'to' => 'zh-CHS',
    'texts' => json_encode(array_keys($texts))
    );
curl_setopt($ch, CURLOPT_URL, $bingArrayUrl . '?' . http_build_query($data)); 
$output = curl_exec($ch);
$output = substr($output, 3);
print_r(json_decode($output, true));

Which gives me

Array
(
    [0] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 13
                )

            [TranslatedText] => 我是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

    [1] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 16
                )

            [TranslatedText] => 你是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

)

Wikipedia entry on BOM


There is nothing syntactically wrong with your JSON string. It is possible that the JSON is coming back with bytes that are not valid UTF-8, but in that case json_decode() would return NULL and json_last_error() would report JSON_ERROR_UTF8 rather than a syntax error.
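
One way to tell these cases apart is to look at both the error reported by PHP and the first few bytes of the raw response. The following is only a sketch: describeJsonFailure is a hypothetical helper name, and it assumes PHP 5.5 or later for json_last_error_msg().

```php
<?php
// Hypothetical helper: summarizes why json_decode() rejected a raw response.
function describeJsonFailure(string $raw): string
{
    json_decode($raw, true);
    if (json_last_error() === JSON_ERROR_NONE) {
        return 'ok';
    }
    // Hex dump of the first four bytes; a leading "efbbbf" is a UTF-8 encoded BOM.
    $head = bin2hex(substr($raw, 0, 4));
    return json_last_error_msg() . ' (first bytes: ' . $head . ')';
}
```

Calling echo describeJsonFailure($output); on the cURL result shows whether anything invisible precedes the opening bracket, which pasting the JSON into a file by hand would hide.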

Test Code:

<?php

$json = '
 [
      {
           "From":"en",
           "OriginalTextSentenceLengths":[13],
           "TranslatedText":"我是最好的",
           "TranslatedTextSentenceLengths":[5]
      },
      {
           "From":"en",
           "OriginalTextSentenceLengths":[16],
           "TranslatedText":"你是最好的",
           "TranslatedTextSentenceLengths":[5]
      }
 ]
';

$out = json_decode($json, true);

// json_decode() returns NULL on failure; json_last_error() separates a
// real failure from a document that is simply the literal null.
if ($out === null && json_last_error() !== JSON_ERROR_NONE) {
    throw new Exception(json_last_error_msg());
} else {
    print_r($out);
}

Output:

$ php -f jsontest.php 
Array
(
    [0] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 13
                )

            [TranslatedText] => 我是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

    [1] => Array
        (
            [From] => en
            [OriginalTextSentenceLengths] => Array
                (
                    [0] => 16
                )

            [TranslatedText] => 你是最好的
            [TranslatedTextSentenceLengths] => Array
                (
                    [0] => 5
                )

        )

)

1 Comment

Any idea how I can troubleshoot why the JSON I get back via cURL gives JSON_ERROR_SYNTAX, but the same JSON pasted in as-is does not?
