Why does JSON.parse choke on encoded characters in nodejs?

Question

I'm attempting to look up the word "flower" in Google's dictionary semi-api. Source:

https://gist.github.com/DelvarWorld/0a83a42abbc1297a6687

Long story short, I'm calling JSONP with a callback paramater then regexing it out.

But it hits this snag:

undefined:1
ple","terms":[{"type":"text","text":"I stopped to buy Bridget some \x3cem\x3ef
                                                                    ^
SyntaxError: Unexpected token x
    at Object.parse (native)

Google is serving me escaped HTML characters, which is fine, but JSON.parse cannot handle them?? What's weirding me out is this works just fine:

$ node

> JSON.parse( '{"a":"\x3cem"}' )
  { a: '<em' }

I don't get why my thingle is crashing

Edit These are all nice informational repsonses, but none of them help me get rid of the stacktrace.

Take a look at string in json.org

Paul
– Paul

2013-07-31 03:59:40 +00:00
Commented Jul 31, 2013 at 3:59 — Paul
– Paul, Commented Jul 31, 2013 at 3:59

icktoofay · Accepted Answer · 2018-01-26 07:38:02Z

2

\xHH is not part of JSON, but is part of JavaScript. It is equivalent to \u00HH. Since the built-in JSON doesn't seem to support it and I doubt you'd want to go through the trouble of modifying a non-built-in JSON implementation, you might just want to run the code in a sandbox and collect the resulting object.

edited Jan 26, 2018 at 7:38

answered Jul 31, 2013 at 4:05

icktoofay

130k23 gold badges261 silver badges239 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

bobince Over a year ago

Another hack for if you need to parse a “nearly JSON” structure is to replace \x with \u00 before parsing. This is slightly safer as it avoids eval'ing.

icktoofay Over a year ago

@bobince: Right; that's sort of why I included the “\xHH ≡ \u00HH” bit. The problem with that is that you have to be careful of other escapes, e.g., don't change \\xHH (which is the literal text \xHH) into \\u00HH (the literal text \u00HH). I, too, agree that evaling is usually undesirable, but if you do it in a sandbox without access to…almost anything, with a timeout, it should be safe.

Paul · Accepted Answer · 2013-07-31 04:03:58Z

0

According to http://json.org, a string character in a JSON representation of string may be:

any-Unicode-character- except-"-or--or- control-character
\"
\
\/
\b
\f
\n
\r
\t
\u four-hex-digits

So according to that list, the "json" you are getting is malformed at \x3

answered Jul 31, 2013 at 4:03

Paul

27.8k13 gold badges90 silver badges127 bronze badges

Comments

Musa · Accepted Answer · 2013-07-31 04:04:54Z

0

The reason why it works is because these two are equivalent.

JSON.parse( '{"a":"\x3cem"}' )

and

JSON.parse( '{"a":"<em"}' )

you string is passed to JSON.parse already decoded since its a literal \x3cem is actually <em

Now, \xxx is valid in JavaScript but not in JSON, according to http://json.org/ the only characters you can have after a \ are "\/bfnrtu.

answered Jul 31, 2013 at 4:04

Musa

97.9k17 gold badges123 silver badges144 bronze badges

Comments

Selman Kahya · Accepted Answer · 2013-10-14 10:50:38Z

0

answer is correct, but needs couple of modifications. you might wanna try this one: https://gist.github.com/Selmanh/6973863

answered Oct 14, 2013 at 10:50

Selman Kahya

1852 silver badges2 bronze badges

Collectives™ on Stack Overflow

Why does JSON.parse choke on encoded characters in nodejs?

4 Answers 4

2 Comments

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

2 Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related