The JSON specification states that the only control characters that must be escaped are those with codes from U+0000 to U+001F:

7.  Strings

   The representation of strings is similar to conventions used in the C
   family of programming languages.  A string begins and ends with
   quotation marks.  All Unicode characters may be placed within the
   quotation marks, except for the characters that must be escaped:
   quotation mark, reverse solidus, and the control characters (U+0000
   through U+001F).

The main idea of escaping is to avoid damaging the output when a JSON document or message is printed on a terminal or on paper.

But there are other control characters, such as [DEL] (U+007F), which is usually grouped with the C0 set, and the characters of the C1 set (U+0080 through U+009F). Shouldn't they also be escaped in JSON strings?
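To make the question concrete, here is a minimal sketch of how one common serializer treats the three groups. It assumes Python's standard `json` module with `ensure_ascii=False`; the helper name `escaped_by_default` and the sample characters are chosen for illustration only, and other libraries may behave differently.

```python
import json

# One sample from each group: C0 (U+001F), DEL (U+007F), C1 (U+0085).
samples = {"C0": "\u001f", "DEL": "\u007f", "C1": "\u0085"}

def escaped_by_default(ch: str) -> bool:
    """True if json.dumps escapes ch even when non-ASCII output is allowed."""
    return json.dumps(ch, ensure_ascii=False) != f'"{ch}"'

results = {name: escaped_by_default(ch) for name, ch in samples.items()}
print(results)
```

With `ensure_ascii=False`, only the C0 sample is escaped; DEL and the C1 sample are written through unchanged, which matches the wording of the specification quoted above.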

1 Answer


From the JSON specification:

  8. String and Character Issues
    8.1. Character Encoding
    JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.

In UTF-8, every code point above 127 is encoded as multiple bytes. The continuation bytes lie in the range 0x80 through 0xBF, and half of that range (0x80 through 0x9F) coincides with the C1 control byte values. So to keep those byte values out of a UTF-8 encoded JSON string, in practice every non-ASCII code point would need to be escaped. That effectively gives up UTF-8, and the JSON string might as well be encoded in ASCII. Since ASCII is a subset of UTF-8, the standard does not disallow this. So if you are concerned about C1 control bytes appearing in the byte stream, just escape them; but requiring every JSON representation to be pure ASCII would be wildly inefficient in anything but an English-language environment.
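As a rough illustration of that point, the sketch below encodes an ordinary non-control character and checks its UTF-8 bytes against the C1 value range. It assumes Python's standard `json` module; the helper `c1_valued_bytes` is a hypothetical name introduced here, and the euro sign is just one convenient sample.

```python
import json

def c1_valued_bytes(text: str) -> list:
    """Return the UTF-8 bytes of text whose values fall in 0x80-0x9F."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]

euro = "\u20ac"  # EURO SIGN, not a control character

raw = json.dumps(euro, ensure_ascii=False)  # character written as-is
ascii_only = json.dumps(euro)               # ensure_ascii=True is the default

print(raw.encode("utf-8"))          # bytes 22 E2 82 AC 22; 0x82 is in the C1 range
print(c1_valued_bytes(raw))         # the offending byte value(s)
print(ascii_only)                   # escaped form, pure ASCII
print(c1_valued_bytes(ascii_only))  # no bytes in the C1 range remain
```

So a byte-oriented device that reacts to 0x80 through 0x9F would be affected by perfectly ordinary text, not only by actual C1 control characters, and the only way to avoid that is to escape everything outside ASCII.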

UTF-16 and UTF-32 text could not possibly be parsed by anything that interprets the C1 (or even C0) control bytes, so the point is rather moot for those encodings.
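A small sketch of why this is so, assuming Python and a trivial sample document: even a JSON text made only of ASCII characters is full of NUL bytes (a C0 control value) once it is encoded as UTF-16 or UTF-32.

```python
import json

doc = json.dumps({"a": 1})        # plain ASCII characters only
utf16 = doc.encode("utf-16-le")   # 2 bytes per character here
utf32 = doc.encode("utf-32-le")   # 4 bytes per character

# Each ASCII character contributes one NUL byte in UTF-16
# and three NUL bytes in UTF-32.
print(len(doc), utf16.count(0), utf32.count(0))
```

A consumer that treats 0x00 as a control byte could not get through the first character of such a stream, regardless of how the JSON strings inside it are escaped.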


2 Comments

Escaping doesn't have anything to do with character set encoding. It has to do with syntax. A String (once the file is decoded with the character encoding and escapes unescaped) can have any finite sequence of UTF-16 code units (which is what String means), regardless of the file encoding.
@TomBlodget The two are linked in that both the escaping and the encoding affect the bytes used to represent the JSON string. Most systems that actually use C1 controls either can't handle UTF at all, or they use the multi-byte representations that start with an ESC byte. So the fact that JSON is encoded in a UTF format means that the systems it can operate on likely don't care about C1 code points appearing in the JSON string.
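Both comments can be checked with a short sketch, assuming Python's standard `json` module and U+0085 (NEL) as an arbitrary C1 sample: the escaped and unescaped spellings decode to the same string value, while the bytes on the wire differ.

```python
import json

raw_text = '"\u0085"'        # JSON string containing U+0085 literally
escaped_text = '"\\u0085"'   # the same string written with a \u escape

# Syntax level: both spellings denote the same one-character string.
same_value = json.loads(raw_text) == json.loads(escaped_text) == "\u0085"

# Byte level: the UTF-8 encodings of the two JSON texts differ.
raw_bytes = raw_text.encode("utf-8")          # contains C2 85
escaped_bytes = escaped_text.encode("utf-8")  # ASCII only

print(same_value, raw_bytes, escaped_bytes)
```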
