121

Using JsonSerializer.Serialize(obj) will produce an escaped string, but I want the unescaped version. For example:

using System;
using System.Text.Json;

public class Program
{
    public static void Main()
    {
        var a = new A{Name = "你好"};
        var s = JsonSerializer.Serialize(a);
        Console.WriteLine(s);
    }
}

class A {
    public string Name {get; set;}
}

This produces the string {"Name":"\u4F60\u597D"}, but I want {"Name":"你好"}.

I created a code snippet at https://dotnetfiddle.net/w73vnO
Please help me.

    Aside from making the data less readable, the default escaping also bloats the size of the json by 40 percent. And that is a significant change when you are caching or sending large json payloads. Commented Dec 19, 2021 at 7:46

4 Answers

146

You need to set the JsonSerializer options not to encode those strings.

JsonSerializerOptions jso = new JsonSerializerOptions();
jso.Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping;

Note that using UnsafeRelaxedJsonEscaping comes with security risks such as XSS attacks.

Then pass these options when you call the Serialize method.

var s = JsonSerializer.Serialize(a, jso);        

Full code:

JsonSerializerOptions jso = new JsonSerializerOptions();
jso.Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping;

var a = new A { Name = "你好" };
var s = JsonSerializer.Serialize(a, jso);        
Console.WriteLine(s);

Result:

{"Name":"你好"}

If you need to print the result in the console, you may need to install an additional language pack so that the characters display correctly.
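To make the security note above concrete, here is a minimal sketch comparing the default encoder with UnsafeRelaxedJsonEscaping. The payload value is a made-up example, and the snippet assumes C# 9 or later for top-level statements. The relaxed encoder leaves the HTML-sensitive characters < and > untouched, which is where the XSS concern comes from if the JSON is ever embedded in an HTML page as is.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical payload: non-ASCII text plus HTML-sensitive characters.
var payload = new { Name = "你好 <b>" };

// Default encoder: escapes both the non-ASCII text and '<' / '>'.
string defaultJson = JsonSerializer.Serialize(payload);

// Relaxed encoder: writes both as-is.
var relaxed = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
};
string relaxedJson = JsonSerializer.Serialize(payload, relaxed);

Console.WriteLine(defaultJson);
Console.WriteLine(relaxedJson);
```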


9 Comments

I could not believe my eyes when I found this: learn.microsoft.com/en-us/dotnet/api/… This is extremely surprising behavior by the default encoder.
It's important to understand the potential concerns with using this in your scenario and I would recommend safer alternatives if feasible. See learn.microsoft.com/en-us/dotnet/standard/serialization/…
Those docs never mention why they avoid serializing those. Why was the decision made to encode everything when characters like the double-quote " and control chars have specific escape sequences for them?!
Using an "unsafe" encoding is not the answer, the answer from ahsonkhan is correct
Be careful with this since it can result in XSS attacks! See @ahsonkhan's answer instead!
67

To change the escaping behavior of the JsonSerializer, you can pass in a custom JavaScriptEncoder by setting the Encoder property on the JsonSerializerOptions.

https://learn.microsoft.com/en-us/dotnet/api/system.text.json.jsonserializeroptions.encoder?view=netcore-3.0#System_Text_Json_JsonSerializerOptions_Encoder

The default behavior is designed with security in mind and the JsonSerializer over-escapes for defense-in-depth.

If all you are looking for is to leave certain "alphanumeric" characters of a specific non-Latin language unescaped, I would recommend that you instead create a JavaScriptEncoder using the Create factory method rather than using the UnsafeRelaxedJsonEscaping encoder.

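using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Allow Basic Latin and CJK ideographs to be written without \uXXXX escaping.
JsonSerializerOptions options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs)
};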

var a = new A { Name = "你好" };
var s = JsonSerializer.Serialize(a, options);
Console.WriteLine(s);

Doing so keeps certain safeguards; for instance, HTML-sensitive characters will continue to be escaped.
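As a small sketch of that safeguard, the snippet below serializes a made-up value that mixes CJK text with < and >. The value and variable names are illustrative only, and top-level statements (C# 9 or later) are assumed. The CJK characters should come through unescaped while the angle brackets are still written as \u003C and \u003E.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

var cjkOptions = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs)
};

// Hypothetical value: CJK text followed by HTML-sensitive characters.
string cjkJson = JsonSerializer.Serialize(new { Name = "你好 <b>" }, cjkOptions);

Console.WriteLine(cjkJson);
```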

I would caution against using System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping casually, since it does minimal escaping (which is why it has "unsafe" in the name). If the JSON you are creating is written to a UTF-8 encoded file on disk, or if it's part of a web request which explicitly sets the charset to utf-8 (and is not going to potentially be embedded within an HTML component as is), then it is probably OK to use this.

See the remarks section within the API docs: https://learn.microsoft.com/en-us/dotnet/api/system.text.encodings.web.javascriptencoder.unsaferelaxedjsonescaping?view=netcore-3.0#remarks

You could also consider specifying UnicodeRanges.All if you expect/need all languages to remain un-escaped. This still escapes certain ASCII characters that are prone to security vulnerabilities.

JsonSerializerOptions options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
};
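The difference between a language-specific range and UnicodeRanges.All can be seen with text in more than one script. This is a rough sketch using a made-up mixed Chinese and Russian value, again assuming C# 9 or later for top-level statements. With only the CJK range allowed, the Cyrillic letters are still escaped; with UnicodeRanges.All they are written as is.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Hypothetical value mixing two scripts.
var mixed = new { Name = "你好 Привет" };

var cjkOnly = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs)
};
var allRanges = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
};

// Cyrillic falls outside the allowed ranges here, so it is escaped.
string cjkOnlyJson = JsonSerializer.Serialize(mixed, cjkOnly);

// Both scripts are allowed here, so neither is escaped.
string allRangesJson = JsonSerializer.Serialize(mixed, allRanges);

Console.WriteLine(cjkOnlyJson);
Console.WriteLine(allRangesJson);
```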

For more information and code samples, see: https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-how-to?view=netcore-3.0#customize-character-encoding

See also the caution note in that documentation.

4 Comments

@joey I know this came later but it should become the accepted answer
"If the JSON you are creating is written to a UTF-8 encoded file on disk or if its part of web request which explicitly sets the charset to utf-8 (and is not going to potentially be embedded within an HTML component as is), then it is probably OK to use this. [UnsafeRelaxedJsonEscaping]" to highlight this part
So if I understand correctly, if I do not use the JSON for HTML (I still don't understand, you are supposed to sanitize it before putting into HTML if it's your use case anyway), I can safely use UnsafeRelaxedJsonEscaping?
27

Use:

JsonSerializerOptions options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
};

2 Comments

that's the way, thx
This worked perfectly for all Unicode character ranges. This should have been the highest-ranked answer. Thank you.
12

You can use: System.Text.RegularExpressions.Regex.Unescape(string) to unescape the unicode characters. https://learn.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.unescape

Updating example from original question:

using System;
using System.Text.Json;

public class Program
{
    public static void Main()
    {
        var a = new A{Name = "你好"};
        var s = JsonSerializer.Serialize(a);

        var unescaped = System.Text.RegularExpressions.Regex.Unescape(s);

        Console.WriteLine(s);
        Console.WriteLine(unescaped);
    }
}

class A {
    public string Name {get; set;}
}

Output:

{"Name":"\u4F60\u597D"}
{"Name":"你好"}
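One thing worth checking before relying on this approach: Regex.Unescape converts every escape sequence in the string, not only the ones for non-ASCII text. The sketch below uses a made-up value containing double quotes, which the default encoder writes as \u0022. After unescaping, the quotes are bare again and the result no longer parses as JSON. Top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical value containing double quotes.
string quotedJson = JsonSerializer.Serialize(new { Name = "say \"hi\"" });

// Unescape turns \u0022 back into a bare double quote inside the JSON string.
string quotedUnescaped = Regex.Unescape(quotedJson);

Console.WriteLine(quotedJson);
Console.WriteLine(quotedUnescaped);

// Check whether the unescaped text is still valid JSON.
bool stillValid;
try
{
    using var doc = JsonDocument.Parse(quotedUnescaped);
    stillValid = true;
}
catch (JsonException)
{
    stillValid = false;
}

Console.WriteLine(stillValid);
```

Setting the Encoder on JsonSerializerOptions, as in the other answers, avoids this because the serializer itself decides which characters must stay escaped.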

1 Comment

This works the best for all cases.
