In an app that retrieves JSON from an external API (Instagram, if that matters), I get this kind of JSON back:

{"caption":"\ud835\udc2f\ud835\udc1e\u0301\ud835\udc25\ud835\udc28 \n\n \ud835\udc2f\ud835\udc28\u0302\ud835\udc2d\ud835\udc2b\ud835\udc1e\ud835\udc2c \n\n \ud835\udfd1\ud835\udfd2\ud835\udc1e\u0300\ud835\udc26\ud835\udc1e \ud835\udc1e\u0301\ud835\udc1d\ud835\udc22\ud835\udc2d\ud835\udc22\ud835\udc28\ud835\udc27"}

I deserialize it with Newtonsoft.Json like this:

using Newtonsoft.Json;

var json = "{\"caption\":\"\\ud835\\udc2f\\ud835\\udc1e\\u0301\\ud835\\udc25\\ud835\\udc28 \\n\\n \\ud835\\udc2f\\ud835\\udc28\\u0302\\ud835\\udc2d\\ud835\\udc2b\\ud835\\udc1e\\ud835\\udc2c \\n\\n \\ud835\\udfd1\\ud835\\udfd2\\ud835\\udc1e\\u0300\\ud835\\udc26\\ud835\\udc1e \\ud835\\udc1e\\u0301\\ud835\\udc1d\\ud835\\udc22\\ud835\\udc2d\\ud835\\udc22\\ud835\\udc28\\ud835\\udc27\"}";
var entity = JsonConvert.DeserializeObject<TestEntity>(json);

if (entity != null)
{
    var caption = entity.Caption;
}

public class TestEntity
{
    public string Caption { get; set; }
}

When I deserialize this example using the above code in a simple console app, the resulting string displays correctly when I paste it here ("𝐯𝐞́𝐥𝐨 \n\n 𝐯𝐨̂𝐭𝐫𝐞𝐬 \n\n 𝟑𝟒𝐞̀𝐦𝐞 𝐞́𝐝𝐢𝐭𝐢𝐨𝐧") on desktop, but not on Android. When it is then saved into SQL Server, the data looks wrongly encoded:

34è édition

It does look the same in Visual Studio while debugging:

[Image: VS debugging]

In any browser on Android, those characters are rendered in the same wrong way, but not in any desktop browser or on iOS.

I have no clue of what could be the issue here.

  • After investigating, this problem has nothing to do with ASP.NET Core. The network tab shows that converting the Unicode escape sequences is up to the browser, and the Unicode characters are displayed on mobile devices (Android, iPhone, iPad). Commented Aug 21, 2024 at 11:12
  • @JasonPan So the issue should be reported to...? Commented Aug 21, 2024 at 11:22
  • Hi peljoe, I think you can open GitHub issues for Safari, Chrome or Edge (mobile). Share the network trace and let them know you have narrowed down the issue; maybe one of them will respond. Commented Aug 21, 2024 at 11:39
  • However, a web API is rarely accessed directly in the browser and is generally consumed from JavaScript, so it is very likely that mobile browser vendors will not care about this issue. After all, there are plug-ins that can handle the Unicode transcoding. Commented Aug 21, 2024 at 11:41

1 Answer

First, I'd use System.Text.Json instead of Newtonsoft.Json, but that's just personal preference.

See the following example, written for .NET 8:

using System.Text;
using System.Text.Json;

var json = "{\"caption\":\"\\ud835\\udc2f\\ud835\\udc1e\\u0301\\ud835\\udc25\\ud835\\udc28 \\n\\n \\ud835\\udc2f\\ud835\\udc28\\u0302\\ud835\\udc2d\\ud835\\udc2b\\ud835\\udc1e\\ud835\\udc2c \\n\\n \\ud835\\udfd1\\ud835\\udfd2\\ud835\\udc1e\\u0300\\ud835\\udc26\\ud835\\udc1e \\ud835\\udc1e\\u0301\\ud835\\udc1d\\ud835\\udc22\\ud835\\udc2d\\ud835\\udc22\\ud835\\udc28\\ud835\\udc27\"}";
Console.OutputEncoding = Encoding.UTF8;
var options = new JsonSerializerOptions
{
    PropertyNameCaseInsensitive = true
};
var entity = JsonSerializer.Deserialize<TestEntity>(json, options);
if (entity != null)
{
    var caption = entity.Caption;
    Console.WriteLine(caption);
}

public class TestEntity
{
    public string Caption { get; set; }
}

The issue is that you don't save the data UTF-8 encoded. I strongly advise always using UTF-8 where possible. To see how much the encoding matters, simply comment out the line Console.OutputEncoding = Encoding.UTF8; and the console will show only question marks, because it no longer knows how to output those characters. The same happens in your database. Check the properties of your database and set it to UTF-8 encoding (in SQL Server, that means a UTF-8 collation, available from SQL Server 2019, or an NVARCHAR column, which stores Unicode as UTF-16).
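To check whether the deserialized string itself is intact before it reaches the console or the database, it can help to look at its Unicode code points rather than at how it is rendered. Below is a minimal sketch that uses only the base class library, so it should work with either JSON library. The names CaptionInspector, CodePoints and SurvivesUtf8RoundTrip are hypothetical helpers made up for this illustration, not part of any library, and the sketch assumes the string contains no unpaired surrogates.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not part of any library.
public static class CaptionInspector
{
    // Returns the Unicode code points of a string, combining each
    // surrogate pair (two UTF-16 chars) into a single code point.
    // Assumes the input has no unpaired surrogates; char.ConvertToUtf32
    // throws an ArgumentException otherwise.
    public static List<int> CodePoints(string s)
    {
        var result = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            result.Add(char.ConvertToUtf32(s, i));
            if (char.IsHighSurrogate(s[i]))
            {
                i++; // skip the low surrogate that was just consumed
            }
        }
        return result;
    }

    // Encodes the string as UTF-8 bytes and decodes it again.
    // Returns true when nothing was lost on the way.
    public static bool SurvivesUtf8RoundTrip(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        return Encoding.UTF8.GetString(bytes) == s;
    }
}
```

For the first three characters of the caption, "\ud835\udc2f\ud835\udc1e\u0301", CodePoints should return U+1D42F, U+1D41E and U+0301. If the deserialized caption yields those values and survives the UTF-8 round trip, the JSON library did its job, and the loss most likely happens later, when the value is stored or rendered.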

5 Comments

Hi, thank you for your time and help. I see now what the issue might be, and how to resolve it using System.Text.Json instead of Newtonsoft.Json, but the real app is still on .NET Core 2.1, where System.Text.Json isn't available... That's why I'm hoping for a solution that works with Newtonsoft for the time being.
See this example if you can't migrate to System.Text.Json: stackoverflow.com/questions/55212403/…
I stumbled upon it and was not able to make it work; I might try again after a small break. :) Regarding your suggestion, I think that if it's still "wrongly encoded" inside VS while debugging, the issue would still be present: ibb.co/SV59jgh
@peljoe just open the variable in the text visualizer and it will show (almost) correctly: ibb.co/jG1CPPS However, I don't really know why seeing a value during debugging is so important...
because if I copy paste it there and come see this comment from my android phone, the issue is still present: "𝐯𝐞́𝐥𝐨 \n\n 𝐯𝐨̂𝐭𝐫𝐞𝐬 \n\n 𝟑𝟒𝐞̀𝐦𝐞 𝐞́𝐝𝐢𝐭𝐢𝐨𝐧" (from debugger) and 𝐯𝐞́𝐥𝐨 𝐯𝐨̂𝐭𝐫𝐞𝐬 𝟑𝟒𝐞̀𝐦𝐞 𝐞́𝐝𝐢𝐭𝐢𝐨𝐧 (from text visualizer)
