5

I'm reading a json file where some fields have string like the following: "Eduardo Fonseca Bola\u00c3\u00b1os comparti\u00c3\u00b3 una publicaci\u00c3\u00b3n."

The final end reslt should look like this "Eduardo Fonseca Bolaños compartió una publicación."

  • Is there any out of the box converted to do this using C#?
  • Which is the correct way to convert these kinds of json data?
1
  • You can convert it to string variable with HttpUtility.HtmlDecode ? Commented Mar 17, 2019 at 22:10

2 Answers 2

3

You can use Json.NET library to decode the string. The deserializer decodes the string automatically.

public class Example
{
    public String Name { get; set; }
}
// 
var i = @"{ ""Name"" : ""Eduardo Fonseca Bola\u00c3\u00b1os comparti\u00c3\u00b3 una publicaci\u00c3\u00b3n."" }";
var jsonConverter = Newtonsoft.Json.JsonConvert.DeserializeObject(i);

// Encode the string to UTF8
byte[] bytes = Encoding.Default.GetBytes(jsonConverter.ToString());
var myString = Encoding.UTF8.GetString(bytes);
Console.WriteLine(myString);

// Deserialize using class
var sample = Newtonsoft.Json.JsonConvert.DeserializeObject<Example>(i);
byte[] bytes = Encoding.Default.GetBytes(sample.Name);
var myString = Encoding.UTF8.GetString(bytes);
Console.WriteLine(myString);

The output is:

{
  "Name": "Eduardo Fonseca Bolaños compartió una publicación."
}

Option 2

You can use System.Web.Helpers.Json.Decode method. You won't need to use any external libraries.

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks @Rushi, that's what I tried first, but when using the JSON .NET, the string I get ends up like this { "Name": "Eduardo Fonseca Bolaños compartió una publicación." }
You will have to convert the string to UTF8. I have edited my original post to add the piece of code.
This doesn't seem scalable. You are writing custom deserialising code for every property? I found this because I'm trying to adapt an existing open source project to allow unicode strings, and can't see how this would work in non-trivial cases. I have thousands of string properties in my JSON model classes.
1

Here is the fix for this specific situation

        private static Regex _regex = 
        new Regex(@"(\\u(?<Value>[a-zA-Z0-9]{4}))+", RegexOptions.Compiled);
    private static string ConvertUnicodeEscapeSequencetoUTF8Characters(string sourceContent)
    {
        //Check https://stackoverflow.com/questions/9738282/replace-unicode-escape-sequences-in-a-string
        return _regex.Replace(
            sourceContent, m =>
            {
                var urlEncoded = m.Groups[0].Value.Replace(@"\u00", "%");
                var urlDecoded = System.Web.HttpUtility.UrlDecode(urlEncoded);
                return urlDecoded;
            }
        );
    }

Based on Replace unicode escape sequences in a string

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.