I have a C# project in Visual Studio which downloads and parses an XML file that contains Korean, Chinese and other Unicode characters. For example, for the Korean artist Taeyang it should produce XML like this:

<name>태양</name>

but it returns

<name>??</name>

I have tried StreamReader with Encoding.Default, but the result is

<name>태양</name>

The code:

string address = String.Format("http://musicbrainz.org/ws/2/artist/{0}?inc=url-rels", mbids[ord]);
HttpWebRequest newRequest = WebRequest.Create(address) as HttpWebRequest;
newRequest.Headers["If-None-Match"] = etagProf;
newRequest.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";
var response = newRequest.GetResponse();
// Reader
Stream stream = response.GetResponseStream();
StreamReader reader = new StreamReader(stream, Encoding.UTF-8);
string data = reader.ReadToEnd();

And the XML source:

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
    <artist type="Person" id="d84e5667-3cbe-4556-b551-9d7e4be95d71">   
        <name>태양</name>
        <sort-name>Taeyang</sort-name><gender>Male</gender>
        <country>KR</country>
        ...........
    </artist>
</metadata>

I'm confused. Why does this happen? Any ideas?

  • Are you sure the source file is actually in UTF-8? Commented Feb 20, 2015 at 7:58
  • Sure, I checked its charset in the response header. Commented Feb 20, 2015 at 7:59
  • Can you share one of your original input XML files? Commented Feb 20, 2015 at 8:15
  • @netblognet: I just updated my question, please check it :) Commented Feb 20, 2015 at 8:20

3 Answers

Using the code below (note that I commented out two of your lines)

//newRequest.Headers["If-None-Match"] = "d84e5667-3cbe-4556-b551-9d7e4be95d71";
//newRequest.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";

and changing your line StreamReader(stream, Encoding.UTF-8);

to StreamReader(stream, Encoding.UTF8);

I got the correct characters in the output:

using System.IO;
using System.Net;
using System.Text;
using System.Windows.Forms; // for MessageBox

string address = String.Format("http://musicbrainz.org/ws/2/artist/{0}?inc=url-rels","d84e5667-3cbe-4556-b551-9d7e4be95d71");
HttpWebRequest newRequest = WebRequest.Create(address) as HttpWebRequest;
//newRequest.Headers["If-None-Match"] = "d84e5667-3cbe-4556-b551-9d7e4be95d71";
//newRequest.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";
var response = newRequest.GetResponse();
// Reader: decode the (uncompressed) response body as UTF-8
Stream stream = response.GetResponseStream();
StreamReader reader = new StreamReader(stream, Encoding.UTF8);
string data = reader.ReadToEnd();
MessageBox.Show(data);
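Building on the suspicion about the gzip header: if you send Accept-Encoding: gzip yourself, the server may answer with compressed bytes, and a StreamReader over the raw response stream then decodes gzip data as if it were UTF-8 text. Below is a sketch of two ways around that, assuming you want to keep gzip: let HttpWebRequest decompress for you via AutomaticDecompression, or unwrap the stream with GZipStream yourself. CreateRequest, ReadBody and Gzip are hypothetical helper names, and the demo uses an in-memory body instead of a real MusicBrainz response.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

// Option 1 (hypothetical helper): let HttpWebRequest request gzip AND decompress it.
// With AutomaticDecompression set, the manual Accept-Encoding header is not needed.
static HttpWebRequest CreateRequest(string address, string etag)
{
    var request = (HttpWebRequest)WebRequest.Create(address);
    request.Headers["If-None-Match"] = etag;
    request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
    return request;
}

// Option 2 (hypothetical helper): if you set Accept-Encoding yourself,
// remove the gzip layer before decoding the bytes as UTF-8 text.
static string ReadBody(Stream body, bool isGzip)
{
    Stream source = isGzip ? new GZipStream(body, CompressionMode.Decompress) : body;
    using var reader = new StreamReader(source, Encoding.UTF8);
    return reader.ReadToEnd();
}

// Offline demo helper: build a gzip-compressed UTF-8 body.
static byte[] Gzip(string text)
{
    using var buffer = new MemoryStream();
    using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        gzip.Write(utf8, 0, utf8.Length);
    }
    return buffer.ToArray();
}

string xml = "<name>태양</name>";
byte[] compressed = Gzip(xml);

// Reading gzip bytes directly as UTF-8 produces garbage characters.
string wrong = ReadBody(new MemoryStream(compressed), isGzip: false);
// Decompressing first recovers the original text.
string right = ReadBody(new MemoryStream(compressed), isGzip: true);

Console.WriteLine(right == xml); // True
Console.WriteLine(wrong == xml); // False
```

On newer .NET versions HttpWebRequest is marked obsolete in favor of HttpClient; HttpClientHandler exposes the same AutomaticDecompression property.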

5 Comments

I just did that and the output is 태양 again.
@Sag1v: GetEncoding can't be relied on to determine the real text encoding.
@Sag1v: I got the same result. The code runs fine on my machine, too. So if this doesn't solve Michael Antonio's problem, maybe his OS has trouble handling UTF-8.
@netblognet: could be, but I think the problem lies in the two lines I commented out: newRequest.Headers["If-None-Match"] = "d84e5667-3cbe-4556-b551-9d7e4be95d71"; newRequest.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";
Hi all, I fixed this problem; please look at my answer. Thanks for your time, nice to share :)

Try UTF-8 encoding:

StreamReader sr= new StreamReader(file_name, System.Text.Encoding.UTF8);
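To see why the explicit encoding matters, here is a small sketch that writes the same UTF-8 bytes to a temporary file (a stand-in for file_name) and reads them back with StreamReader twice: once as UTF-8 and once as ASCII, which is used here only as an example of a wrong decoder. ReadAll is a hypothetical helper.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: read a whole file with an explicitly chosen encoding.
static string ReadAll(string path, Encoding encoding)
{
    using var sr = new StreamReader(path, encoding);
    return sr.ReadToEnd();
}

// UTF-8 bytes without a byte-order mark, so StreamReader uses the encoding we pass.
byte[] bytes = Encoding.UTF8.GetBytes("<name>태양</name>");
string path = Path.GetTempFileName(); // stand-in for file_name
File.WriteAllBytes(path, bytes);

string asUtf8 = ReadAll(path, Encoding.UTF8);   // <name>태양</name>
string asAscii = ReadAll(path, Encoding.ASCII); // <name>??????</name> (one '?' per non-ASCII byte)
Console.WriteLine(asUtf8);
Console.WriteLine(asAscii);

File.Delete(path);
```

Each Hangul syllable takes three bytes in UTF-8, which is why the ASCII decoder emits six question marks for two characters.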

2 Comments

@MichaelAntonio is Encoding.UTF-8 a typo?
What's the computer's system language and OS you're working on?

I found that Console.WriteLine() can't display Unicode text properly: Korean, Chinese and other non-ASCII characters don't show up as expected, because the console window uses the single-font Raster Fonts.
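For the console part, a minimal sketch, assuming a .NET console app: Console.OutputEncoding controls how written text is encoded for the console, while the console font controls whether the glyphs can actually be drawn. Printing the code points is one way to confirm the string itself is correct even when the console can't render it.

```csharp
using System;
using System.Text;

// Encode console output as UTF-8. On the classic Windows console this only fixes
// the encoding side; the glyphs still need a console font that contains Hangul.
Console.OutputEncoding = Encoding.UTF8;

string name = "태양";
Console.WriteLine(name);

// Console-independent check: print each UTF-16 code unit of the string.
foreach (char c in name)
    Console.WriteLine($"U+{(int)c:X4}"); // U+D0DC, then U+C591
```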

But the main problem was my database connection: I forgot to add charset=utf-8 to my connection string.
