
I'm trying to write code that reads the content of a web page, but I don't know which encoding the page uses. How can I write generic code that returns the correct string without the strange symbols? The encoding might be "UTF-8", "windows-1256", or something else. I've tried using UTF-8, but when the page is encoded with windows-1256 I get strange symbols.

Here is the code I'm using:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create("SOME-URL");
request.Method = "GET";
WebResponse response = request.GetResponse();
StreamReader streamReader = new StreamReader(response.GetResponseStream(), System.Text.Encoding.UTF8);
string content = streamReader.ReadToEnd();

And here is a link that causes the problem: http://forum.khleeg.com/144828.html

2 Answers


Examine the response text and look for this tag:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1256" />

These characters are decoded correctly even if you guess the wrong encoding, because they are plain ASCII, which both UTF-8 and windows-1256 share. Based on the charset value in this tag, create your Encoding object with the GetEncoding method, like this:

var enc1 = Encoding.GetEncoding("windows-1256");
var enc2 = Encoding.GetEncoding(1256);

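Another way is to use the .CharacterSet property of the HttpWebResponse, which reflects the charset parameter of the Content-Type response header:

HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string charset = response.CharacterSet;
var enc1 = Encoding.GetEncoding(charset);

Do not use the .ContentEncoding property for this. It reports the Content-Encoding header (compression such as gzip), not the character set, so passing it to GetEncoding will usually fail.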
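The header-based approach can fail when the server sends no charset or sends a name the runtime does not recognize, so a fallback helps. Below is a minimal sketch of that idea, not a definitive implementation. The names PageReader, ResolveEncoding and ReadPage are hypothetical, and falling back to UTF-8 is an assumption. On .NET Core and .NET 5+, code-page encodings such as windows-1256 are only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) from the System.Text.Encoding.CodePages package.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class PageReader
{
    // Hypothetical helper: maps a charset name to an Encoding, returning
    // the fallback when the name is missing or unknown to the runtime.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return fallback;
        try
        {
            // Some servers quote the value, e.g. charset="utf-8".
            return Encoding.GetEncoding(charset.Trim().Trim('"', '\''));
        }
        catch (ArgumentException)
        {
            // Thrown when the name is not a supported encoding name.
            return fallback;
        }
    }

    // Downloads the page and decodes it with the charset from the
    // Content-Type header, or UTF-8 (an assumption) if that is unusable.
    public static string ReadPage(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            Encoding encoding = ResolveEncoding(response.CharacterSet, Encoding.UTF8);
            using (var reader = new StreamReader(stream, encoding))
                return reader.ReadToEnd();
        }
    }
}
```

Usage would look like string content = PageReader.ReadPage("SOME-URL");. Note that the header can disagree with the page's own meta tag, so this sketch only covers the case where the server reports the charset correctly.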

3 Comments

But does that tag always exist in web pages? I mentioned "windows-1256" only as an example; it might be any other encoding. Any suggestions for working around this?
@Mousa This tag is always present on pages that use a non-standard encoding. But I've updated the answer to cover this.
It could be HttpWebResponse.CharacterSet instead of HttpWebResponse.ContentEncoding.

The page you mention tells you EXACTLY which encoding it uses. Here is the string found there:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1256" />

Can't you search for a string like this one and act upon this information?
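Searching for that string could be sketched roughly as follows. This is an illustration under stated assumptions, not a complete encoding detector: the class MetaCharsetSniffer, the method FindCharset and the 4096-byte scan window are all hypothetical choices, and the approach only works for ASCII-compatible encodings (it would not find the tag in a UTF-16 page).

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class MetaCharsetSniffer
{
    // Matches charset=... inside a meta tag, for example
    // <meta http-equiv="Content-Type" content="text/html; charset=windows-1256" />
    // or <meta charset="utf-8">.
    private static readonly Regex CharsetPattern = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-:.]+)",
        RegexOptions.IgnoreCase);

    // Returns the declared charset name, or null if no declaration is found.
    public static string FindCharset(byte[] pageBytes)
    {
        // Only scan the start of the document (the window size is an assumption).
        int length = Math.Min(pageBytes.Length, 4096);
        // Decode as ASCII: non-ASCII bytes turn into '?', but the tag itself
        // consists of ASCII characters and survives intact.
        string head = Encoding.ASCII.GetString(pageBytes, 0, length);
        Match match = CharsetPattern.Match(head);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

To use it, read the response into a byte array first (for instance by copying the response stream into a MemoryStream), call FindCharset on the bytes, and then decode the same bytes with Encoding.GetEncoding(name).GetString(bytes). If FindCharset returns null, some default still has to be chosen.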
