I am developing a C# console application that tests whether a URL is valid. It works well for most URLs, but in some cases the application always gets a 404 response from the target site even though the URL works in the browser. Those URLs also work when I try them in tools such as DHC (Dev HTTP Client).

At first, I thought the cause might be missing or incorrect headers. But when I used Fiddler to compose an HTTP request with the same headers, the request worked in Fiddler.

So what's wrong with my code? Is there any bug in .NET HttpClient?

Here is the simplified code of my test application:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        var urlTester = new UrlTester("http://www.hffa.it/short-master-programs/fashion-photography");

        Console.WriteLine("Test is started");

        // Block until the test finishes; otherwise "Test is stopped" prints before the response arrives.
        urlTester.RunTestAsync().Wait();

        Console.WriteLine("Test is stopped");
        Console.ReadKey();
    }


    public class UrlTester
    {
        private HttpClient _httpClient;
        private string _url;

        public UrlTester(string url)
        {
            _httpClient = new HttpClient 
            { 
                Timeout = TimeSpan.FromMinutes(1)
            };

            // Add headers
            _httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36");
            _httpClient.DefaultRequestHeaders.Add("Accept-Encoding", "gzip,deflate,sdch");
            _httpClient.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
            _httpClient.DefaultRequestHeaders.Add("Accept-Language", "sv-SE,sv;q=0.8,en-US;q=0.6,en;q=0.4");

            _url = url;
        }

        public async Task RunTestAsync()
        {
            var httpRequestMsg = new HttpRequestMessage(HttpMethod.Get, _url);

            try
            {
                using (var response = await _httpClient.SendAsync(httpRequestMsg, HttpCompletionOption.ResponseHeadersRead))
                {
                    Console.WriteLine("Response: {0}", response.StatusCode);
                }
            }
            catch (HttpRequestException e) 
            {
                // InnerException can be null, so fall back to the outer message.
                Console.WriteLine(e.InnerException?.Message ?? e.Message);
            }
        }
    }

}
  • What exactly is the output you get from that code? Commented Nov 5, 2015 at 21:43
  • An HTTP request is an HTTP request; it shouldn't matter where it comes from (unless the server is blocking certain User-Agent headers, but even this can be changed). This does sound like a header issue to me. Have you verified that you are exactly reproducing the request sent from your browser? Have you used a tool like Fiddler to exactly capture the HTTP traffic, then replicated it in your code? Commented Nov 5, 2015 at 22:15
  • @pymaxion Yes, I did what you said. I used Fiddler to see what the headers look like in a successful HTTP request, and then added those headers in the code. Even with similar headers, it still didn't work. Commented Nov 6, 2015 at 7:52
  • I suggest you run a network sniffer like Wireshark and see exactly what goes on. Maybe the async client also sends an Expect: 100-continue header? Commented Nov 6, 2015 at 10:00
  • @RonKlein Hi, I tried adding _client.DefaultRequestHeaders.ExpectContinue = false, but I still get a 404. Commented Nov 6, 2015 at 10:42

2 Answers

This appears to be an issue with the accepted languages. I got a 200 response when using the following Accept-Language header value:

_httpClient.DefaultRequestHeaders.Add("Accept-Language", "en-GB,en-US;q=0.8,en;q=0.6,ru;q=0.4");

P.S. I assume you know that in your example _client should read _httpClient in the UrlTester constructor, or it won't build.
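
To confirm that Accept-Language is really what the server reacts to, it may help to send the same URL with several header values and compare the status codes. The following is only a minimal sketch: the class name AcceptLanguageProbe and both method names are hypothetical, and it assumes the HttpClient passed in has no default Accept-Language header of its own.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of the question's code.
public static class AcceptLanguageProbe
{
    // Builds a GET request that carries the Accept-Language value under test.
    public static HttpRequestMessage BuildRequest(string url, string acceptLanguage)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Add("Accept-Language", acceptLanguage);
        return request;
    }

    // Sends one request per candidate value and prints the numeric status code.
    // Assumption: the client has no default Accept-Language header set.
    public static async Task CompareAsync(HttpClient client, string url, params string[] candidates)
    {
        foreach (var candidate in candidates)
        {
            using (var request = BuildRequest(url, candidate))
            using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
            {
                Console.WriteLine("{0} -> {1}", candidate, (int)response.StatusCode);
            }
        }
    }
}
```

Calling CompareAsync with the question's "sv-SE,sv;q=0.8,en-US;q=0.6,en;q=0.4" and the value above should show whether the two produce different status codes for the same URL.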


Another possible cause of this problem is a URL longer than approximately 2048 bytes. At that point the content (almost certainly the query string) can become truncated, which in turn means that it may not be matched correctly with a server-side route.

Although these URLs were processed correctly in the browser, they also failed when requested with a GET command in PowerShell.

This issue was resolved by using a POST with key value pairs instead of using a GET with a long query string.
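
As a rough illustration of that workaround, the key/value pairs can be moved out of the query string and into a form-encoded request body. This is a sketch under assumptions: the class name LongQueryAsPost, the method BuildPost, and the endpoint in the usage note are hypothetical, and it only works if the server accepts a POST on that route.

```csharp
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical helper, not part of the original answer.
public static class LongQueryAsPost
{
    // Builds a POST request whose body carries the pairs as
    // application/x-www-form-urlencoded content instead of a long query string.
    public static HttpRequestMessage BuildPost(
        string endpoint,
        IEnumerable<KeyValuePair<string, string>> fields)
    {
        return new HttpRequestMessage(HttpMethod.Post, endpoint)
        {
            Content = new FormUrlEncodedContent(fields)
        };
    }
}
```

A request built this way, for example BuildPost("http://example.com/search", pairs), can be sent with HttpClient.SendAsync just like the GET request in the question.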
