
I'm using the script below to retrieve HTML from a URL.

string webURL = @"https://nl.wiktionary.org/wiki/" + word.ToLower();
using (WebClient client = new WebClient())
{
    string htmlCode = client.DownloadString(webURL);
}

The variable word can be any word. When there is no wiki page for the word to be retrieved, the code fails with a 404 error, while retrieving the same URL with a browser opens a wiki page saying there is no page for this item yet.

What I want is for the code to always get the HTML, even when the wiki page says there is no info yet. I do not want to work around the 404 error with a try/catch.

Does anyone have an idea why this is not working with a WebClient?

2 Comments

  • A little bit off topic, but why not HttpClient instead of WebClient? Commented Jul 7, 2017 at 13:07
  • I guess you have to add client.Headers. Commented Jul 7, 2017 at 13:09
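As the first comment hints, HttpClient does not throw on a 404 by default: the status code can be inspected and the response body read either way, which avoids the try/catch entirely. A minimal sketch of that approach (the word value here is just an example, not from the original post):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        string word = "voorbeeld"; // example word, not from the original post

        using (var client = new HttpClient())
        {
            // GetAsync does not throw on a 404 status; only network-level
            // failures raise an exception.
            HttpResponseMessage response =
                await client.GetAsync("https://nl.wiktionary.org/wiki/" + word.ToLower());

            Console.WriteLine((int)response.StatusCode); // 200 or 404

            // The body is readable regardless of the status code,
            // so the custom "no page yet" HTML is available too.
            string htmlCode = await response.Content.ReadAsStringAsync();
            Console.WriteLine(htmlCode.Length);
        }
    }
}
```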

2 Answers


Try this. You can catch the 404 error content in a try/catch block.

        var word = Console.ReadLine();
        string webURL = @"https://nl.wiktionary.org/wiki/" + word.ToLower();

        using (WebClient client = new WebClient())
        {
            try
            {
                string htmlCode = client.DownloadString(webURL);
            }
            catch (WebException exception)
            {
                string responseText = string.Empty;

                // The 404 response body is still available on the exception.
                var responseStream = exception.Response?.GetResponseStream();

                if (responseStream != null)
                {
                    using (var reader = new StreamReader(responseStream))
                    {
                        responseText = reader.ReadToEnd();
                    }
                }

                Console.WriteLine(responseText);
            }
        }

        Console.ReadLine();

1 Comment

Note that WebException.Response by default is limited to 64 kB. If you need to read more, you need to set HttpWebRequest.DefaultMaximumErrorResponseLength. (Thanks to stackoverflow.com/a/43842761/72809)

Since this wiki server uses case-sensitive URL mapping, just don't modify the case of the URL you harvest (remove ".ToLower()" from your code).

Example, lowercased:
https://nl.wiktionary.org/wiki/categorie:onderwerpen_in_het_nynorsk
Result: HTTP 404 (Not Found)

Original (unmodified) case:
https://nl.wiktionary.org/wiki/Categorie:Onderwerpen_in_het_Nynorsk
Result: HTTP 200 (OK)

Also, keep in mind that most (if not all) wiki servers (including this one) generate custom 404 pages, so in a browser they look like "normal" pages, but they are nevertheless served with a 404 HTTP status code.
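Because of this, a caller can both record that the page was a soft 404 and still read the custom page's HTML. A minimal sketch, assuming the WebException carries an HttpWebResponse (which it does for HTTP protocol errors):

```csharp
using System;
using System.IO;
using System.Net;

class StatusCheck
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            try
            {
                // Lowercased URL, which this server maps to a 404.
                Console.WriteLine(client.DownloadString(
                    "https://nl.wiktionary.org/wiki/categorie:onderwerpen_in_het_nynorsk"));
            }
            catch (WebException ex) when (ex.Response is HttpWebResponse http)
            {
                // The custom "no page yet" page arrives with a 404 status.
                Console.WriteLine((int)http.StatusCode);

                // Its HTML is still readable from the response stream.
                using (var reader = new StreamReader(http.GetResponseStream()))
                {
                    Console.WriteLine(reader.ReadToEnd().Length);
                }
            }
        }
    }
}
```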

1 Comment

Thx Pavel, so it looks like I will have to use a try/catch after all.
