I'm trying to create a web scraper that queries many URLs in parallel and waits for their responses using Task.WhenAll(). However, if one of the Tasks is unsuccessful, WhenAll fails. I expect many of the Tasks to return a 404 and want to handle or ignore those. For example:
IEnumerable<string> urls = Enumerable.Range(1, 1000).Select(i => "https://somewebsite.com/" + i);
List<Task<string>> tasks = new List<Task<string>>();
foreach (string url in urls)
{
    tasks.Add(Task.Run(() => {
        try
        {
            return (new HttpClient()).GetStringAsync(url);
        }
        catch (HttpRequestException)
        {
            return Task.FromResult<string>("");
        }
    }));
}
var responseStrings = await Task.WhenAll(tasks);
This never hits the catch block, and WhenAll fails at the first 404. How can I get WhenAll to ignore exceptions and just return the results of the Tasks that completed successfully? Better yet, could it be done somewhere in the code below?
var tasks = Enumerable.Range(1, 1000).Select(i => (new HttpClient()).GetStringAsync("https://somewebsite.com/" + i));
var responseStrings = await Task.WhenAll(tasks);
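In case it helps to show the behaviour I'm after, here is a rough sketch. It assumes that awaiting the request inside an async method, rather than returning the un-awaited Task from the try block, is what lets the catch see the HttpRequestException. The names Scraper, GetStringOrEmptyAsync, FetchAllAsync and the shared Client field are hypothetical and not part of my code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class Scraper
{
    // Assumption: a single shared client instead of one HttpClient per request.
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical helper: awaits the fetch inside the try block, so a failed
    // request is observed here and mapped to an empty string.
    public static async Task<string> GetStringOrEmptyAsync(Func<Task<string>> fetch)
    {
        try
        {
            return await fetch();
        }
        catch (HttpRequestException)
        {
            // Only HTTP failures (such as a 404) are swallowed; other
            // exceptions still propagate to WhenAll.
            return "";
        }
    }

    // Wraps every request in the helper, so WhenAll only ever sees tasks
    // that handle HttpRequestException themselves.
    public static async Task<string[]> FetchAllAsync(IEnumerable<string> urls)
    {
        var tasks = urls.Select(url => GetStringOrEmptyAsync(() => Client.GetStringAsync(url)));
        return await Task.WhenAll(tasks);
    }
}
```

With this shape, failed requests would come back as empty strings in the result array, which could then be filtered out with something like responseStrings.Where(s => s.Length > 0).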
Thanks for your help.