I've been working with .NET since its inception and doing parallel programming since long before that... still, I'm at a loss to explain this phenomenon. The code below runs in a production system and has been doing its job for the most part; I'm just seeking a better understanding.
I'm passing 10 URLs into the following for concurrent processing:
public static void ProcessInParellel(IEnumerable<ArchivedStatus> statuses,
                                     StatusRepository statusRepository,
                                     WaitCallback callback,
                                     TimeSpan timeout)
{
    List<ManualResetEventSlim> manualEvents = new List<ManualResetEventSlim>(statuses.Count());

    try
    {
        foreach (ArchivedStatus status in statuses)
        {
            manualEvents.Add(new ManualResetEventSlim(false));
            ThreadPool.QueueUserWorkItem(callback,
                new State(status, manualEvents[manualEvents.Count - 1], statusRepository));
        }

        if (!(WaitHandle.WaitAll((from m in manualEvents select m.WaitHandle).ToArray(), timeout, false)))
            throw ThreadPoolTimeoutException(timeout); // helper that builds the "Not all threads returned..." exception quoted below
    }
    finally
    {
        Dispose(manualEvents);
    }
}
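For context, the method is invoked roughly like this (a sketch only; the collection name is illustrative, while the callback and the 60-second timeout are the ones discussed below):

// Illustrative call site (not the exact production code):
ProcessInParellel(statusesToExpand,              // the batch of 10 ArchivedStatus entries
                  statusRepository,
                  ProcessEntry,                  // the callback shown below
                  TimeSpan.FromSeconds(60));     // the WaitAll timeout discussed below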
The callback is something like:
public static void ProcessEntry(object state)
{
    State stateInfo = state as State;
    try
    {
        using (new LogTimer(new TimeSpan(0, 0, 6)))
        {
            GetFinalDestinationForUrl(<someUrl>);
        }
    }
    catch (System.IO.IOException) { } // (handling elided in this excerpt)
    catch (Exception ex)
    {
        // (handling elided in this excerpt)
    }
    finally
    {
        if (stateInfo.ManualEvent != null)
            stateInfo.ManualEvent.Set();
    }
}
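State itself isn't shown above; inferred from how it's constructed and consumed, it's essentially the following small carrier (only ManualEvent appears verbatim in the code; the other member names are assumptions):

// Assumed shape of State, inferred from its usage above:
public class State
{
    public ArchivedStatus Status { get; private set; }
    public ManualResetEventSlim ManualEvent { get; private set; }
    public StatusRepository Repository { get; private set; }

    public State(ArchivedStatus status, ManualResetEventSlim manualEvent, StatusRepository repository)
    {
        Status = status;
        ManualEvent = manualEvent;
        Repository = repository;
    }
}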
Each of the callbacks looks at a URL and follows a series of redirects (AllowAutoRedirect is intentionally set to false to handle cookies):
public static string GetFinalDestinationForUrl(string url, string cookie)
{
    if (!urlsToIgnore.IsMatch(url))
    {
        HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
        request.AllowAutoRedirect = false;
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        request.Method = "GET";
        request.KeepAlive = false;
        request.Pipelined = false;
        request.Timeout = 5000;

        if (!string.IsNullOrEmpty(cookie))
            request.Headers.Add("cookie", cookie);

        try
        {
            string html = null, location = null, setCookie = null;

            using (WebResponse response = request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                html = reader.ReadToEnd();
                location = response.Headers["Location"];
                setCookie = response.Headers[System.Net.HttpResponseHeader.SetCookie];
            }

            if (null != location)
                return GetFinalDestinationForUrl(GetAbsoluteUrlFromLocationHeader(url, location),
                    (!string.IsNullOrEmpty(cookie) ? cookie + ";" : string.Empty) + setCookie);

            return CleanUrl(url);
        }
        catch (Exception ex)
        {
            if (AttemptRetry(ex, url))
                throw;
        }
    }

    return ProcessedEntryFlag;
}
I have a high-precision StopWatch around the recursive GetFinalDestinationForUrl call with a threshold of 6 seconds, and generally the callbacks that complete do so within this time.
However, the WaitAll, with a generous timeout of 60 seconds (0, 0, 60) for the 10 threads, still times out regularly.
The exception prints out something like:
System.Exception: Not all threads returned in 60 seconds: Max Worker:32767, Max I/O:1000, Available Worker:32764, Available I/O:1000 at Work.Threading.ProcessInParellel(IEnumerable`1 statuses, StatusRepository statusRepository, WaitCallback callback, TimeSpan timeout) at Work.UrlExpanderWorker.SyncAllUsers()
This is running on .NET 4 with maxConnections set to 100 for all urls.
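For clarity, the connection limit in question is the one a sketch like the following would set programmatically (the production value may equally come from the connectionManagement section in config):

// Equivalent programmatic setting (sketch; production may instead use the
// system.net/connectionManagement config section):
ServicePointManager.DefaultConnectionLimit = 100;   // "maxConnections set to 100 for all urls"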
My only theory is that a synchronous HttpWebRequest call can block for longer than its specified timeout; that's the only reasonable explanation I can come up with. The question is why, and how best to force a real timeout on that operation.
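For what it's worth, the only mechanism I know of for guaranteeing a hard cutoff on a synchronous request is to abort it from the outside. A minimal sketch, assuming a per-request timer is acceptable (hardTimeoutMs and the placeholder url are illustrative, not production values):

// Sketch only (namespaces: System.IO, System.Net, System.Threading).
// Abort() makes a pending GetResponse()/Read() throw a WebException (status RequestCanceled),
// so the worker cannot hang past the hard cutoff regardless of what the server does.
const int hardTimeoutMs = 30000;                        // illustrative value
string url = "http://example.com/";                     // placeholder
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.AllowAutoRedirect = false;
request.Timeout = 5000;

using (Timer abortTimer = new Timer(_ => request.Abort(), null, hardTimeoutMs, Timeout.Infinite))
using (WebResponse response = request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string html = reader.ReadToEnd();                   // also covered by the abort, unlike request.Timeout
}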
Yes, I'm aware that the recursive call specifies a 5-second timeout on each request and that it can take several requests to process a given URL. But I hardly ever see the StopWatch warnings. For every 20-30 WaitAll timeout errors, I might see one message indicating that a given thread took longer than 6 seconds. If the problem really were that the 10 threads cumulatively need more than 60 seconds, I would expect at least a 1:1 correlation (if not higher) between the two kinds of messages.
UPDATE (March 30, 2012):
I'm able to confirm that the network calls alone don't respect the timeouts in certain circumstances:
Uri uri = new Uri(url);

HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(uri);
request.AllowAutoRedirect = false;
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
request.Method = "GET";
request.KeepAlive = false;
request.Pipelined = false;
request.Timeout = 7000;
request.CookieContainer = cookies;

try
{
    string html = null, location = null;

    using (new LogTimer("GetFinalDestinationForUrl", url, new TimeSpan(0, 0, 10)))
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(stream))
    {
        html = reader.ReadToEnd();
        location = response.Headers["Location"];
        cookies = Combine(cookies, response.Cookies);

        if (response.ContentLength > 150000 && !response.ContentType.ContainsIgnoreCase("text/html"))
            log.Warn(string.Format("Large request ({0} bytes, {1}) detected at {2} on level {3}.",
                response.ContentLength, response.ContentType, url, level));
    }
}
catch (Exception)
{
    // (retry/handling elided in this excerpt)
}
That code routinely logs entries that took 5-6 minutes to complete AND were not larger than 150,000 bytes. And I'm not talking about an isolated server here or there; these are random (high-profile) media sites.
What exactly is going on here and how do we ensure that the code exits in a reasonable time?
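For reference (an observation worth checking, not a confirmed explanation): request.Timeout governs GetResponse() itself, while the ReadToEnd() above is governed by ReadWriteTimeout, whose default of 300,000 ms (five minutes) lines up suspiciously well with the 5-6 minute entries. A minimal sketch of capping both, reusing the uri from the snippet above:

// Sketch: Timeout covers GetResponse(), but reading the body is bounded by ReadWriteTimeout,
// which defaults to 300,000 ms (five minutes).
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.Timeout = 7000;            // connect + receive the response headers
request.ReadWriteTimeout = 7000;   // each Read()/Write() on the request/response streams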
You wait on m.WaitHandle, but in ProcessEntry you use stateInfo.ManualEvent. Is that a typo?