11

I'm experiencing the so-called Timeout performing HGET company:product:settings, inst: 1, queue: 8, qu=0, qs=8, qc=0, wr=0/0, in=79/1 timeout exception.

It's strange because the same Redis instance and in the same machine is storing data, but it's a specific application which throws this exception. Update: In fact, the same application, one line above receives data from Redis. The issue is with HGET.

Also, I've increased timeouts on multiplexer configuration to 6 seconds with no luck.

In addition, I've checked that IDatabase instance has IsConnected with true value.

How to interpret these error messages and what's the issue behind the whole timeout?

Some background...

I've successfully fixed the issue changing when some code section gets a database (i.e. multiplexer.GetDatabase()).

While multiplexer has an instance per AppDomain as described in StackExchange.Redis documentation, many inversion of control components are creating many instances of IDatabase in their own code. That is, IDatabase instance isn't shared.

Actual code is performing a ListRightPopLeftPush, and after that, it's instantiating an inversion of control component which reads a hash key during component instatiation. If instantiate the whole component before doing the so-called ListRightPopLeftPush, then the whole HashGet won't throw the timeout exception.

It seems like even when ListRightPopLeftPush is executed from other IDatabase instance, it produces some kind of issue in the next IDatabase instance when it comes to perform read operations.

Anyway, my fix doesn't answer the question. I've just added more detailed info to let us find what's the issue and its own solution.

Update

Anyway, the above "fix" won't fix further read accesses to Redis. I'm getting the same timeout exception in further calls. And now in paramater found in exception's message says 60/1.

18
  • The in=79/1 tells me that there is data available and that a reader is processing it; I will need to check to see if there is any way the reader could be stalled here. Commented Aug 29, 2014 at 16:24
  • @MarcGravell Hey, and how you want to check that? Anyway, this is code at work, we should check that next monday... Commented Aug 29, 2014 at 18:32
  • @MarcGravell Nevermind, I understand you're going to check in some test ;) Commented Aug 29, 2014 at 18:36
  • @MarcGravell Did you get any clue about what's behind the issue? If you need further details you can ask for them and I'll add them to the question Commented Sep 1, 2014 at 7:19
  • @MarcGravell I've added new details because I could "fix" the issue changing some execution order in my code. Maybe this gives you more info... Commented Sep 1, 2014 at 7:50

1 Answer 1

7

Based on a long discussion in chat, and a lot of digging, it looks like in some obscure scenarios the TPL is hijacking the dedicated reader thread when we are doing things like .TrySetResult (which: we do often). This causes an instant deadlock if you make a synchronous call, since it can't possibly process any socket data if it is busy waiting for a task to complete (which would only ever be completed by itself). We do actually have code in place specifically to prevent this, but it looks like the workaround actually forces it to happen in some other scenarios. Which... is horrible. I will see what I can find. But basically, the problem is that currently, in some limited scenarios, TaskCompletionSource.TrySetResult is giving power to the TPL to run synchronous continuations. This includes Task.WhenAny.

Sign up to request clarification or add additional context in comments.

5 Comments

I don't know if I should mark this as solved or we can leave it open until you release a new version of StackExchange.Redis library solving the issue. What's your opinion?
if you want, it would be useful that your answer would explain that starting from X version the issue got solved!
I've got something like this happening in a Nancy app, and I'm not sure if this is to blame or not. Any updates on what the limited Scenarios are that can cause this to happen?
In case it's important this is happening for me in version 1.0.371 framework 4.5, and I'm not able to reproduce it in unit tests, just the Nancy App.
Does this mean we should or shouldn't be using async methods?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.