
I am running into an issue similar to the one described here: Java Linux Nonblocking Socket Timeout Behavior

I have an application implemented with Java NIO. It keeps track of a bunch of sockets, and when they're ready for reading, my application will read in a loop (removed code and some logic for brevity):

        if (selkey.isReadable()) {
            int nread;
            while (true) {
                // read the header
                nread = mSocketChannel.read(mHeaderBuffer);
                if (nread == -1)
                    return;
                handle_message_header();
                // read the body
                nread = mSocketChannel.read(mPayloadBuffer);
                if (nread == -1)
                    return;
                handle_message_body();
            }
        }
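
For reference, a non-blocking read() has more outcomes than the -1 check above distinguishes: it can also return 0 when no bytes are available yet, and it can throw an IOException such as the one shown in the stack trace. The following is a minimal sketch, not the application's actual code; the class name ReadLoopSketch, the Status enum and the fill helper are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;

class ReadLoopSketch {
    // Hypothetical status values; not part of the original application.
    enum Status { COMPLETE, NEED_MORE, CLOSED, FAILED }

    /** Tries to fill buf from a channel without assuming the data is all there yet. */
    static Status fill(ReadableByteChannel ch, ByteBuffer buf) {
        try {
            while (buf.hasRemaining()) {
                int n = ch.read(buf);
                if (n == -1) return Status.CLOSED;    // peer closed the connection
                if (n == 0) return Status.NEED_MORE;  // nothing available now; wait for the next OP_READ
            }
            return Status.COMPLETE;
        } catch (IOException e) {
            // e.g. "Connection timed out": treat the connection as dead
            return Status.FAILED;
        }
    }

    public static void main(String[] args) throws IOException {
        // A Pipe stands in for a socket so the sketch runs without a network peer.
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        ByteBuffer header = ByteBuffer.allocate(4);
        System.out.println(fill(pipe.source(), header)); // nothing has been written yet
        pipe.sink().write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}));
        System.out.println(fill(pipe.source(), header)); // the four bytes are now available
        pipe.sink().close();
        pipe.source().close();
    }
}
```

With a helper like this, the caller would process the header only on COMPLETE, return to the selector on NEED_MORE, and close the channel and cancel its key on CLOSED or FAILED.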

Very rarely, however, the first read() throws a timeout exception:

    java.io.IOException: Connection timed out
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
    at sun.nio.ch.IOUtil.read(IOUtil.java:175)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)

I dug into the JDK sources, and the read0 function simply calls read() on the socket handle. The "Connection timed out" exception is thrown if read() returns -1 and errno == ETIMEDOUT.

We do not use setSoTimeout() or the TCP keepalive option. Since I have seen this only on a client's cluster, I am not able to reproduce it (nor do I have the output of netstat or other tools).
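
One way to confirm those two settings at run time is to query the socket behind the channel. This is only a sketch: the class and method names are hypothetical, and it inspects a freshly opened, unconnected channel rather than one of the application's sockets.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SocketChannel;

class SocketOptionCheck {
    /** Reports SO_TIMEOUT and SO_KEEPALIVE for the socket behind a channel. */
    static String describe(SocketChannel ch) throws IOException {
        return "soTimeout=" + ch.socket().getSoTimeout()
             + " keepAlive=" + ch.socket().getKeepAlive();
    }

    /** Convenience wrapper (hypothetical) that inspects a freshly opened channel. */
    static String describeFreshChannel() {
        try (SocketChannel ch = SocketChannel.open()) {
            return describe(ch);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFreshChannel());
    }
}
```

Logging describe(channel) for each accepted connection would show whether anything else in the process changed these options.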

In which cases does the Linux kernel return ETIMEDOUT from a nonblocking read()? Is this a bug or a feature?

More information about the machine on which this appeared:

Linux slave1 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
CentOS 5.4

Thanks, Chris

Edit: According to my log file (and the program flow), the socket was created when the server accepted an incoming connection. There was then at least one successful recv from that socket, but the server subsequently failed twice to write to it, and after that I caught the exception when reading. The log file does not contain much information, so I am not 100% sure about my analysis so far. I have added lots of debug output to the socket routines, so I will be better prepared the next time.

Thanks for all the helpful comments!

2 Answers

2

You are reading from a connection that you haven't completed properly. Probably you did the connect in non-blocking mode and either haven't received the OP_CONNECT event, haven't called finishConnect(), or it didn't return true.
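
The sequence for a non-blocking connect can be sketched roughly as follows. This is an illustration under assumptions, not the asker's code: the class name, the connect helper, its timeout parameter and the loopback demo are all hypothetical, and a real application would normally handle OP_CONNECT inside its main select loop rather than in a helper like this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class NonBlockingConnectSketch {
    /** Connects in non-blocking mode and registers for OP_READ only once connected. */
    static SelectionKey connect(Selector selector, InetSocketAddress addr, long timeoutMillis)
            throws IOException {
        SocketChannel ch = SocketChannel.open();
        ch.configureBlocking(false);
        boolean connected = ch.connect(addr); // may already succeed, e.g. on loopback
        // Until the connection is established, register for OP_CONNECT and nothing else.
        SelectionKey key = ch.register(selector,
                connected ? SelectionKey.OP_READ : SelectionKey.OP_CONNECT);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!connected) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                ch.close();
                throw new IOException("connect did not complete in time");
            }
            selector.select(remaining); // wait for the OP_CONNECT event
            if (selector.selectedKeys().remove(key) && key.isConnectable()) {
                // finishConnect() must return true; it throws if the attempt failed.
                connected = ch.finishConnect();
            }
        }
        // Only now is readiness for reading meaningful.
        key.interestOps(SelectionKey.OP_READ);
        return key;
    }

    /** Runs the sequence against a throwaway listener on the loopback interface. */
    static boolean loopbackDemo() {
        try (ServerSocketChannel server = ServerSocketChannel.open();
             Selector selector = Selector.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();
            SelectionKey key = connect(selector, addr, 5000);
            SocketChannel ch = (SocketChannel) key.channel();
            boolean ok = ch.isConnected() && key.interestOps() == SelectionKey.OP_READ;
            ch.close();
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connected and registered for OP_READ: " + loopbackDemo());
    }
}
```

The point of the ordering is that OP_READ is never part of the interest set before finishConnect() has returned true.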


5 Comments

Is it possible that OP_CONNECT was missing, but select() already claims that the socket is readable?
@cruppstahl Readability isn't meaningful until after connect.
Yes, I know, but looking at my code this is what happens: the NIO selector returns "Readable", so read() is called, and this times out. I assume (but have not verified) that the selector uses select(). I will look into that.
@cruppstahl Did you (1) register for OP_CONNECT and nothing else (2) get that event (3) call finishConnect() (4) finishConnect() returned true (5) register for OP_READ?
@EJP - I noticed that, according to the log file, the server failed twice to send a response to the client, and then it caught the timeout exception. To me that means the socket was working fine (it was at least reading successfully), so it had also connected fine, and select() signalled that the socket was ready for reading. However, the log really does not contain much information. I have increased the debug output, so if the error comes up again I should have more information. Thanks for all your help!
0

Your client attempted to connect but received no response and eventually timed out.

EJP, Thank you for the correction.

4 Comments

No it doesn't. It means the connection was never completed in the first place.
"never completed" means it was not connected? That's not true. The connection was open and data was received.
@cruppstahl I find that very difficult to believe. I suggest you aren't observing your symptoms accurately.
That's possible - I have only the logfile for analysis. Thanks for your suggestion, I'll look into it.
