Avoiding TCP/IP connection hanging

Question

I am communicating with an instrument via TCP/IP using the Python socket package.

The program sends a command to the instrument to perform an action, and then repeatedly sends another "check" command until it receives a "done" reply. However, after many loops, the program hangs while waiting for a "done" reply.

I have circumvented this problem by using the recv_timeout() function below, which returns no data if the socket is hanging; I then close the connection with socket.close() and reconnect.

Is there a more elegant solution without having to reboot anything? There must be a way to carry on communicating with the instrument without having to restart.

    import socket
    import time

    def recv_timeout(sock, timeout=0.5):
        '''
        code from http://code.activestate.com/recipes/408859/
        '''
        sock.setblocking(False)
        total_data = []
        begin = time.time()
        while True:
            # if you got some data, then break after wait sec
            if total_data and time.time() - begin > timeout:
                break
            # if you got no data at all, wait a little longer
            elif time.time() - begin > timeout * 2:
                break
            try:
                data = sock.recv(8192)
                if data:
                    total_data.append(data)
                    begin = time.time()
                else:
                    time.sleep(0.1)
            except OSError:
                # nothing to read yet (or a socket error); keep polling
                pass
        # recv() returns bytes, so join as bytes and decode once at the end
        return b''.join(total_data).decode()

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('555.555.55.555', 23))

    for action_num in range(0, 1000):
        sock.sendall(('performaction %s \r' % action_num).encode())

        while True:
            time.sleep(0.2)
            sock.sendall('checkdone \r'.encode())
            done = recv_timeout(sock)
            if not done:
                print('communication broken...what should I do?')
                sock.close()
                time.sleep(60)
                sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                sock.connect(('555.555.55.555', 23))
            elif done == '1':
                print('done performing action')
                break
    sock.close()

Jeremy Friesner · Accepted Answer · 2015-05-21 06:00:28Z

I have circumvented this problem by using the recv_timeout() function below, which returns no data if the socket is hanging

Are you certain that the socket will hang forever? What about the possibility that the instrument just sometimes takes more than half a second to respond? Note that even if the instrument's software is good at responding in a timely manner, that is no guarantee that the response data will actually get to your Python program in a timely manner. For example, if the TCP packets containing the response get dropped by the network and have to be resent, they could take more than 0.5 seconds to reach your program. You can force that scenario by pulling the Ethernet cable out of your PC for a second or two and then plugging it back in: you'll see that the response bytes still make it through, just a second or two later (after the dropped packets get resent), provided your Python program hasn't already given up on them and closed the socket.
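One way to test whether the replies are merely late is to wait against a much longer overall deadline instead of a half-second quiet period. The following is only a sketch: the name recv_reply, the 5-second deadline, and the assumption that a reply ends with '\r' are all illustrative and not taken from the instrument's documentation.

```python
import socket
import time

def recv_reply(sock, deadline=5.0, terminator=b'\r'):
    """Collect bytes until `terminator` arrives or `deadline` seconds pass.

    The default deadline and terminator are assumptions for illustration.
    """
    end = time.monotonic() + deadline
    buf = b''
    while terminator not in buf:
        remaining = end - time.monotonic()
        if remaining <= 0:
            break                       # deadline passed: return what we have
        sock.settimeout(remaining)      # block, but never past the deadline
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            break                       # nothing more arrived in time
        if not chunk:
            break                       # peer closed the connection
        buf += chunk
    return buf
```

If the hang disappears with a longer deadline, the replies were being delayed rather than lost.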

Is there a more elegant solution without having to reboot anything?

The elegant solution is to figure out what is happening to the reply bytes in the fault scenario, and fix the underlying bug so that the reply bytes no longer get lost. Wireshark can be very helpful in diagnosing where the fault is; for example, if Wireshark shows that the response bytes did enter your computer's Ethernet port, then that is a pretty good clue that the bug is in your Python program's handling of the incoming bytes(*). On the other hand, if the response bytes never show up in Wireshark, then there might be a bug in the instrument itself that causes it to fail to respond sometimes. Wireshark would also show you if the problem is that your Python script failed to send out the "check" command for some reason.

That said, if you really can't fix the underlying bug (e.g. because it's a bug in the instrument and you don't have the ability to upgrade the source code of the software running on the instrument) then the only thing you can do is what you are doing -- close the socket connection and reconnect. If the instrument doesn't want to respond for some reason, you can't force it to respond.
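If closing and reconnecting really is the only option, the reconnect step can at least be kept in one place. This is a minimal sketch; the helper name reconnect and its attempts, wait, and timeout parameters are hypothetical choices, not something the instrument requires.

```python
import socket
import time

def reconnect(address, attempts=5, wait=1.0, timeout=5.0):
    """Open a fresh TCP connection to `address`, a (host, port) tuple.

    Retries up to `attempts` times, sleeping `wait` seconds after each failure.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return socket.create_connection(address, timeout=timeout)
        except OSError as exc:
            last_error = exc
            time.sleep(wait)
    raise ConnectionError('could not reconnect to %s:%s' % address) from last_error
```

The caller closes the old socket first and then replaces it with the one returned here, for example sock.close() followed by sock = reconnect(('555.555.55.555', 23)).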

(*) One thing to do is print out the contents of the string returned by recv_timeout(). You may find that you did get a reply, but it just wasn't the '1' string you were expecting.
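A small sketch of that check, assuming (hypothetically) that the instrument pads its reply with whitespace such as '\r\n'. The helper name is_done is made up for illustration; printing with %r makes any invisible characters show up.

```python
def is_done(reply):
    """Print exactly what was received, then compare it with '1'.

    Stripping whitespace is an assumption about the reply format.
    """
    print('raw reply: %r' % (reply,))
    return reply.strip() == '1'
```

With a reply of '1\r\n', a plain done == '1' comparison is False, while is_done('1\r\n') is True.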
