I've written a miniature proxy module in Python 3 that sits between my browser and the web. My goal is merely to proxy the traffic going back and forth. One behavior of the program is to save the website responses I get to a local directory.
Everything works the way I expect, except that calling socket.recv() in a loop never seems to yield the empty bytes object that the examples in the docs imply. Virtually every online example talks about an empty string coming through the socket when the server closes it.
My assumption is that this is related to the keep-alive header: the remote server never closes the socket until its own timeout threshold is reached. Is this correct? If so, how on earth am I to detect when a payload has finished being sent? Checking whether the received data is smaller than my declared chunk size does not work, because TCP can deliver a short read at any point in the stream.
To demonstrate, the following code opens a socket to Google's web server and requests an image file. I copied the actual request string from my browser's own requests. Running the code (remember, Python 3!) shows that the binary image data is received to completion, but the code then never reaches the break statement. Only when the server closes the socket (after some 3 minutes of idle time) does the code actually reach the print call at the end of the file.
How on earth does one get around this? I do not want to modify the behavior of my browser's requests, so I'd rather not disable keep-alive or do something gaudy like that. Is the answer to use some ugly timeouts (via socket.settimeout())? That seems laughable, but I don't know what else could be done.
Thanks in advance.
import socket
remote_host = 'www.google.com'
remote_port = 80
remote_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
remote_socket.connect((remote_host, remote_port))
remote_socket.sendall(b'GET http://www.google.com/images/logos/ps_logo2a_cp.png HTTP/1.1\r\nHost: www.google.com\r\nCache-Control: max-age=0\r\nPragma: no-cache\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.794.0 Safari/535.1\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Encoding: gzip,deflate,sdch\r\nAccept-Language: en-US,en;q=0.8\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3\r\n\r\n')
content = b''
while True:
    msg = remote_socket.recv(1024)
    if not msg:
        break
    print(msg)
    content += msg
print("DONE: %d" % len(content))
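
For comparison, here is a minimal sketch of one way the end of a payload might be detected without waiting for the server to close a keep-alive connection: read the headers first, then read exactly as many body bytes as the Content-Length header declares. The helper name read_http_response is hypothetical, and the sketch assumes the response carries a Content-Length header; bodies sent with Transfer-Encoding: chunked, or bodies delimited only by the connection closing, would need separate handling. It also assumes no pipelined responses share the socket.

```python
import socket


def read_http_response(sock, chunk_size=1024):
    """Read one HTTP response framed by Content-Length (hypothetical helper)."""
    buf = b''
    # Read until the blank line that terminates the header block.
    while b'\r\n\r\n' not in buf:
        chunk = sock.recv(chunk_size)
        if not chunk:
            raise ConnectionError('socket closed before headers were complete')
        buf += chunk
    head, _, body = buf.partition(b'\r\n\r\n')

    # Parse the header lines (skipping the status line) into a dict
    # keyed by lower-cased header name.
    headers = {}
    for line in head.split(b'\r\n')[1:]:
        name, _, value = line.partition(b':')
        headers[name.strip().lower()] = value.strip()

    if b'content-length' not in headers:
        # Assumption: chunked and close-delimited bodies are out of scope.
        raise ValueError('no Content-Length header; cannot frame the body')
    length = int(headers[b'content-length'])

    # Keep receiving until the declared number of body bytes has arrived.
    # A short read is normal here; only the running total matters.
    while len(body) < length:
        chunk = sock.recv(min(chunk_size, length - len(body)))
        if not chunk:
            raise ConnectionError('socket closed before body was complete')
        body += chunk
    return head, body
```

With the socket from the code above, this would be called as head, body = read_http_response(remote_socket) in place of the while loop. It returns as soon as the declared number of bytes has been read, so the connection can stay open for the next request.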