16

I'm trying to parse an HTTP request line (e.g. GET / HTTP/1.1\r\n), which is easy with socket.makefile().readline() (BaseHTTPRequestHandler uses it), like:

print(sock.makefile().readline())

Unfortunately, as the documentation says, a socket used with makefile() must be in blocking mode (it cannot have a timeout). How can I implement a readline()-like function that does the same thing without the makefile() file object interface, and without reading more than needed (any extra data it read would be discarded, and I need that data afterwards)?

A pretty inefficient example:

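request_line = b""
while not request_line.endswith(b"\n"):
    byte = sock.recv(1)    # one system call per byte: correct, but slow
    if not byte:           # empty read: the peer closed the connection
        break
    request_line += byte
print(request_line)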

4 Answers

10

Four and a half years later, I would suggest asyncio's Streams for this, but here's how you might do it properly using BytesIO:

Note that this implementation "shrinks" the in-memory BytesIO object each time a line is detected. If you didn't care about that, this could be a lot fewer lines.

import socket
import time
from io import BytesIO

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('localhost', 1234))
sock.setblocking(False)


def handle_line(line):
    # or, print("Line Received:", line.decode().rstrip())
    print(f"Line Received: {line.decode().rstrip()!r}")


with BytesIO() as buffer:
    while True:
        try:
            resp = sock.recv(100)       # Read in some number of bytes -- balance this
        except BlockingIOError:
            print("sleeping")           # Do whatever you want here, this just
            time.sleep(2)               #   illustrates that it's nonblocking
        else:
            if not resp:                # Empty read: the peer closed the connection
                break
            buffer.write(resp)          # Write to the BytesIO object
            buffer.seek(0)              # Set the file pointer to the start of the file
            start_index = 0             # Count the number of bytes processed
            for line in buffer:
                if not line.endswith(b"\n"):
                    break               # Trailing partial line: keep it buffered
                start_index += len(line)
                handle_line(line)       # Do something with your line

            # If we received any newline-terminated lines, start_index is nonzero.
            # In that case, we read the remaining bytes into memory, truncate
            # the BytesIO object, reset the file pointer and re-write the
            # remaining bytes back into it.  This will advance the file pointer
            # appropriately.  If start_index is zero, the buffer doesn't contain
            # any newline-terminated lines, so we set the file pointer to the
            # end of the file to not overwrite bytes.
            if start_index:
                buffer.seek(start_index)
                remaining = buffer.read()
                buffer.truncate(0)
                buffer.seek(0)
                buffer.write(remaining)
            else:
                buffer.seek(0, 2)

(The original answer was so bad that it wasn't worth keeping (I promise), but it should be available in the edit history.)
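For comparison, the asyncio Streams route mentioned at the top might look roughly like the sketch below. It is an illustration under assumptions: read_line and demo are hypothetical names, the 5-second timeout is arbitrary, and socket.socketpair() stands in for a real client connection. StreamReader.readline() returns as soon as it sees b"\n" and keeps any extra bytes buffered for the next call.

```python
import asyncio
import socket


async def read_line(reader: asyncio.StreamReader, timeout: float = 5.0) -> bytes:
    # wait_for() raises a timeout error if no complete line arrives in time.
    return await asyncio.wait_for(reader.readline(), timeout)


async def demo() -> tuple[bytes, bytes]:
    ours, theirs = socket.socketpair()      # stand-in for a real connection
    theirs.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n")

    # Wrap the already-connected socket in a StreamReader/StreamWriter pair.
    reader, writer = await asyncio.open_connection(sock=ours)
    request_line = await read_line(reader)
    next_line = await read_line(reader)     # bytes read past the first line were kept

    writer.close()
    await writer.wait_closed()
    theirs.close()
    return request_line, next_line
```

Calling asyncio.run(demo()) runs the sketch; in a real server the reader would come from the callback passed to asyncio.start_server() instead.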


3 Comments

There are a couple of problems with this answer. First, StringIO doesn't take a buffer size as an argument; the construct above will result in a buffer that contains the literal contents 2048 (that is, a four character string). This will also discard any data received after the first newline is seen.
What if the socket receives something like this -> "<msg1>\n<ms"? (The 2nd message is half-assed.)
Works well and is efficient, with one minor modification (at least for me). I needed to add a check for a newline in each line, because the for line in buffer: statement also returned the rest of the buffer, even if it didn't contain a newline. So, to get it working, my loop looked like this: for line in buffer: if line.endswith(b'\n'): start_index += len(line) handle_line(line)
6

SocketStreamReader

Here is a (buffered) line-reader that does not use asyncio. It can be used as a "synchronous" socket-based replacement for asyncio.StreamReader.

import socket
from asyncio import IncompleteReadError  # only import the exception class


class SocketStreamReader:
    def __init__(self, sock: socket.socket):
        self._sock = sock
        self._recv_buffer = bytearray()

    def read(self, num_bytes: int = -1) -> bytes:
        raise NotImplementedError

    def readexactly(self, num_bytes: int) -> bytes:
        buf = bytearray(num_bytes)
        pos = 0
        while pos < num_bytes:
            n = self._recv_into(memoryview(buf)[pos:])
            if n == 0:
                raise IncompleteReadError(bytes(buf[:pos]), num_bytes)
            pos += n
        return bytes(buf)

    def readline(self) -> bytes:
        return self.readuntil(b"\n")

    def readuntil(self, separator: bytes = b"\n") -> bytes:
        if len(separator) != 1:
            raise ValueError("Only separators of length 1 are supported.")

        chunk = bytearray(4096)
        start = 0
        buf = bytearray(len(self._recv_buffer))
        bytes_read = self._recv_into(memoryview(buf))
        assert bytes_read == len(buf)

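        while True:
            idx = buf.find(separator, start)
            if idx != -1:
                break

            start = len(buf)  # resume the search where the last one stopped
            bytes_read = self._recv_into(memoryview(chunk))
            if bytes_read == 0:  # EOF before the separator arrived
                raise IncompleteReadError(bytes(buf), None)
            buf += memoryview(chunk)[:bytes_read]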

        result = bytes(buf[: idx + 1])
        self._recv_buffer = b"".join(
            (memoryview(buf)[idx + 1 :], self._recv_buffer)
        )
        return result

    def _recv_into(self, view: memoryview) -> int:
        bytes_read = min(len(view), len(self._recv_buffer))
        view[:bytes_read] = self._recv_buffer[:bytes_read]
        self._recv_buffer = self._recv_buffer[bytes_read:]
        if bytes_read == len(view):
            return bytes_read
        bytes_read += self._sock.recv_into(view[bytes_read:])
        return bytes_read

Usage:

reader = SocketStreamReader(sock)
line = reader.readline()


2

In Python 3, instead of:

print sock.makefile().readline()

I discovered:

socket.SocketIO(sock, "rb").readline()

Thanks to this answer.
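A documented way to get the same raw object is makefile() in binary mode with buffering=0. The sketch below is only an illustration: socket.socketpair() stands in for a real connection and the 2-second timeout is arbitrary. Without a buffer, readline() pulls one byte at a time, so nothing past the newline is consumed, at the cost of one recv call per byte.

```python
import socket

ours, theirs = socket.socketpair()       # stand-in for a connected client socket
theirs.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n")
ours.settimeout(2.0)                     # arbitrary timeout for the sketch

raw = ours.makefile("rb", buffering=0)   # buffering=0 returns the raw, unbuffered object
request_line = raw.readline()            # stops right after the first b"\n"
rest = ours.recv(1024)                   # the following bytes are still in the socket

raw.close()
ours.close()
theirs.close()
```

If the timeout does fire mid-read, it seems safest to discard the raw object and create a new one rather than keep reading from it.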

2 Comments

Probably the most straightforward and Pythonic answer, instead of reimplementing all the basics.
Despite lacking an underscore, SocketIO is undocumented, so it could disappear at any time. Looking at the source code (CPython 3.13.1), it's used internally in the implementation of makefile. A comment in the source code indicates that it does work with a timeout. So you should use makefile.
-1

Here is my solution, written in Python 3. In the example I use io.BytesIO.read() instead of socket.recv(), but the idea is the same.

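import io

stream = io.BytesIO(b"GET / HTTP/1.1\r\n")  # example data; stands in for the socket

CHUNK_SIZE = 16  # you can set it larger or smaller
buffer = bytearray()
while True:
    chunk = stream.read(CHUNK_SIZE)
    buffer.extend(chunk)
    if b'\n' in chunk or not chunk:
        break
newline_at = buffer.find(b'\n')
# If the stream ended without a newline, treat everything read as the line.
firstline = buffer if newline_at == -1 else buffer[:newline_at]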

However, the rest of the message is now partly in the buffer and partly still waiting in the socket. You can either keep appending to the buffer and read from it, so that the entire request is in one piece (this should be fine unless you parse huge requests), or wrap it in a generator and read it part by part:

def reader(buffer, stream):
    # First hand back the bytes that were already read past the first line.
    yield buffer[buffer.find(b'\n') + 1:]
    while True:
        chunk = stream.read(2048)
        if not chunk:
            break
        yield chunk
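One way to avoid leftover bytes altogether is to peek at the socket before consuming anything. The following is a minimal sketch, assuming a stream socket on a platform that supports socket.MSG_PEEK; recv_line, max_len and peek_size are hypothetical names, not part of any library. It uses plain recv() calls, so it also works on a socket with a timeout set.

```python
import socket


def recv_line(sock: socket.socket, max_len: int = 65536, peek_size: int = 4096) -> bytes:
    """Return one line (including b"\\n") without consuming bytes past it.

    Returns what was read so far if the peer closes the connection or
    max_len is reached before a newline shows up.
    """
    line = bytearray()
    while len(line) < max_len:
        peeked = sock.recv(peek_size, socket.MSG_PEEK)  # look without consuming
        if not peeked:                                  # peer closed the connection
            break
        newline_at = peeked.find(b"\n")
        take = len(peeked) if newline_at == -1 else newline_at + 1
        got = sock.recv(take)                           # consume only the line's bytes
        line += got
        if got.endswith(b"\n"):                         # reached the newline
            break
    return bytes(line)
```

Each line costs roughly two recv calls per chunk instead of one per byte, and whatever follows the newline stays in the socket for later reads.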
