
I'm using Python to make HTTP requests. I need the raw HTTP response, which looks like this:

HTTP/1.1 200 OK
Date: Mon, 19 Jul 2004 16:18:20 GMT
Server: Apache
Last-Modified: Sat, 10 Jul 2004 17:29:19 GMT
ETag: "1d0325-2470-40f0276f"
Accept-Ranges: bytes
Content-Length: 9328
Connection: close
Content-Type: text/html

<HTML>
<HEAD>
... the rest of the home page...

In python-requests I tried response.raw, but that is not the raw HTTP response; it is just the raw body.

Is there any way to achieve this goal without using socket?

P.S. I don't want to rebuild the raw response using parsed parts.

  • So what do you understand the 'raw response' to be? The header section? That's not available in raw form. Commented Apr 22, 2019 at 14:00
  • @MartijnPieters the format I've mentioned above. Commented Apr 22, 2019 at 14:02
  • So you just need the HTTP headers, and not the body, correct? Commented Apr 22, 2019 at 14:03
  • @MarkStewart No. I need the whole response in the format mentioned above. Commented Apr 22, 2019 at 14:04
  • @JudaXovex: that's the status line, the headers, and the body. The status line and headers are not available in raw form. Commented Apr 22, 2019 at 14:09

1 Answer


requests doesn't expose the status line and headers in raw form. You never need these in raw form; an RFC-compliant response can be reconstructed trivially from the data you do have. requests is built on the urllib3 library, which in turn uses the Python standard library http.client module. That module doesn't give you the raw data either.
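
As a rough illustration of that reconstruction, here is a minimal sketch. The function name rebuild_response and its parameters are made up for this example. With requests you would presumably pass response.raw.version, response.status_code, response.reason, response.headers.items() and response.content. Note that the result is only equivalent to what the server sent, not byte-identical: header spacing and duplicate headers are normalised by the parser, and response.content may already be decompressed.

```python
def rebuild_response(http_version, status, reason, headers, body):
    """Rebuild an HTTP response message from already-parsed parts.

    http_version: 10 or 11, the integer form used by http.client
    headers:      iterable of (name, value) string pairs
    body:         bytes
    """
    version = {10: "HTTP/1.0", 11: "HTTP/1.1"}[http_version]
    lines = [f"{version} {status} {reason}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # The header section ends with an empty line (CRLF CRLF).
    head = "\r\n".join(lines) + "\r\n\r\n"
    # http.client decodes header bytes as ISO-8859-1, so encode the same way.
    return head.encode("iso-8859-1") + body
```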

Instead, the status line and headers are parsed directly into the constituent parts, in http.client.HTTPResponse._read_status() and http.client.parse_headers() (the latter delegating to the email.parser.Parser().parsestr() method to parse the headers into a http.client.HTTPMessage() instance). Only the results of these parse operations are used.
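
To see what that parsing looks like, here is a small sketch that runs the same kind of steps on canned bytes. parse_head and RAW are names invented for this example, and the status-line split only approximates what the private _read_status() method does. http.client.parse_headers() is the real module-level function mentioned above.

```python
import http.client
import io

# Canned response bytes, used here instead of a live connection.
RAW = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<HTML>"
)

def parse_head(raw):
    fp = io.BytesIO(raw)
    # Split the status line into version, status code and reason phrase.
    status_line = fp.readline().decode("iso-8859-1").rstrip("\r\n")
    version, status, reason = status_line.split(None, 2)
    # Consumes header lines up to and including the blank line,
    # returning an http.client.HTTPMessage.
    headers = http.client.parse_headers(fp)
    # Whatever is left in the stream is the body.
    return version, int(status), reason, headers, fp.read()
```

After this point only the parsed pieces exist; header lookups such as headers["content-type"] are case-insensitive, and the original bytes of the header section are not kept anywhere.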

You could try to wrap the urllib3 connection object (via the get_connection() hook implemented on a requests transport adapter). Connection objects have a .connect() method with supporting methods that create socket objects; if you wrapped those in a file-like object and peeked at the data returned by .readline() calls, you could capture and store the raw data there.
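
The recording part of that idea could look roughly like the sketch below. It only demonstrates the file-like wrapper on canned bytes: RecordingReader, FakeSocket and capture_raw are hypothetical names, and hooking the wrapper into a real urllib3 connection is not shown. It relies on http.client.HTTPResponse calling makefile("rb") on the socket it is given and reading through the returned object.

```python
import http.client
import io

class RecordingReader(io.BufferedReader):
    """Buffered reader that keeps a copy of every byte handed to the caller."""

    def __init__(self, raw):
        super().__init__(raw)
        self.captured = bytearray()

    def readline(self, size=-1):
        # Used by http.client for the status line and each header line.
        line = super().readline(size)
        self.captured += line
        return line

    def read(self, size=-1):
        # Used by http.client when reading the body.
        data = super().read(size)
        self.captured += data
        return data

class FakeSocket:
    """Stand-in for a real socket; only makefile() is provided."""

    def __init__(self, payload):
        self.reader = RecordingReader(io.BytesIO(payload))

    def makefile(self, mode="rb", *args, **kwargs):
        return self.reader

def capture_raw(payload):
    sock = FakeSocket(payload)
    resp = http.client.HTTPResponse(sock)
    resp.begin()        # parses the status line and headers
    body = resp.read()  # reads the body through the recorder
    return bytes(sock.reader.captured), resp.status, body
```

This sketch does not cover other read paths such as readinto(), so a real wrapper would need to record those as well.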

However, if you are debugging a broken HTTP server, I'd not bother with trying to bend requests and its stack to your will here. Just use curl --include --raw <url> on the command line instead (with perhaps --verbose added).

Another option would be to use the http.client library directly: make the connection, send your outgoing headers with HTTPConnection.request(), then, instead of calling getresponse(), read directly from conn.sock.
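
A minimal sketch of that approach follows. The function name fetch_raw_response and its parameters are made up for illustration. It sends "Connection: close" on the assumption that the server then closes the connection once the response is complete, so reading until end-of-stream returns the whole message.

```python
import http.client

def fetch_raw_response(host, port=80, path="/", timeout=10.0):
    """Send a GET request and return the undecoded response bytes."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # request() opens the connection and sends the request line and headers.
        conn.request("GET", path, headers={"Connection": "close"})
        chunks = []
        while True:
            # Bypass getresponse() and read straight from the socket.
            chunk = conn.sock.recv(65536)
            if not chunk:  # an empty read means the server closed the stream
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        conn.close()
```

The returned bytes contain the status line, the headers and the body exactly as sent, including any chunked transfer encoding or compression, which you would then have to handle yourself.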


8 Comments

You say "You never need these in raw form", but I do need them, because I need to analyze the HTTP response format.
@JudaXovex: then requests may not be the library for your needs. Nor any other library based on http.client.
Is there any alternative library?
@JudaXovex: why not just use the curl command line?
There are too many requests, and executing an external command is not fast enough.