4

I'm just starting out with web data in Python 3.6.1. I was learning sockets and ran into a problem with my code that I couldn't figure out. The website in my code works fine, but when I run this code I get a 400 Bad Request error. I'm not sure what is wrong with my code. Thanks in advance.

import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

mysock.connect(('data.pr4e.org', 80))

mysock.send(('GET http://data.pr4e.org/romeo.txt HTTP/1.0 \n\n').encode())

while True:
    data = mysock.recv(512)
    if ( len(data) < 1 ):
        break
    print (data)

mysock.close()
5
  • 404 technically speaking is not a bad request code Commented Jun 27, 2017 at 6:32
  • You are requesting something that either does not exist or has the wrong path. Check whether the text file is present at the path you requested. Commented Jun 27, 2017 at 6:34
  • In fact, I tried your code and it works absolutely fine. Can you show us the traceback? Commented Jun 27, 2017 at 6:37
  • Whoops, my bad, it's 400 @mehulmpt Commented Jun 27, 2017 at 6:47
  • <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>400 Bad Request</title> </head><body> <h1>Bad Request</h1> <p>Your browser sent a request that this server could not understand.<br /> </p> <hr> <address>Apache/2.4.7 (Ubuntu) Server at do1.dr-chuck.com Port 80</address> </body></html> @be_good_do_good this is the message I receive Commented Jun 27, 2017 at 6:51

4 Answers 4

7
GET http://data.pr4e.org/romeo.txt HTTP/1.0 \n\n

Welcome to the wonderful world of HTTP, where most users think the protocol is easy because it is human-readable, when in fact it can be very complex. The request above has several problems:

  • The path should not be a full URL but only /romeo.txt. Full URLs are used only when sending a request to a proxy.
  • The line ending must be \r\n, not \n.
  • There should be no space after HTTP/1.0 before the end of the line.
  • While a Host header is only required with HTTP/1.1, many servers (including the one you are trying to access) also need it with HTTP/1.0, since they serve multiple hostnames on the same IP address and need to know which name you want.

With this in mind, the data you send should instead be

GET /romeo.txt HTTP/1.0\r\nHost: data.pr4e.org\r\n\r\n

And I've tested that it works perfectly with this modification.
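To show how the corrected request fits into the original program, here is a minimal sketch. The helper names build_request and fetch are hypothetical (they are not in the question), and the 10-second timeout is an assumption added so a stalled connection does not hang forever. It stays with HTTP/1.0, so the server closes the connection after the body and the read loop ends on an empty recv().

```python
import socket


def build_request(host, path):
    """Build a minimal HTTP/1.0 GET request as bytes (hypothetical helper)."""
    lines = [
        "GET {} HTTP/1.0".format(path),  # path only, no scheme or host
        "Host: {}".format(host),         # needed by many name-based virtual hosts
        "",                              # blank line ends the header section
        "",
    ]
    # Join with CRLF; no trailing space after the HTTP version.
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path, port=80):
    """Send the request and return the raw response (headers and body) as bytes."""
    # timeout=10 is an assumed value, not something from the question.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(512)
            if not data:  # empty bytes means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch('data.pr4e.org', '/romeo.txt') and decoding the result with .decode() should then give the response headers followed by the text, assuming the server still behaves as described here.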

But, given that HTTP is not as simple as it might look, I really recommend using a library like requests to access the target. If this looks like too much overhead to you, please study the HTTP standard and implement it properly instead of just guessing how HTTP works from some examples - and guessing wrong.

Note also that servers differ in how forgiving they are of broken implementations like yours. What once worked with one server might not work with the next, or even with the same server after a software upgrade. Using a robust, well-tested and maintained library instead of doing everything yourself might thus save you a lot of trouble later too.
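As a rough sketch of the library route: the example below uses urllib.request from the standard library rather than requests, only so that it runs without installing anything. The function name fetch_text, the 10-second timeout and the UTF-8 fallback are assumptions made for illustration.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Fetch a URL and return the body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Use the charset declared by the server; fall back to UTF-8 (an assumption).
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

With this, fetch_text('http://data.pr4e.org/romeo.txt') would replace the whole socket loop; the library takes care of the request line, the Host header and the line endings. With requests installed, requests.get(url).text plays the same role.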


3
'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()

works for me.


1

You don't send the protocol to the Web server, and you only send the hostname separately in a Host header, and only then in HTTP 1.1.

For HTTP 1.0, it should be:

mysock.send('GET /romeo.txt HTTP/1.0\r\n\r\n'.encode())

Alternatively, you could try sending an HTTP 1.1 request:

mysock.send('GET /romeo.txt HTTP/1.1\r\n'.encode())
mysock.send('Host: data.pr4e.org\r\n\r\n'.encode())

4 Comments

Thanks. I tried that but I still seemed to get the 400 Bad Request error, in the HTML code
Try changing the line endings to \r\n pairs as I have done above.
Using HTTP/1.1 with the above code to read the body is a bad idea, as you can see when actually testing it against this specific server. Since HTTP/1.1 implicitly enables keep-alive, the server will not close the connection right after sending the body, as the code expects, but will wait for the next request on the same TCP connection - i.e. the program hangs. And even if an explicit Connection: close is added to disable this, the server might still use Transfer-Encoding: chunked with HTTP/1.1, which the code does not expect either. Thus, better to stay with HTTP/1.0 to keep it simple.
"...a Host header, and only then in HTTP 1.1." - The Host header is actually allowed with HTTP/1.0, although the standard does not explicitly require it as it does for HTTP/1.1. Still, many servers like the one in question need it and will accept it within an HTTP/1.0 request too.
0

This code worked for me:

GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n
  1. change \n\n to \r\n\r\n
  2. remove the space between HTTP/1.0 and \r\n\r\n
