Python HTTP GET . "Wrong Request"

Question

I am trying to write code that fetches the HTML of a website the user enters. I am required to do this without using urllib or similar libraries.

from socket import *


url = (input("Please enter url: "))
host=gethostbyname(url)

clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((host,80))

clientSocket.send(("GET " + host + "HTTP/1.1\n\n").encode("UTF-8"))

file = clientSocket.recv(1024)
print("The html code: ", file.decode("UTF-8"))
clientSocket.close()

The code runs fine. However, when I input a website such as "www.stackoverflow.com" I get a "bad request" response from the host:

The html code:  HTTP/1.1 400 Bad Request
Date: Wed, 23 Mar 2016 16:14:27 GMT
Content-Type: text/html
Content-Length: 177
Connection: close
Server: -nginx
CF-RAY: -

<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>cloudflare-nginx</center>
</body>
</html>

What would be the correct request to get the actual HTML from the server? Thank you.

Kurt Stutsman · Accepted Answer · 2016-03-23 16:28:27Z

1

A hostname is not a URL. Your script appears to prompt only for a hostname, since you're using gethostbyname(). The GET request expects a URI as its first argument. You also need to send a carriage return with each line feed, and you need two of these CRLF pairs in a row to terminate the GET request. You should send something like:

clientSocket.send(("GET / HTTP/1.1\r\n\r\n").encode("UTF-8"))
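As a rough sketch of the framing described above, the request bytes can be assembled by a small helper. The name build_get_request is hypothetical, and the Host header is an addition here, included on the assumption that the server enforces the HTTP/1.1 rule that every request carries one.

```python
def build_get_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        "GET " + path + " HTTP/1.1",  # request line: method, path, version
        "Host: " + host,              # assumption: added because HTTP/1.1 requires it
    ]
    # Every line ends with CRLF; one extra CRLF (an empty line) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

The result could then be passed to clientSocket.send() in place of the hand-written string.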

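Also, if all you want to do is download a URL, use a library like urllib2 (urllib.request in Python 3), which takes care of all the HTTP protocol details for you. For example, in Python 3:

from urllib.request import urlopen  # this module was urllib2 in Python 2

with urlopen('http://google.com/') as r:
    print(r.read().decode('utf-8', errors='replace'))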

edited Mar 23, 2016 at 16:28

answered Mar 23, 2016 at 16:22


Antti Haapala · Accepted Answer · 2016-03-23 16:26:40Z

0

You're not speaking HTTP/1.1, yet you're stating so on the first line.

First of all, the token following GET must be an absolute path on the server, so it must start with /.

Second, an HTTP/1.1 request must include the Host: header.

And third, your simple client should probably send Connection: close, since it does not handle persistent (keep-alive) connections.

You might have better success with the following script:

from socket import *

host = gethostbyname('stackoverflow.com')
clientSocket = socket(AF_INET, SOCK_STREAM)
clientSocket.connect((host,80))
clientSocket.send((
    "GET / HTTP/1.1\r\n"
    "Host: stackoverflow.com\r\n"
    "Connection: close\r\n\r\n").encode('utf-8'))

file = clientSocket.recv(1024)
print("The html code: ", file.decode("UTF-8"))
clientSocket.close()
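One limitation of the script above is that a single recv(1024) returns at most 1024 bytes, so longer pages are cut off. Because the request says Connection: close, the server closes the socket once the response is complete, which means the client can keep reading until recv() returns an empty bytes object. A minimal sketch follows; recv_all is a hypothetical helper name, not part of the socket module.

```python
import socket

def recv_all(sock, bufsize=4096):
    """Read from sock until the peer closes the connection (hypothetical helper)."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # recv() returns b"" once the peer has closed its side
            break
        chunks.append(data)
    return b"".join(chunks)
```

In the script, `file = recv_all(clientSocket)` would replace the single recv call. Decoding with errors="replace" is safer than a strict decode, since not every page is UTF-8.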


4 Comments

JulianP Over a year ago

Thank you! However, my professor requires that the user input the URL rather than having it hard-coded. This is where I am having trouble, because different sites have different paths and I don't know how to generalize it.

Antti Haapala Over a year ago

Then use urlparse to parse it into its components.

JulianP Over a year ago

Excuse my ignorance, but I'm not sure how to make that work. I'm only in an intro to networking course and my professor is not being very helpful. Everything I've done so far I've worked out through my own research, but I feel like I've hit a roadblock because I don't know much.

JulianP Over a year ago

UPDATE: I was able to get it. '("GET / HTTP/1.1\r\n" "Host:"+ url +"\r\n" "Connection: close\r\n\r\n")'
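For readers wondering how the urlparse suggestion in the comments might look, here is a minimal sketch. In Python 3 the functionality lives in urllib.parse; split_url is a hypothetical helper name, and treating a bare hostname as plain HTTP on port 80 is an assumption made for this example.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return (host, port, path) for an http URL (hypothetical helper)."""
    if "://" not in url:
        url = "http://" + url  # assumption: a bare hostname means plain HTTP
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80    # default HTTP port when none is given
    path = parts.path or "/"   # an empty path must be requested as "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, path
```

The host goes to gethostbyname() and into the Host: header, and the path goes into the request line after GET. If the assignment also rules out urllib.parse, the same split could probably be done by hand with str.partition.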
