
Say I have the following HTTP request:

GET /4 HTTP/1.1
Host: graph.facebook.com

And the server returns the following response:

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Cache-Control: private, no-cache, no-store, must-revalidate
Content-Type: text/javascript; charset=UTF-8
ETag: "539feb8aee5c3d20a2ebacd02db380b27243b255"
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Pragma: no-cache
X-FB-Rev: 1070755
X-FB-Debug: pC4b0ONpdhLwBn6jcabovcZf44bkfKSEguNsVKuSI1I=
Date: Wed, 08 Jan 2014 01:22:36 GMT
Connection: keep-alive
Content-Length: 172

{"id":"4","name":"Mark Zuckerberg","first_name":"Mark","last_name":"Zuckerberg","link":"http:\/\/www.facebook.com\/zuck","username":"zuck","gender":"male","locale":"en_US"}

Since the value of the Content-Length header depends on the length of the content, I cannot simply split on the Content-Length: 172 string. How can I extract the JSON and the headers separately? Both are important to my program. I am using this code to get the response:

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("graph.facebook.com", 80))
s.send("GET /"+str(id)+"/picture HTTP/1.1\r\nHost: graph.facebook.com\r\n\r\n")
data = s.recv(1024)
s.close()
json_string = (somehow extract this)
userdata = json.loads(json_string)
  • a) Wouldn't that be \r\n\r\n? b) I was looking to do this all in one line and a bit more gracefully. But thanks for the suggestion. Commented Jan 8, 2014 at 1:46
  • Depends on your server OS, but you can use the | operator. A quick Google search reveals this. Commented Jan 8, 2014 at 1:47
  • I would probably use the requests library and do somerequest.json(): docs.python-requests.org/en/latest Commented Jan 8, 2014 at 1:48
  • @erewok is this supported in python 2.7 too? Commented Jan 8, 2014 at 1:51
  • @735Tesla: requests is supported on Python 2.7, but it's a third-party install. And there is absolutely no need for it here; urllib2 in the stdlib will be just as easy for your use. Commented Jan 8, 2014 at 1:56

1 Answer

The easy way to do this is to use an HTTP library. For example:

import json
import urllib2

r = urllib2.urlopen("http://graph.facebook.com/{}/picture".format(id))
headers = r.info()  # response headers, e.g. headers.getheader("ETag")
json_string = r.read()
userdata = json.loads(json_string)

If you really want to parse it yourself, the HTTP protocol guarantees that headers and body are separated by an empty line, and that this will be the first empty line anywhere in the response, so it's not that hard:

data = s.recv(1024)
header, _, json_string = data.partition('\r\n\r\n')
userdata = json.loads(json_string)
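
Since the headers matter too, the header block can be split further into a status line and name/value pairs. The following is only a minimal sketch, assuming the whole response is already in memory as a byte string. The helper name split_response and the shortened sample response are made up for illustration, and the bytes literals are there so that it should behave the same on Python 2.7 and Python 3:

```python
import json


def split_response(data):
    """Split a raw HTTP response into (status_line, headers, body)."""
    # Everything before the first blank line is the head; the rest is the body.
    head, _, body = data.partition(b'\r\n\r\n')
    lines = head.split(b'\r\n')
    status_line = lines[0].decode('latin-1')
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(b':')
        # Header names are case-insensitive, so normalise them to lower case.
        key = name.strip().lower().decode('latin-1')
        headers[key] = value.strip().decode('latin-1')
    return status_line, headers, body


# Hypothetical, shortened response used only to exercise the helper.
raw = (b'HTTP/1.1 200 OK\r\n'
       b'Content-Type: text/javascript; charset=UTF-8\r\n'
       b'Content-Length: 28\r\n'
       b'\r\n'
       b'{"id":"4","username":"zuck"}')
status_line, headers, body = split_response(raw)
userdata = json.loads(body.decode('utf-8'))
```

A plain dict keeps only the last value when a header is repeated (Set-Cookie is the usual example), so a real parser would need a list per name or a proper HTTP library.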

There are some obvious downsides to this. As written, your code won't work if the response is longer than 1K, or if the kernel doesn't give you the whole response in a single recv (which it's never guaranteed to do), or if the server redirects you or gives you a 100 Continue before the real response, or if the server decides to send back a chunked or MIME-multipart or other response instead of a flat body, or…
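
The first two of those problems (responses over 1K and short reads) can be handled by calling recv in a loop until Content-Length bytes of body have arrived. This is a rough sketch under stated assumptions, not a full HTTP client: read_response is a hypothetical helper, it assumes the server sends a Content-Length header, and it does nothing about redirects, 1xx interim responses, or chunked bodies:

```python
def read_response(recv, bufsize=4096):
    """Call recv(bufsize) repeatedly until one full response has arrived.

    Returns (head, body) as byte strings. Assumes a Content-Length header.
    """
    data = b''
    # Keep reading until the blank line that ends the headers has arrived.
    while b'\r\n\r\n' not in data:
        chunk = recv(bufsize)
        if not chunk:
            raise ValueError('connection closed before headers were complete')
        data += chunk
    head, _, body = data.partition(b'\r\n\r\n')

    # Look for Content-Length; header names are case-insensitive.
    length = None
    for line in head.split(b'\r\n')[1:]:
        name, _, value = line.partition(b':')
        if name.strip().lower() == b'content-length':
            length = int(value.strip().decode('ascii'))
    if length is None:
        raise ValueError('no Content-Length header; cannot tell where the body ends')

    # Keep reading until the whole body has arrived.
    while len(body) < length:
        chunk = recv(bufsize)
        if not chunk:
            raise ValueError('connection closed before body was complete')
        body += chunk
    return head, body[:length]
```

With the socket from the question this would be called as head, body = read_response(s.recv). Taking the recv callable rather than the socket itself is just a choice that makes the loop easy to exercise without a network connection.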


4 Comments

What is the purpose of the , _, in header, _,?
@735Tesla: str.partition returns three values: the part before the separator, the separator, and the part after the separator. Often you don't need the middle one (you know it's just going to be '\r\n\r\n' here…). Assigning don't-care values to _ is a common idiom in Python—just readable enough that you can tell there's a value there, but unobtrusive enough to signal that the value doesn't matter beyond noting its existence.
Thanks I never heard of using _ that way before. +1
This is much better than my answer. +1
