31

How can I get Python to fetch the contents of an HTTP page? So far all I have is the request, and I have imported http.client.

5 Answers

56

Using urllib.request is probably the easiest way to do this:

import urllib.request
f = urllib.request.urlopen("http://stackoverflow.com")
print(f.read())
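Note that read() returns bytes, so the print above shows a b'...' literal rather than text. A minimal sketch of getting a str instead, assuming the server declares its charset in the Content-Type header and falling back to UTF-8 when it does not; the helper name fetch_text and the timeout value are illustrative and not part of the original answer:

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Return the body of `url` as a str (hypothetical helper)."""
    # The context manager closes the connection when the block exits.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes
        # get_content_charset() reads the charset from the Content-Type
        # header; it returns None when the server does not declare one.
        charset = response.headers.get_content_charset() or "utf-8"
    # errors="replace" is an assumption: substitute undecodable bytes
    # instead of raising UnicodeDecodeError.
    return raw.decode(charset, errors="replace")


if __name__ == "__main__":
    print(fetch_text("http://stackoverflow.com")[:200])
```

The UTF-8 fallback is only a guess; a page that declares its encoding somewhere other than the header may still decode incorrectly.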

4 Comments

Tried that and I got "AttributeError: 'module' object has no attribute 'urlopen'"
Sorry, I just noticed that you were using Python 3. I've updated my example to match.
@Davide Gualano: The Python 2.x urllib2 module has been rolled into the Python 3.x urllib set of modules: docs.python.org/library/urllib2.html
@Greg: my bad, I didn't read the question title carefully enough :)
14

Using the built-in module "http.client":

import http.client

connection = http.client.HTTPSConnection("api.bitbucket.org", timeout=2)
connection.request('GET', '/2.0/repositories')
response = connection.getresponse()
print('{} {} - a response on a GET request by using "http.client"'.format(response.status, response.reason))
content = response.read().decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "http.client"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Using the third-party library "requests":

import requests

response = requests.get("https://api.bitbucket.org/2.0/repositories")
print('{} {} - a response on a GET request by using "requests"'.format(response.status_code, response.reason))
content = response.content.decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "requests"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Using the built-in module "urllib.request":

import urllib.request

response = urllib.request.urlopen("https://api.bitbucket.org/2.0/repositories")
print('{} {} - a response on a GET request by using "urllib.request"'.format(response.status, response.reason))
content = response.read().decode('utf-8')
print(content[:100], '...')

Result:

200 OK - a response on a GET request by using "urllib.request"
{"pagelen": 10, "values": [{"scm": "hg", "website": "", "has_wiki": true, "name": "tweakmsg", "links ...

Notes:

  1. The examples were written for Python 3.4.
  2. The results of the three approaches will most likely differ only in their content.
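The http.client example above never closes the connection and does not act on the status code. One possible variant, sketched under the assumption that any non-200 response should be treated as an error; the function name get_body and its parameters are hypothetical:

```python
import http.client


def get_body(host, path="/", port=None, use_tls=True, timeout=10):
    """Fetch `path` from `host` and return the decoded body (hypothetical helper)."""
    if use_tls:
        connection = http.client.HTTPSConnection(host, port=port, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(host, port=port, timeout=timeout)
    try:
        connection.request("GET", path)
        response = connection.getresponse()
        body = response.read().decode("utf-8")
        # http.client does not raise on 4xx/5xx and does not follow
        # redirects, so the caller has to inspect the status itself.
        if response.status != 200:
            raise RuntimeError("{} {}".format(response.status, response.reason))
        return body
    finally:
        # Always release the socket, even if the request failed.
        connection.close()


if __name__ == "__main__":
    print(get_body("api.bitbucket.org", "/2.0/repositories")[:100], "...")
```

Raising RuntimeError is a design choice made for this sketch; returning the status alongside the body would work just as well.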


3

You can also use the requests library. I found this particularly useful because it was easier to retrieve and display the HTTP header.

import requests

source = 'http://www.pythonlearn.com/code/intro-short.txt'

r = requests.get(source)

print('Display actual page\n')
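# Iterating over the Response object itself yields fixed-size byte chunks,
# not lines, so split the decoded text instead.
for line in r.text.splitlines():
    print(line.strip())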

print('\nDisplay all headers\n')
print(r.headers)
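For comparison, the standard library exposes the response headers as well. A rough sketch using urllib.request; the helper name fetch_headers is illustrative only:

```python
import urllib.request


def fetch_headers(url, timeout=10):
    """Return (status, headers) for `url` (hypothetical helper).

    `headers` is a list of (name, value) pairs in the order received.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # response.headers behaves like an email.message.Message,
        # so items() yields every header, including repeated ones.
        return response.status, list(response.headers.items())


if __name__ == "__main__":
    status, headers = fetch_headers("http://www.pythonlearn.com/code/intro-short.txt")
    print(status)
    for name, value in headers:
        print("{}: {}".format(name, value))
```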

1 Comment

Is this Python 3?
1

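Install the library first (this is a shell command, not Python):

pip install requests

Then, in Python:

import requests

# Note: this endpoint may require an authorization token.
r = requests.get('https://api.spotify.com/v1/search?type=artist&q=beyonce')
# json() parses the response body; print it to see the result in a script.
print(r.json())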
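In requests, json() parses the response body as JSON. Roughly the same can be done with the standard library alone; the sketch below assumes the body is UTF-8 encoded JSON, and the helper name fetch_json is hypothetical:

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Fetch `url` and parse the body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Assumption: the body is UTF-8, the usual encoding for JSON APIs.
        text = response.read().decode("utf-8")
    # json.loads raises json.JSONDecodeError if the body is not valid JSON.
    return json.loads(text)


if __name__ == "__main__":
    data = fetch_json("https://api.bitbucket.org/2.0/repositories")
    print(list(data.keys()))
```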


0

To make the output human-readable, add this line, which decodes the bytes returned by read() into a string:

text = f.read().decode('utf-8')
