Python urllib2 Response header

Question

I'm trying to extract the response headers of a URL request. When I use Firebug to inspect the response to the request, it shows:

Content-Type text/html

However, when I use the Python code:

urllib2.urlopen(URL).info()

the output is:

Content-Type: video/x-flv

I am new to Python, and to web programming in general; any helpful insight is much appreciated. Also, if more info is needed, please let me know.

Thanks in advance for reading this post

This seems like a duplicate of stackoverflow.com/questions/843392/…
– Jay Taylor, Commented Apr 26, 2012 at 20:15

syabro · Accepted Answer · 2011-10-26 13:31:09Z

40

Try to request as Firefox does. You can see the request headers in Firebug, so add them to your request object:

import urllib2

request = urllib2.Request('http://your.tld/...')
request.add_header('User-Agent', 'some fake agent string')
request.add_header('Referer', 'fake referrer')
# ... add any other headers Firebug shows for the browser request
response = urllib2.urlopen(request)
# check content type:
print response.info().getheader('Content-Type')

There's also HTTPCookieProcessor, which handles cookies for you and can make the request look more like the browser's, but I don't think you'll need it in most cases. Have a look at Python's documentation:

http://docs.python.org/library/urllib2.html
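
For readers on Python 3, where urllib2 became urllib.request, the same idea can be sketched roughly as follows. The helper names, the example.com URL, and the header values are placeholders introduced for illustration, not part of the original answer; copy the real values from the browser's network panel.

```python
import urllib.request


def build_browser_like_request(url, user_agent, referer):
    """Return a Request carrying the headers a browser would send.

    Hypothetical helper: the argument values used below are placeholders.
    """
    request = urllib.request.Request(url)
    request.add_header("User-Agent", user_agent)
    request.add_header("Referer", referer)
    return request


def fetch_content_type(request):
    """Send the request and return the Content-Type the server reports."""
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Type")


request = build_browser_like_request(
    "http://example.com/clip.flv",  # placeholder URL
    "Mozilla/5.0 (placeholder agent string)",
    "http://example.com/",
)

# Inspect what will be sent without touching the network.
print(request.header_items())
```

Calling fetch_content_type(request) performs the actual request; it is left out of the snippet so that the example runs offline.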

edited Oct 26, 2011 at 13:31

syabro

answered Mar 26, 2010 at 14:04

qingbo

3 Comments

Janus Troelsen Over a year ago

for Python 3: response.info()["content-type"]

Nearoo Over a year ago

Is it completely impossible for a site to check whether a request has a fake referrer or not? I'm having no luck with what I try; there's always the error "Invalid referer, won't load xy"...

Nearoo Over a year ago

Also, if info() doesn't show a row "Referer": Can I conclude that the "fake referer" didn't work?

bobince · Accepted Answer · 2009-10-31 13:11:36Z

5

Content-Type text/html

Really, like that, without the colon?

If so, that might explain it: it's an invalid header, so it gets ignored, so urllib guesses the content-type instead, by looking at the filename. If the URL happens to have ‘.flv’ at the end, it'll guess the type should be video/x-flv.
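
A small Python 3 sketch of the two mechanisms mentioned here, using only the standard library. The header blocks are made-up samples. Whether urllib actually falls back to an extension-based guess for a given HTTP response is this answer's hypothesis; the snippet does not verify it, it only shows that a header line without a colon is not parsed as a header and what filename-based guessing looks like.

```python
import email
import mimetypes


def parse_content_type(raw_headers):
    """Parse a raw header block and return its Content-Type, or None."""
    message = email.message_from_string(raw_headers)
    return message.get("Content-Type")


# Made-up sample header blocks; the second one lacks the colon.
valid = "Server: demo\r\nContent-Type: text/html\r\n\r\n"
broken = "Server: demo\r\nContent-Type text/html\r\n\r\n"

print(parse_content_type(valid))
print(parse_content_type(broken))

# Extension-based guessing. The result for less common extensions such as
# .flv depends on the MIME tables available on the system.
print(mimetypes.guess_type("http://example.com/page.html"))
print(mimetypes.guess_type("http://example.com/clip.flv"))
```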

answered Oct 31, 2009 at 13:11

bobince

Alex Martelli · Accepted Answer · 2009-10-31 06:21:52Z

2

This peculiar discrepancy might be explained by different headers (maybe ones of the accept kind) being sent by the two requests -- can you check that...? Or, if Javascript is running in Firefox (which I assume you're using when you're running firebug?) -- since it's definitely NOT running in the Python case -- "all bets are off", as they say;-).

answered Oct 31, 2009 at 6:21

Alex Martelli

3 Comments

looter Over a year ago

hmmm, I'm not too sure how there would be two different headers and also how I would be able to distinguish between both. I'm pretty sure javascript is running in firefox... What would be necessary for me to do within python then?

Alex Martelli Over a year ago

@looter, there's no direct way to execute Javascript in Python -- if Javascript's playing a crucial role in determining the final contents of the page, your best bet's automating real browsers instead, e.g. via SeleniumRC.

looter Over a year ago

I'm not sure if Javascript is processing the requests, because when I use the network monitoring in firebug, the response header is also viewable within the HTML view. Like I mentioned in my post, I'm really new to python and web programming/scripting so some of this is going over my head, I'm not sure if I'm being descriptive enough. Thanks for your help so far.

Ned Batchelder · Accepted Answer · 2009-10-31 13:16:55Z

1

Keep in mind that a web server can return different results for the same URL based on differences in the request. One example is content-type negotiation: the requester can specify a list of content types it will accept, and the server can return different results to try to accommodate different needs.

Also, you may be getting an error page for one of your requests, for example, because it is malformed, or you don't have cookies set that authenticate you properly, etc. Look at the response itself to see what you are getting.
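
To make the negotiation point concrete, here is a self-contained Python 3 sketch that starts a throwaway local server and requests the same URL twice. The handler, its rule of switching on the Accept header, and the response bodies are all invented for the demonstration; a real site may key on entirely different headers or cookies.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NegotiatingHandler(BaseHTTPRequestHandler):
    """Hypothetical server that picks a Content-Type from the Accept header."""

    def do_GET(self):
        accept = self.headers.get("Accept", "")
        if "text/html" in accept:
            content_type, body = "text/html", b"<p>player page</p>"
        else:
            content_type, body = "video/x-flv", b"FLV"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_content_type(url, headers=None):
    """Request url with optional extra headers; return the Content-Type."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers=headers or {})
    with opener.open(request, timeout=5) as response:
        return response.headers.get("Content-Type")


def demo():
    """Fetch the same URL with and without an Accept header."""
    server = HTTPServer(("127.0.0.1", 0), NegotiatingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/clip.flv" % server.server_address[1]
        plain = fetch_content_type(url)
        browser_like = fetch_content_type(url, {"Accept": "text/html"})
        return plain, browser_like
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The two values returned by demo() differ even though the URL is identical, which is the kind of discrepancy described in the question.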

answered Oct 31, 2009 at 13:16

Ned Batchelder

Yan Foto · Accepted Answer · 2015-11-29 12:47:19Z

0

According to http://docs.python.org/library/urllib2.html there is only a get_header() method and nothing about getheader().

I'm asking because your code works fine for

response.info().getheader('Set-Cookie')

but once I execute

response.info().get_header('Set-Cookie')

I get:

Traceback (most recent call last):
  File "baza.py", line 11, in <module>
    cookie = response.info().get_header('Set-Cookie')
AttributeError: HTTPMessage instance has no attribute 'get_header'

Edit: moreover, response.headers.get('Set-Cookie') works fine as well, although it is not mentioned in the urllib2 docs.
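
One caveat when reading Set-Cookie this way: a response may carry several Set-Cookie headers, and a plain get() returns only the first. The Python 3 sketch below parses a canned header block (the cookie names and values are made up) to show the difference between get() and get_all(); on a live response, response.headers is the same kind of message object.

```python
import http.client
import io

# Canned response headers with two Set-Cookie lines (made-up values).
raw = (
    b"Content-Type: text/html\r\n"
    b"Set-Cookie: session=abc123; Path=/\r\n"
    b"Set-Cookie: theme=dark; Path=/\r\n"
    b"\r\n"
)
headers = http.client.parse_headers(io.BytesIO(raw))

first_cookie = headers.get("Set-Cookie")      # only the first match
all_cookies = headers.get_all("Set-Cookie")   # every match, in order

print(first_cookie)
print(all_cookies)
```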

edited Nov 29, 2015 at 12:47

Yan Foto

answered Oct 10, 2012 at 12:02

modzello86

1 Comment

Mark E. Haase Over a year ago

get_header() is for the urllib2.Request class. The response class uses getheader() instead, which is an unfortunate mismatch.
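
The request/response split carries over to Python 3's urllib.request, and it comes with a second wrinkle: Request stores header names in capitalized form and looks them up by exact match, while response headers are matched case-insensitively. A minimal offline sketch, with a placeholder URL, a placeholder agent string, and a canned response header block:

```python
import http.client
import io
import urllib.request

# Request side: add_header() stores the name as "User-agent".
request = urllib.request.Request("http://example.com/")  # placeholder URL
request.add_header("User-Agent", "placeholder-agent")
stored = request.get_header("User-agent")   # matches the stored key
missed = request.get_header("User-Agent")   # exact-match lookup misses

# Response side: header lookups ignore case.
response_headers = http.client.parse_headers(
    io.BytesIO(b"Content-Type: text/html\r\n\r\n")
)
lower = response_headers["content-type"]
upper = response_headers.get("CONTENT-TYPE")

print(stored, missed, lower, upper)
```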

Freak · Accepted Answer · 2022-05-11 04:40:44Z

0

To get the raw header data in Python 2, here is a bit of a hack that works:

"".join(urllib2.urlopen("http://google.com/").info().__dict__["headers"])

"".join(list) concatenates the list of header lines, each of which already ends with "\n".

__dict__ is the dictionary in which a Python object stores its instance attributes, so indexing it by name is equivalent to ordinary attribute access.

["headers"] then selects the list of raw header lines stored on the object that .info() returns.
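
In Python 3 the headers object no longer keeps that list of raw lines, so a rough equivalent is to rebuild the text from the parsed name/value pairs. The sketch below does this for a canned header block (made-up values) and also shows what __dict__ contains for a simple object; dump_headers and Example are hypothetical names used only here.

```python
import http.client
import io


def dump_headers(message):
    """Rebuild a header dump, one "Name: value" line per header."""
    return "".join("%s: %s\n" % (name, value) for name, value in message.items())


# Canned response headers (made-up values).
raw = b"Content-Type: text/html\r\nServer: demo\r\n\r\n"
headers = http.client.parse_headers(io.BytesIO(raw))
print(dump_headers(headers))


class Example:
    """Tiny class used to show what __dict__ holds."""

    def __init__(self):
        self.headers = ["Content-Type: text/html\n"]


obj = Example()
# __dict__ maps attribute names to values; vars(obj) returns the same dict.
print(obj.__dict__["headers"] is obj.headers)
```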

hope this helped you learn a few ez python tricks :)

edited May 11, 2022 at 4:40

answered May 7, 2022 at 17:47

Freak

1 Comment

Darrow Hartman Over a year ago

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
