1

This works for me:


import xml.etree.ElementTree as ET
from urllib2 import urlopen

url = 'http://example.com'
# this URL points to an XML page
tree = ET.parse(urlopen(url))

However, when I switch to requests, it fails:


import requests
import xml.etree.ElementTree as ET
url = 'http://example.com'
# this URL points to an XML page
tree = ET.parse(requests.get(url))

The traceback is shown below:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
----> 1 tree = ET.parse(requests.get(url, proxies={'http': '192.168.235.36:7788'}))

/usr/lib/python2.7/xml/etree/ElementTree.py in parse(source, parser)
   1180 def parse(source, parser=None):
   1181     tree = ElementTree()
-> 1182     tree.parse(source, parser)
   1183     return tree
   1184 

/usr/lib/python2.7/xml/etree/ElementTree.py in parse(self, source, parser)
    645         close_source = False
    646         if not hasattr(source, "read"):
--> 647             source = open(source, "rb")
    648             close_source = True
    649         try:

TypeError: coercing to Unicode: need string or buffer, Response found


So, my question is: what is wrong with the way I am using requests here, and how can I make ET work with requests?

2 Answers

3

You are passing the requests Response object to ElementTree; you want to pass in the raw file object instead:

r = requests.get(url, stream=True)
ET.parse(r.raw)

.raw returns the 'file-like' socket object, from which ElementTree.parse() will read, just like it'll read from the urllib2 response (which is itself a file-like object).

Concrete example:

>>> r = requests.get('http://www.enetpulse.com/wp-content/uploads/sample_xml_feed_enetpulse_soccer.xml', stream=True)
>>> tree = ET.parse(r.raw)
>>> tree
<xml.etree.ElementTree.ElementTree object at 0x109dadc50>
>>> tree.getroot().tag
'spocosy'
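
To see why the Response object is rejected while r.raw is accepted, here is a minimal offline sketch. FakeResponse is a hypothetical stand-in for any object without a read() method (it is not the real requests class), and io.BytesIO stands in for the file-like r.raw. The sketch is written so it also runs on Python 3, where the TypeError message is worded differently from the Python 2 traceback in the question.

```python
import io
import xml.etree.ElementTree as ET


class FakeResponse(object):
    """Hypothetical stand-in for a Response: it has no read() method."""


xml_bytes = b"<?xml version='1.0'?><root><child/></root>"

# A file-like object (anything with .read()) is read directly by ET.parse().
tree = ET.parse(io.BytesIO(xml_bytes))
root_tag = tree.getroot().tag

# Anything without .read() is treated as a filename and handed to open(),
# which raises TypeError because the object is not a path.
try:
    ET.parse(FakeResponse())
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__

print(root_tag, error_type)
```

This mirrors the hasattr(source, "read") check visible in the traceback above: the decision is made purely on whether the object has a read attribute.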

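If the server sends a compressed response, the raw socket (like urllib2) returns the compressed data undecoded; in that case you can use the ET.fromstring() method on the binary response content instead. Note that ET.fromstring() returns the root Element rather than an ElementTree: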

r = requests.get(url)
ET.fromstring(r.content)
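
The difference can be reproduced without a network connection. In this sketch the gzip module plays the part of the server's compression: raw_like is an assumed stand-in for r.raw on a gzip-encoded response, and content_like stands in for r.content, which requests has already decompressed. The sample XML is made up for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET

xml_bytes = b"<?xml version='1.0' encoding='utf-8'?><feed><item>1</item></feed>"
compressed = gzip.compress(xml_bytes)

# Stand-in for r.raw: the parser receives bytes that are still gzip-compressed.
raw_like = io.BytesIO(compressed)
try:
    ET.parse(raw_like)
    raw_parse_failed = False
except ET.ParseError:
    raw_parse_failed = True

# Stand-in for r.content: the body after decompression, still undecoded bytes.
content_like = gzip.decompress(compressed)
root = ET.fromstring(content_like)

print(raw_parse_failed, root.tag, root.find("item").text)
```

If you would rather keep streaming, a commonly used alternative is to set r.raw.decode_content = True before handing r.raw to ET.parse(); treat that as something to verify against the requests and urllib3 versions you have installed.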

5 Comments

I have tried this before, but it did not work. I got this back: ParseError: no element found: line 1, column 0
My apologies, the current API requires that you use stream=True for the raw reads to work properly, otherwise the data is downloaded early. Try again with the updated answer.
This doesn't work as is, because requests has already read the data coming from the socket by the end of the first line. Passing stream=True as an argument to the request is mandatory.
@doukremt: that is what I am saying in my comments. :-)
@MartijnPieters It works! You have always been very helpful to me. Thank you so much:)
0

You're not feeding ElementTree the response text, but the requests Response object itself, which is why you get the type error: need string or buffer, Response found. Do this instead:

r = requests.get(url)
tree = ET.fromstring(r.text)

1 Comment

This won't work for two reasons: r.text is the Unicode (decoded) result, whereas you should always parse XML as undecoded data; and ET.parse() wants a filename or file-like object. ET.parse() will see the r.text result as a filename and pass it to open().
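
A small sketch of why undecoded bytes are the safer input. It assumes a server that sends UTF-8 XML without a charset in the Content-Type header, a case where requests falls back to ISO-8859-1 when building r.text. Here body stands in for r.content and wrongly_decoded for r.text; the sample document is made up. The sketch targets Python 3, where ET.fromstring() accepts str without raising, so the damage shows up as silently garbled text.

```python
import xml.etree.ElementTree as ET

# Stand-in for r.content: UTF-8 bytes, with the encoding declared in the XML.
body = u"<?xml version='1.0' encoding='utf-8'?><name>caf\u00e9</name>".encode("utf-8")

# Parsing the bytes lets the XML parser apply the declared encoding itself.
from_bytes = ET.fromstring(body)

# Stand-in for r.text when the HTTP layer guessed ISO-8859-1 for UTF-8 data.
wrongly_decoded = body.decode("iso-8859-1")
from_text = ET.fromstring(wrongly_decoded)

print(repr(from_bytes.text))
print(repr(from_text.text))
```

The element parsed from bytes keeps the accented character intact, while the one parsed from the mis-decoded text carries two wrong characters in its place.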
