I receive XML chunks from a server. The chunks are not complete elements; they are split at arbitrary points and could, for instance, look like this:
chunk1 = '<el a="1" b='
chunk2 = '"2"><sub c="'
chunk3 = '3">test</sub'
chunk4 = '></el><el d='
chunk5 = '"4" e="5"></'
chunk6 = 'el>'
How can I parse this stream, so that whenever one "el" element is complete a function is called?
So far I'm taking this approach (using ElementTree):
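import xml.etree.ElementTree as ET
text = ""
def handle_message(msg):
    global text  # needed: without it, `text += msg` raises UnboundLocalError
    text += msg
    try:
        # wrap the buffer so that several top-level elements parse as one document
        root = ET.fromstring("<root>" + text + "</root>")
        for el in list(root):
            handle_element(el)
        text = ""
        return True
    except ET.ParseError:
        # buffer is not well-formed yet; wait for more data
        return False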
However, this approach doesn't really work: it only calls handle_element when text happens to contain well-formed XML, and there is no guarantee that this will ever be the case.
Use xml.sax. Attach it to a simple file-like object that buffers data from the other end, and I think you'll have what you want. etree and other DOM-type parsers expect to load the whole file at once and work with it atomically. Or try BeautifulSoup; I haven't tried it myself, but I think it's supposed to handle these cases.
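A rough sketch of the xml.sax route, in case it helps. Instead of a file-like object it uses the feed() method of the incremental parser that xml.sax.make_parser() returns, which I believe amounts to the same thing here. The names ElementCollector and make_stream_parser are made up for this example, and feeding a synthetic "<root>" wrapper first is my assumption, carried over from the question's approach so that several top-level "el" elements are accepted. Each finished "el" is rebuilt as an ElementTree element with ET.TreeBuilder and passed to the callback.

```python
import xml.sax
import xml.etree.ElementTree as ET


class ElementCollector(xml.sax.ContentHandler):
    """Hypothetical handler: rebuilds every direct child of the wrapper
    root as an ElementTree element and hands it to `callback`."""

    def __init__(self, callback):
        super().__init__()
        self.callback = callback
        self.depth = 0       # nesting depth; the wrapper root is depth 1
        self.builder = None  # active only while inside a depth-2 element

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:
            self.builder = ET.TreeBuilder()
        if self.builder is not None:
            self.builder.start(name, dict(attrs.items()))

    def characters(self, content):
        if self.builder is not None:
            self.builder.data(content)

    def endElement(self, name):
        if self.builder is not None:
            self.builder.end(name)
            if self.depth == 2:
                # the depth-2 element is complete: deliver it
                self.callback(self.builder.close())
                self.builder = None
        self.depth -= 1


def make_stream_parser(callback):
    """Return an incremental SAX parser already primed with the wrapper root."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(ElementCollector(callback))
    parser.feed("<root>")  # assumption: wrap the stream in a synthetic root
    return parser


# Usage with the chunks from the question
chunks = ['<el a="1" b=', '"2"><sub c="', '3">test</sub',
          '></el><el d=', '"4" e="5"></', 'el>']
received = []
parser = make_stream_parser(received.append)
for chunk in chunks:
    parser.feed(chunk)
```

In real code you would pass handle_element instead of received.append and call parser.feed(msg) from handle_message. When the stream ends, feed "</root>" and call parser.close() so the parser can finish cleanly.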