14

Here is a part of XML:

<item><img src="cat.jpg" /> Picture of a cat</item>

Extracting the tag is easy. Just do:

et = xml.etree.ElementTree.fromstring(our_xml_string)
img = et.find('img')

But how to get the text immediately after it (Picture of a cat)? Doing the following returns a blank string:

print et.text

1 Answer 1

30

Elements have a tail attribute -- so instead of element.text, you're asking for element.tail.

>>> import lxml.etree
>>> root = lxml.etree.fromstring('''<root><foo>bar</foo>baz</root>''')
>>> root[0]
<Element foo at 0x145a3c0>
>>> root[0].tail
'baz'

Or, for your example:

>>> et = lxml.etree.fromstring('''<item><img src="cat.jpg" /> Picture of a cat</item>''')
>>> et.find('img').tail
' Picture of a cat'

This also works with plain ElementTree:

>>> import xml.etree.ElementTree
>>> xml.etree.ElementTree.fromstring(
...   '''<item><img src="cat.jpg" /> Picture of a cat</item>'''
... ).find('img').tail
' Picture of a cat'
Sign up to request clarification or add additional context in comments.

3 Comments

Brilliant. I tried using .tail before, but I was using it on my el object. Did not realise I had to use it on img. Thank you for enlightening me!
@Neuron, ElementTree is a library name, not a code snippet. Its original formatting was deliberate. See When should code formatting be used for non-code text? and Should code formatting be used for package names? on Meta Stack Overflow.
Thanks for the meta posts. though I am aware, that package names should not be code highlighted. I thought you were referencing the type ElementTree, not the package

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.