I have an XML document from which I want to extract text based on tags.
The part I want to extract text from looks something like this:
<BlockText attr1="blah" attr2="657" ID="Bhf76" lang="en">
Simply dummy text of the printing and typesetting industry. It has survived not only<TIP CONTENT=""/>\n five centuries, electronic typesetting, remaining essentially release.
</BlockText>
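When I run:
import xml.etree.ElementTree as ET

tree = ET.parse("myfile.xml")
root = tree.getroot()
# Collect the distinct tag names and pick the one that contains "BlockText"
tags = list(set([elem.tag for elem in root.iter()]))
tag = list(filter(lambda i: "BlockText" in i, tags))[0]
for text in root.iter(tag):
    texte = text.text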
I only get the text that comes before the empty tag <TIP CONTENT=""/>; everything after it is missing.
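A likely explanation, sketched on a hypothetical, trimmed-down stand-in for myfile.xml (the `SAMPLE` string is an assumption, not the real file): in ElementTree an element's `.text` only holds the text up to its first child element, and the text that follows `<TIP/>` is stored as that child's `.tail`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for myfile.xml.
SAMPLE = """<root>
<BlockText attr1="blah" attr2="657" ID="Bhf76" lang="en">Before the tip.<TIP CONTENT=""/> After the tip.</BlockText>
</root>"""

root = ET.fromstring(SAMPLE)
block = root.find("BlockText")
tip = block.find("TIP")

# .text stops at the first child element...
print(repr(block.text))  # 'Before the tip.'
# ...and the text after the child is attached to the child as its .tail.
print(repr(tip.tail))    # ' After the tip.'
```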
I tried to delete this tag before reading the rest of the text:
emptyTag = list(filter(lambda i: "TIP" in i, tags))
for e in root.iter(emptyTag):
    root.remove(e)
But this does not work.
Neither <BlockText> nor <TIP> is a direct child of root.
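Since `Element.remove()` only removes direct children of the element it is called on, one possible approach is to walk every potential parent and move each removed element's `.tail` onto whatever text precedes it, so nothing after `<TIP/>` is lost. A minimal sketch; `remove_keep_tail` and `SAMPLE` are hypothetical names, and the helper matches the tag name exactly, so a namespaced tag would have to be passed in its full `{uri}TIP` form.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample where BlockText is nested below root.
SAMPLE = """<root><section>
<BlockText ID="Bhf76" lang="en">Before the tip.<TIP CONTENT=""/> After the tip.</BlockText>
</section></root>"""


def remove_keep_tail(root, tag):
    """Remove every element named `tag` at any depth (hypothetical helper),
    appending its .tail to the preceding text so no text is dropped."""
    # Snapshot all elements first, because the tree is modified in the loop.
    for parent in list(root.iter()):
        prev = None  # last child of `parent` that is being kept
        for child in list(parent):
            if child.tag == tag:
                tail = child.tail or ""
                if prev is None:
                    # No kept sibling before it: the tail joins the parent's text.
                    parent.text = (parent.text or "") + tail
                else:
                    prev.tail = (prev.tail or "") + tail
                parent.remove(child)
            else:
                prev = child


root = ET.fromstring(SAMPLE)
remove_keep_tail(root, "TIP")
block = root.find(".//BlockText")
print(block.text)  # Before the tip. After the tip.
```

Tracking the last kept sibling (rather than simply the previous child) keeps the text intact even when several `<TIP/>` elements appear in a row.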
Thank you.
Comment: use itertext(); see stackoverflow.com/q/19369901/407651
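A short sketch of the `itertext()` suggestion, again on a hypothetical `SAMPLE` string rather than the real file: `Element.itertext()` yields the element's own text followed by the text and tail of every descendant in document order, so joining it gives the whole block without deleting anything. With the real file, the `tag` found by the question's own filtering code would be passed to `root.iter()` instead of the literal `"BlockText"`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample where BlockText is nested below root.
SAMPLE = """<root><section>
<BlockText ID="Bhf76" lang="en">Before the tip.<TIP CONTENT=""/> After the tip.</BlockText>
</section></root>"""

root = ET.fromstring(SAMPLE)

# itertext() covers the text before <TIP/> and TIP's tail after it,
# but not the tail of the BlockText element itself.
texts = ["".join(block.itertext()) for block in root.iter("BlockText")]
print(texts)  # ['Before the tip. After the tip.']
```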