
I am writing a Python script that extracts all of the text in an XML file. I am using the ElementTree library to parse the data, but I run into a problem when the data is structured like this:

<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
  But I bet it's a good <Pause/> superset of it.
</Segment>

When I attempt to read out the text, I get the first half of the Segment ("Alright. So what we had") before the pause tag.

What I am trying to figure out is whether there is a way to ignore the tags inside the data segments and print out all of the text.

  • There is some inconsistency in your question: the <Segment... text quoted does not match the text in your question. Commented Apr 10, 2020 at 18:56
  • Use itertext(): stackoverflow.com/q/19369901/407651 Commented Apr 10, 2020 at 19:36

2 Answers


Another solution.

from simplified_scrapy import SimplifiedDoc
html = '''<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
  But I bet it's a good <Pause/> superset of it.
</Segment>'''
doc = SimplifiedDoc(html)
print(doc.Segment)
print(doc.Segment.text)

Result:

{'StartTime': '639.752', 'EndTime': '642.270', 'Participant': 'fe016', 'tag': 'Segment', 'html': "\n  But I bet it's a good <Pause /> superset of it.\n"}
But I bet it's a good superset of it.

Here are more examples. https://github.com/yiyedata/simplified-scrapy-demo/blob/master/doc_examples


xml = '''<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
  But I bet it's a good <Pause/> superset of it.
</Segment>'''

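# Solution using ElementTree
from xml.etree import ElementTree as ET

root = ET.fromstring(xml)
pause = root.find('./Pause')
# root.text is the text before <Pause/>; pause.tail is the text after it.
# Note: this only works when the segment contains exactly one <Pause/> child.
print(root.text + pause.tail)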
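
The itertext() method mentioned in the comments on the question may be a more general option, since it walks every text and tail fragment in document order and so should not depend on how many child tags a segment has. The following is a minimal sketch, assuming the same sample segment as above; segment_text is a hypothetical helper name introduced here for illustration, not part of ElementTree.

```python
from xml.etree import ElementTree as ET

xml = '''<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
  But I bet it's a good <Pause/> superset of it.
</Segment>'''


def segment_text(element):
    """Return all text inside element with tags ignored (hypothetical helper)."""
    # itertext() yields each text and tail fragment in document order.
    raw = "".join(element.itertext())
    # Collapse the newlines and doubled spaces left where tags used to be.
    return " ".join(raw.split())


root = ET.fromstring(xml)
text = segment_text(root)
print(text)
```

For a whole file, the same helper could be applied to each element returned by root.iter('Segment'), assuming the segments are named as in the sample.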
