Parsing an XML file with an emphasis tag in it in Python

Question

I am writing a Python script that extracts all of the text in an XML file. I am using the ElementTree library to parse the data, but I run into a problem when the data is structured like this:

<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
  But I bet it's a good <Pause/> superset of it.
</Segment>

When I attempt to read out the text, I get the first half of the Segment ("Alright. So what we had") before the pause tag.

What I am trying to figure out is if there is a way to ignore the tags in the data segments and print out all of the text.
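A minimal sketch that reproduces the behavior, assuming the script reads `Element.text` (the original script is not shown, so the variable names here are illustrative). In ElementTree, `text` holds only the text before the first child element, and the text after a child is stored in that child's `tail`:

```python
from xml.etree import ElementTree as ET

xml = '''<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
  But I bet it's a good <Pause/> superset of it.
</Segment>'''

segment = ET.fromstring(xml)

# .text stops at the first child element (<Pause/>)
before_pause = segment.text

# the remainder of the sentence lives on the child's .tail
after_pause = segment.find('Pause').tail

print(repr(before_pause))
print(repr(after_pause))
```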

There is some inconsistency in your question: the <Segment... text quoted does not match the text in your question.
– zx485, Commented Apr 10, 2020 at 18:56

dabingsou · Accepted Answer · 2020-04-23 03:37:50Z

0

Another solution, using the third-party simplified_scrapy library:

from simplified_scrapy import SimplifiedDoc
html = '''<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
  But I bet it's a good <Pause/> superset of it.
</Segment>'''
doc = SimplifiedDoc(html)
print(doc.Segment)
print(doc.Segment.text)

Result:

{'StartTime': '639.752', 'EndTime': '642.270', 'Participant': 'fe016', 'tag': 'Segment', 'html': "\n  But I bet it's a good <Pause /> superset of it.\n"}
But I bet it's a good superset of it.

Here are more examples. https://github.com/yiyedata/simplified-scrapy-demo/blob/master/doc_examples


Alexandra Dudkina · Accepted Answer · 2020-10-14 09:39:05Z

0

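# solution using ETree
from xml.etree import ElementTree as ET

xml = '''<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
  But I bet it's a good <Pause/> superset of it.
</Segment>'''

root = ET.fromstring(xml)
# root.text holds the text before <Pause/>; the text after it is the tail of <Pause/>
pause = root.find('./Pause')
# note: this covers a segment with exactly one <Pause/> child
print(root.text + pause.tail)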
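The approach above handles a single `<Pause/>`. For segments with several child tags, or none, a more general sketch is to use `Element.itertext()`, which yields every piece of text under an element in document order. The helper name `segment_text` is hypothetical, and collapsing whitespace is an assumption about the desired output:

```python
from xml.etree import ElementTree as ET


def segment_text(element):
    """Return all text under element, ignoring tags, with whitespace collapsed."""
    # itertext() yields element.text, then each descendant's text and tail, in order
    raw = ''.join(element.itertext())
    # split()/join() removes the newlines and the double space left where a tag was
    return ' '.join(raw.split())


xml = '''<Segment StartTime="639.752" EndTime="642.270" Participant="fe016">
  But I bet it's a good <Pause/> superset of it.
</Segment>'''

root = ET.fromstring(xml)
print(segment_text(root))
```

For a whole file, the same helper could be applied to each element returned by `root.iter('Segment')`, assuming the segments use that tag name throughout.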
