I'm trying to figure out how to replace an element with a string using lxml in Python.

In my experimentation, I have the following code:

from lxml import etree as et

docstring = '<p>The value is permitted only when that includes <xref linkend=\"my linkend\" browsertext=\"something here\" filename=\"A_link.fm\"/>, otherwise the value is reserved.</p>'

topicroot = et.XML(docstring)
topicroot2 = et.ElementTree(topicroot) 
xref = topicroot2.xpath('//*/xref')
xref_attribute = xref[0].attrib['browsertext']

print(xref_attribute)

The result is: 'something here'

This is the browser text attribute I'm looking for in this small sample. But what I can't seem to figure out is how to replace the entire element with the attribute text I've captured here.

(I do recognize that in my sample I could have multiple xrefs and will need to construct a loop to go through them properly.)

What's the best way to go about doing this?

And for those wondering, I'm having to do this because the link actually goes to a file that doesn't exist because of our different build systems.

Thanks in advance!

  • What is the expected output? Commented Jul 12, 2018 at 13:14

1 Answer

Try this (Python 3):

from lxml import etree as et

docstring = '<p>The value is permitted only when that includes <xref linkend=\"my linkend\" browsertext=\"something here\" filename=\"A_link.fm\"/>, otherwise the value is reserved.</p>'

# Get the root element.
topicroot = et.XML(docstring)
topicroot2 = et.ElementTree(topicroot)

# Get the text of the root element. This is a list of strings!
topicroot2_text = topicroot2.xpath("text()")

# Get the xref element.
xref = topicroot2.xpath('//*/xref')[0]
xref_attribute = xref.attrib['browsertext']

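# Save a reference to the p element, then remove the xref from it.
# Note: lxml's remove() also drops the element's tail text, which is why
# the text nodes were captured above.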
parent = xref.getparent()
parent.remove(xref)

# Set the text of the p element by combining the list of strings with the
# extracted attribute value.
new_text = [topicroot2_text[0], xref_attribute, topicroot2_text[1]]
parent.text = "".join(new_text)

print(et.tostring(topicroot2))

Output:

b'<p>The value is permitted only when that includes something here, otherwise the value is reserved.</p>'
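
The code above handles a single xref that sits directly inside the p element. For the multiple-xref case mentioned in the question, one possible sketch uses a different technique: copy each attribute value into the element's text, then let lxml's `etree.strip_tags` dissolve the tags while keeping their text and tail. The helper name `replace_xrefs_with_text` and the two-xref sample string are made up for illustration, and treating a missing `browsertext` as an empty string is an assumption.

```python
from lxml import etree as et


def replace_xrefs_with_text(root, tag="xref", attr="browsertext"):
    """Replace every <tag> element under root with the value of its attr.

    Hypothetical helper, not part of lxml.
    """
    # Put the attribute value into each element's text so that it survives
    # once the tag itself is stripped. A missing attribute becomes "".
    for element in root.iter(tag):
        element.text = element.get(attr, "")

    # strip_tags() removes the tags and their attributes but merges the
    # text and tail of each element into its parent.
    et.strip_tags(root, tag)
    return root


# Illustrative input with two xrefs (not from the original question).
docstring = (
    '<p>See <xref browsertext="one" filename="A.fm"/> and '
    '<xref browsertext="two" filename="B.fm"/>, otherwise reserved.</p>'
)

root = replace_xrefs_with_text(et.XML(docstring))
print(et.tostring(root, encoding="unicode"))
# <p>See one and two, otherwise reserved.</p>
```

Because the text and tail are merged by lxml itself, this variant does not need to index into the list of text nodes, so it should also cope with xrefs nested deeper in the tree.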

1 Comment

This is exactly what I was trying to do. Thanks! I was kind of on the right track, but couldn't sort out how to remove the element. Awesome!
