What is a good way to replace an HTML tag like:

Old : <div id="pgbrk" ......>....Page Break....</div>

New : <!--page break -->

The div's id might have many other values, so a regex is not a good idea. I need something along the lines of lxml. Basically, my problem is replacing an HTML tag with a string!

2 Answers

As long as your div has a parent tag, you could do this:

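import lxml.html as LH
import lxml.etree as ET

content = '<root><div id="pgbrk" ......>....Page Break....</div></root>'
doc = LH.fromstring(content)
# print(LH.tostring(doc, encoding="unicode"))
for div in doc.xpath('//div[@id="pgbrk"]'):
    # Swap the matched div for a comment node in its parent
    parent = div.getparent()
    parent.replace(div, ET.Comment("page break"))
# encoding="unicode" returns str rather than bytes on Python 3
print(LH.tostring(doc, encoding="unicode"))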

yields

<root><!--page break--></root>
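
One thing to watch with the snippet above: as far as I know, lxml keeps any text that follows an element in that element's .tail, and replace() takes the old element out together with its tail, so text sitting right after the div would disappear. Here is a minimal sketch that copies the tail onto the comment first. The helper name replace_with_comment and the sample markup are made up for illustration.

```python
import lxml.html as LH
import lxml.etree as ET


def replace_with_comment(doc, xpath, text):
    """Replace every element matched by `xpath` with a comment, keeping tail text."""
    for el in doc.xpath(xpath):
        comment = ET.Comment(text)
        # Carry over the text that followed the old element, if any
        comment.tail = el.tail
        el.getparent().replace(el, comment)
    return doc


# Hypothetical sample: the div is followed by plain text inside the same parent
content = '<root><div id="pgbrk">Page Break</div>trailing text</root>'
doc = LH.fromstring(content)
replace_with_comment(doc, '//div[@id="pgbrk"]', "page break")
result = LH.tostring(doc, encoding="unicode")
print(result)
```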

You can use the plain DOM API: http://docs.python.org/library/xml.dom.minidom.html

1) parse your source

from xml.dom.minidom import parse
datasource = open('c:\\temp\\mydata.xml')
doc= parse(datasource)

2) find the nodes to replace

for node in doc.getElementsByTagName('div'):
    # getAttribute returns '' when the attribute is missing
    if node.getAttribute('id') == 'pgbrk':
        ...

3) once you have found the target nodes, replace each one with a new comment node

parent = node.parentNode
parent.replaceChild(doc.createComment("page break"), node)
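
Putting the three steps together, here is a minimal self-contained sketch. It assumes the input is well-formed XML (minidom will reject broken HTML), and it uses parseString on an in-memory sample instead of a file. The function name replace_divs_with_comment and the sample markup are illustrative only.

```python
from xml.dom.minidom import parseString


def replace_divs_with_comment(xml_text, div_id, comment_text):
    """Replace every <div id=div_id> with a comment node and return the XML."""
    doc = parseString(xml_text)
    # Work on a copy of the node list because the tree changes inside the loop
    for node in list(doc.getElementsByTagName("div")):
        if node.getAttribute("id") == div_id:
            comment = doc.createComment(comment_text)
            node.parentNode.replaceChild(comment, node)
    return doc.documentElement.toxml()


# Hypothetical sample with one div to replace and one to keep
sample = '<root><div id="pgbrk">Page Break</div><div id="keep">text</div></root>'
result = replace_divs_with_comment(sample, "pgbrk", "page break")
print(result)
```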

docs: http://docs.python.org/library/xml.dom.html

3 Comments

The XML parser seems to be very strict about HTML content?
Yup. I'd use Beautiful Soup. I thought it would be helpful to show the OP the DOM as well; maybe it was just noise. stackoverflow.com/questions/2073541/…
Yes, this would have been fine, except that parseString (not parse) was not tolerant of my partially broken HTML. Thanks for the solution, though, as it would be helpful in the case of XML.
