2

I have an xml file and I want to uncomment and comment out an element in the file.

<my_element>
    <blablabla href="docs/MyBlank.htm" />
</my_element>

This one I would like to "close" (comment out) like this:

<!--
<my_element>
    <blablabla href="docs/MyBlank.htm" />
</my_element>
-->

Furter down in the file I have an element with the same name which is "closed" (commented out) like this:

<!--
<my_element>
    <blablabla href="secretwebhacking/MySecrectBankLogin.htm" />
</my_element>
-->

and I want to "open" it up (uncomment) like:

<my_element>
     <blablabla href="secretwebhacking/MySecrectBankLogin.htm" />
</my_element>

I use ElementTree for this, I know how to edit the value and the attribute in the element, but I am not at all sure how to remove and add the <!-- --> around one specific element.

1 Answer 1

3

You could use BeautifulSoup to do the parsing. Basic example:

xmlbody = '<stuff>\
<my_element>\
    <blablabla href="docs/MyBlank.htm" />\
</my_element>\
<!--\
<my_element>\
    <blablabla href="secretwebhacking/MySecrectBankLogin.htm" />\
</my_element>\
-->\
</stuff>'

from bs4 import BeautifulSoup, Comment
soup = BeautifulSoup(xmlbody, "lxml")

# Find all comments
comments = soup.findAll(text=lambda text:isinstance(text, Comment))
for comment in comments:
  # Create new soup object from comment contents
  commentsoup = BeautifulSoup(comment, "lxml")
  # Find the tag we want
  blatag = commentsoup.find('blablabla')
  # Check if it is the one we need
  if(blatag['href']=="secretwebhacking/MySecrectBankLogin.htm"):
    # If so, insert the element within the comment into the document
    comment.insert_after(commentsoup.find('body').find('my_element'))
    # And remove the comment
    comment.extract()

# Find all my_elements
my_elements = soup.findAll('my_element')
for tag in my_elements:
  # Check if it's the one we want
  if(tag.find('blablabla')['href'] == "docs/MyBlank.htm"):
    # If so, insert a commented version
    tagcomment = soup.new_string(str(tag), Comment)
    tag.insert_after(tagcomment)
    # And remove the tag
    tag.extract()

print(soup.find('html').find('body').prettify().replace("<body>\n","").replace("\n</body>",""))

That should start you off, you can make it as complex as you need. The output is this:

  <stuff>
   <!--<my_element> <blablabla href="docs/MyBlank.htm"></blablabla></my_element>-->
   <my_element>
    <blablabla href="secretwebhacking/MySecrectBankLogin.htm">
    </blablabla>
   </my_element>
  </stuff>
Sign up to request clarification or add additional context in comments.

2 Comments

I tried it, however I found another way to solve my problem. I just updated the element that with the address I wanted. Instead of removing and adding <!-- -->. Thanks Marco for your help.
No problem! If it answered your question, though, even if you don't end up using it, do kindly accept the answer. :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.