I have an XML document:

<root>
 <node1>
  <B>text</B>
  <A>another_text</A>
  <C>one_more_text</C>
 </node1>
 <node2>
  <C>one_more_text</C>
  <B>text</B>
  <A>another_text</A>
 </node2>
</root>

I want to get output like this:

<root>
 <node1>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
 </node1>
 <node2>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
 </node2>
</root>

I tried code like this:

from xml.etree import ElementTree as et

tr = et.parse(path_in)
root = tr.getroot()
for children in root.getchildren():
    for child in children.getchildren():
        # sort it

tr.write(path_out)        

I cannot use the standard sort and sorted functions as they are, because they sort the wrong way (not by tag). Thanks in advance.

2 Answers

You need to:

  • get the children elements for every top-level "node"
  • sort them by the tag attribute (node's name)
  • reset the child nodes of each top-level node

Sample working code:

from operator import attrgetter
from xml.etree import ElementTree as et

data = """<root>
 <node1>
  <B>text</B>
  <A>another_text</A>
  <C>one_more_text</C>
 </node1>
 <node2>
  <C>one_more_text</C>
  <B>text</B>
  <A>another_text</A>
 </node2>
</root>"""


root = et.fromstring(data)
for node in root.findall("*"):  # searching top-level nodes only: node1, node2 ...
    node[:] = sorted(node, key=attrgetter("tag"))

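# encoding="unicode" returns a str; without it, Python 3 prints a bytes literal
print(et.tostring(root, encoding="unicode"))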

Prints:

<root>
 <node1>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
 </node1>
 <node2>
  <A>another_text</A>
  <B>text</B>
  <C>one_more_text</C>
  </node2>
</root>

Note that we are not using the getchildren() method here (it has been deprecated since Python 2.7 and was removed in Python 3.9). Instead, we rely on the fact that each Element instance is iterable over its child nodes.
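
For the file-based workflow in the question, the same idea can be sketched as below. The function name sort_top_level_children is hypothetical; path_in and path_out are the names used in the question. This is a minimal sketch, assuming that only the children of each top-level node need sorting.

```python
from operator import attrgetter
from xml.etree import ElementTree as et


def sort_top_level_children(path_in, path_out):
    """Hypothetical helper: sort the children of each top-level node by tag."""
    tree = et.parse(path_in)
    root = tree.getroot()
    for node in root:  # iterate directly instead of calling getchildren()
        # Slice assignment replaces the node's children with the sorted list.
        node[:] = sorted(node, key=attrgetter("tag"))
    tree.write(path_out)
```

Each element keeps its own trailing whitespace when it moves, so the indentation of the written file may shift slightly after sorting.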

From a similar question:

from lxml import etree

data = """<X>
    <X03>3</X03>
    <X02>2</X02>
    <A>
        <A02>Y</A02>
        <A01>X</A01>
        <A03>Z</A03>
    </A>
    <X01>1</X01>
    <B>
        <B01>Z</B01>
        <B02>X</B02>
        <B03>C</B03>
    </B>
</X>"""

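# remove_blank_text=True drops whitespace-only text so pretty_print can re-indent
doc = etree.XML(data, etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'):  # every element that has child elements
    parent[:] = sorted(parent, key=lambda x: x.tag)

# encoding="unicode" returns a str, so this also prints cleanly on Python 3
print(etree.tostring(doc, pretty_print=True, encoding="unicode"))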

Result:

<X>
  <A>
    <A01>X</A01>
    <A02>Y</A02>
    <A03>Z</A03>
  </A>
  <B>
    <B01>Z</B01>
    <B02>X</B02>
    <B03>C</B03>
  </B>
  <X01>1</X01>
  <X02>2</X02>
  <X03>3</X03>
</X>

You can find more information here: http://effbot.org/zone/element-sort.htm
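
If lxml is not available, a similar recursive sort can be sketched with the standard library alone. This is an illustration rather than part of the original answer: it uses a shortened version of the sample data, and it assumes Python 3.9 or later for et.indent, which stands in for lxml's pretty_print.

```python
from xml.etree import ElementTree as et

# Shortened sample data, for illustration only.
data = """<X>
    <X03>3</X03>
    <A>
        <A02>Y</A02>
        <A01>X</A01>
    </A>
    <X01>1</X01>
</X>"""

root = et.fromstring(data)

# Snapshot the elements first so that reordering does not disturb the traversal.
for parent in list(root.iter()):
    parent[:] = sorted(parent, key=lambda child: child.tag)

et.indent(root)  # re-indent after moving elements; requires Python 3.9+
print(et.tostring(root, encoding="unicode"))
```

The snapshot with list(root.iter()) visits every element, including leaves, but sorting an element with no children is a no-op.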
