I know this question has been asked before, but I am struggling to get it to work with my example and would really appreciate some help. What I am trying to achieve seems fairly straightforward: I have 2 files, one similar to the one below and a second that is much the same except that it only has the LAYER and then the TEST NAME, i.e. no MASTER.

<MASTER>
<LAYER NAME="LAYER B">
    <TEST NAME="Soup1">
        <TITLE>Title 2</TITLE>
        <SCRIPTFILE>PAth 2</SCRIPTFILE>
        <ASSET_FILE PATH="Path 22" />
        <ARGS>
          <ARG ID="arg_21">some_Arg11</ARG>
          <ARG ID="arg_22">some_Arg12</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="111">1200</TIMEOUT>
    </TEST>

    <TEST NAME="Bread2">
        <TITLE>Title 1</TITLE>
        <SCRIPTFILE>PAth 1</SCRIPTFILE>
        <ASSET_FILE PATH="Path 11" />        
        <ARGS>
          <ARG ID="arg_11">some_Arg12</ARG>
          <ARG ID="arg_12">some_Arg22</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="2222">1000</TIMEOUT>
    </TEST>
</LAYER>
<LAYER NAME="LAYER A">
    <TEST NAME="Soup2">
        <TITLE>Title 2</TITLE>
        <SCRIPTFILE>PAth 2</SCRIPTFILE>
        <ASSET_FILE PATH="Path 22" />
        <ARGS>
          <ARG ID="arg_21">some_Arg11</ARG>
          <ARG ID="arg_22">some_Arg12</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="111">1200</TIMEOUT>
    </TEST>

    <TEST NAME="Bread2">
        <TITLE>Title 1</TITLE>
        <SCRIPTFILE>PAth 1</SCRIPTFILE>
        <ASSET_FILE PATH="Path 11" />        
        <ARGS>
          <ARG ID="arg_11">some_Arg12</ARG>
          <ARG ID="arg_12">some_Arg22</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="2222">1000</TIMEOUT>
    </TEST>
</LAYER>
</MASTER>

and all I am trying to do is to sort these files based on the NAME, respecting the individual LAYERS.

In the scenario above, LAYER A should come prior to LAYER B and within each layer, they should be ordered by NAME, hence Bread before Soup. For my second scenario I do not have these sublayers.

<LAYER>
    <TEST NAME="Soup1">
        <TITLE>Title 2</TITLE>
        <SCRIPTFILE>PAth 2</SCRIPTFILE>
        <ASSET_FILE PATH="Path 22" />
        <ARGS>
          <ARG ID="arg_21">some_Arg11</ARG>
          <ARG ID="arg_22">some_Arg12</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="111">1200</TIMEOUT>
    </TEST>

    <TEST NAME="Bread2">
        <TITLE>Title 1</TITLE>
        <SCRIPTFILE>PAth 1</SCRIPTFILE>
        <ASSET_FILE PATH="Path 11" />        
        <ARGS>
          <ARG ID="arg_11">some_Arg12</ARG>
          <ARG ID="arg_12">some_Arg22</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="2222">1000</TIMEOUT>
    </TEST>
</LAYER>

and I want them sorted by TEST NAME.

Thanks in advance guys your help will be appreciated.

Comment: The children of an ElementTree Element are lists, so use list.sort. (Commented Aug 16, 2014 at 9:41)

2 Answers

Using ElementTree you can do this:

import xml.etree.ElementTree as ET

def sortchildrenby(parent, attr):
    # Fall back to '' so elements without the attribute do not raise TypeError on Python 3
    parent[:] = sorted(parent, key=lambda child: child.get(attr, ''))

tree = ET.parse('input.xml')
root = tree.getroot()

sortchildrenby(root, 'NAME')
for child in root:
    sortchildrenby(child, 'NAME')

tree.write('output.xml')
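
The same idea can be checked in memory without reading or writing files. This is only a minimal sketch, assuming Python 3 and a shortened version of the XML from the question; the names SAMPLE, sort_children_by and sorted_names are hypothetical, and the empty-string fallback for a missing attribute is an assumption added here, not something the question asks for.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the first file from the question (assumption: only NAME matters here).
SAMPLE = """
<MASTER>
  <LAYER NAME="LAYER B">
    <TEST NAME="Soup1"><TITLE>Title 2</TITLE></TEST>
    <TEST NAME="Bread2"><TITLE>Title 1</TITLE></TEST>
  </LAYER>
  <LAYER NAME="LAYER A">
    <TEST NAME="Soup2"><TITLE>Title 2</TITLE></TEST>
    <TEST NAME="Bread2"><TITLE>Title 1</TITLE></TEST>
  </LAYER>
</MASTER>
"""


def sort_children_by(parent, attr):
    # Slice assignment replaces the children of `parent` in place.
    # Elements without the attribute sort under '' instead of None.
    parent[:] = sorted(parent, key=lambda child: child.get(attr, ''))


def sorted_names(xml_text, attr='NAME'):
    """Sort two levels deep and return (tag, attribute value) pairs in document order."""
    root = ET.fromstring(xml_text)
    sort_children_by(root, attr)
    for child in root:
        sort_children_by(child, attr)
    return [(el.tag, el.get(attr)) for el in root.iter()
            if el.get(attr) is not None]


if __name__ == '__main__':
    for tag, name in sorted_names(SAMPLE):
        print(tag, name)
```

With the second file, where the root is LAYER, the inner loop runs over the children of each TEST. None of them has a NAME, so they all share the key '' and keep their original order, because sorted is stable.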
1 Comment

I get an error even with the XML from the question: TypeError: '<' not supported between instances of 'NoneType' and 'NoneType'

If you want to sort recursively, handle comments, and sort on all attributes:

#!/usr/bin/env python
# encoding: utf-8

from __future__ import print_function

import logging
from lxml import etree


def get_node_key(node, attr=None):
    """Return the sorting key of an xml node
    using tag and attributes
    """
    if attr is None:
        return '%s:%s' % (node.tag, ':'.join(node.get(name)
                                             for name in sorted(node.attrib)))
    if attr in node.attrib:
        return '%s:%s' % (node.tag, node.get(attr))
    return '%s' % node.tag


def sort_children(node, attr=None):
    """Sort children by tag and the given attribute.
    If attr is None, sort by all attributes."""
    if not isinstance(node.tag, str):  # PYTHON 2: use basestring instead
        # not a TAG, it is comment or DATA
        # no need to sort
        return
    # sort child along attr
    node[:] = sorted(node, key=lambda child: get_node_key(child, attr))
    # and recurse
    for child in node:
        sort_children(child, attr)


def sort(unsorted_file, sorted_file, attr=None):
    """Sort unsorted xml file and save to sorted_file"""
    tree = etree.parse(unsorted_file)
    root = tree.getroot()
    sort_children(root, attr)

    sorted_unicode = etree.tostring(root,
                                    pretty_print=True,
                                    encoding='unicode')
    with open(sorted_file, 'w') as output_fp:
        output_fp.write('%s' % sorted_unicode)
        logging.info('written sorted file %s', sorted_file)

Note: I am using lxml.etree (http://lxml.de/tutorial.html)
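
If lxml is not available, the recursive idea can be approximated with the standard library. This is only a sketch under stated assumptions: it swaps lxml.etree for xml.etree.ElementTree, whose default parser drops comments, so the comment check from the code above is not needed. The names node_key, sort_recursively and sort_xml_text are hypothetical.

```python
import xml.etree.ElementTree as ET


def node_key(node, attr=None):
    """Sort key: the tag first, then one attribute value or all of them."""
    if attr is None:
        values = [node.get(name) for name in sorted(node.attrib)]
    else:
        # '' stands in for a missing attribute so None is never compared.
        values = [node.get(attr, '')]
    return (node.tag, values)


def sort_recursively(node, attr=None):
    # Reorder the direct children in place, then descend into each of them.
    node[:] = sorted(node, key=lambda child: node_key(child, attr))
    for child in node:
        sort_recursively(child, attr)


def sort_xml_text(xml_text, attr=None):
    """Parse an XML string, sort it recursively, and return it as a string."""
    root = ET.fromstring(xml_text)
    sort_recursively(root, attr)
    return ET.tostring(root, encoding='unicode')


if __name__ == '__main__':
    example = ('<LAYER><TEST NAME="Soup1"><TITLE>Title 2</TITLE><ARGS/></TEST>'
               '<TEST NAME="Bread2"/></LAYER>')
    print(sort_xml_text(example, 'NAME'))
```

Because the key starts with the tag, elements inside each TEST (TITLE, ARGS and so on) are also reordered alphabetically by tag, which may or may not be wanted.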
