I am trying to sort XML files on four keys, in this order: node names, attribute names, attribute values, and finally node values.

My XML

<NodeRoot>
    <NodeA class="3">
        <NodeB>
            <NodeC abc="1">103</NodeC>
            <NodeD>103</NodeD>
            <NodeC pqr="2">101</NodeC>
            <NodeC pqr="1">102</NodeC>
            <NodeD>101</NodeD>
        </NodeB>
    </NodeA>
    <NodeA class="1">
        <NodeGroup>
            <NodeC name="z" asc="2">103</NodeC>
            <NodeC name="b">101</NodeC>
            <NodeC name="a">102</NodeC>
        </NodeGroup>
    </NodeA>
</NodeRoot>

My XSL

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="utf-8" method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*">
            <xsl:sort select="local-name()"/>
            <xsl:sort select="."/>
        </xsl:apply-templates>
        <xsl:apply-templates select="node()">
            <xsl:sort select="local-name()"/>
            <xsl:sort select="."/>
        </xsl:apply-templates>
    </xsl:copy>
</xsl:template>
</xsl:stylesheet>

Current Output

<NodeRoot>
   <NodeA class="1">
      <NodeGroup>
         <NodeC name="b">101</NodeC>
         <NodeC name="a">102</NodeC>
         <NodeC asc="2" name="z">103</NodeC>
      </NodeGroup>
   </NodeA>
   <NodeA class="3">
      <NodeB>
         <NodeC pqr="2">101</NodeC>
         <NodeC pqr="1">102</NodeC>
         <NodeC abc="1">103</NodeC>
         <NodeD>101</NodeD>
         <NodeD>103</NodeD>
      </NodeB>
   </NodeA>
</NodeRoot>

Expected Outcome

<NodeRoot>
   <NodeA class="1">
      <NodeGroup>
         <NodeC asc="2" name="z">103</NodeC>
         <NodeC name="a">102</NodeC>
         <NodeC name="b">101</NodeC>
      </NodeGroup>
   </NodeA>
   <NodeA class="3">
      <NodeB>
         <NodeC abc="1">103</NodeC>
         <NodeC pqr="1">102</NodeC>
         <NodeC pqr="2">101</NodeC>
         <NodeD>101</NodeD>
         <NodeD>103</NodeD>
      </NodeB>
   </NodeA>
</NodeRoot>

Test XSLT --> http://xsltransform.net/naZXpY7

  • I can see how your expected outcome is sorted by attribute values as the NodeGroup/NodeC elements are sorted by the a, b, z values of the name attributes. Also, the NodeA elements are sorted by the 1, 3 values of the class attributes. I can also see how it is then sorted by the node values, as the NodeB/NodeC elements (without attributes) are sorted by the 101, 103 values. However, I don't see how the output is sorted by node name or attribute name. Commented Nov 13, 2017 at 13:16
  • What would you expect to happen if an element had more than one attribute? Commented Nov 13, 2017 at 13:21
  • @BenL - Updated the question with better sample xml Commented Nov 13, 2017 at 13:35
  • @TimC - Updated the question with better sample xml Commented Nov 13, 2017 at 13:35

2 Answers

You're currently sorting all attributes of an element by local name and value, and then all children (again by local name and string value).

So far, so good.

One difficulty you face is what exactly you mean by sorting by "attribute names". From your example, it looks as if you want elements sorted by a list of their attribute names in alphabetic order, so that the sort keys for the children of your NodeGroup element are

'NodeC', 'asc name', '2 z', 103
'NodeC', 'name', 'a', 102
'NodeC', 'name', 'b', 101
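
To make the four-part key concrete outside XSLT, here is a minimal Python sketch (standard library only) that derives the same tuples for the NodeGroup children. The function name `four_part_key` is made up for illustration, and the sketch assumes no namespaces and integer node values. Python compares strings by code point, which may differ from the collation an XSLT processor applies in `xsl:sort`.

```python
import xml.etree.ElementTree as ET

# The NodeGroup fragment from the question.
XML = """<NodeGroup>
    <NodeC name="z" asc="2">103</NodeC>
    <NodeC name="b">101</NodeC>
    <NodeC name="a">102</NodeC>
</NodeGroup>"""


def four_part_key(elem):
    """Hypothetical helper: (node name, attribute names, attribute values, node value)."""
    # Sort the attributes by name, then by value, before joining them.
    attrs = sorted(elem.attrib.items())
    names = " ".join(name for name, _ in attrs)
    values = " ".join(value for _, value in attrs)
    # Assumption: every leaf holds an integer, as in the sample data.
    return (elem.tag, names, values, int(elem.text))


group = ET.fromstring(XML)
keys = sorted(four_part_key(child) for child in group)
for key in keys:
    print(key)
```

The element carrying both asc and name sorts first because its attribute-name list, 'asc name', precedes 'name'.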

The next difficulty is that there's no obvious way to obtain the value 'asc name' from an XPath 1.0 expression with the first NodeC of your NodeGroup element as context node. It's possible to generate the string, of course, but it requires a call to a named template. (Or, to be more precise: I don't see how to generate it without such a call.)

XSLT 2.0 solution

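The problem is relatively straightforward in XSLT 2.0. The following fragments show the crucial bits; they assume an enclosing xsl:stylesheet element with version="2.0" that binds the xs prefix to http://www.w3.org/2001/XMLSchema and the local prefix to a namespace of your choice (as in the EXSLT stylesheet further down):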

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*">
      <xsl:sort select="local-name()"/>
      <xsl:sort select="."/>
    </xsl:apply-templates>
    <xsl:apply-templates select="node()">
      <xsl:sort select="local-name()"/>
      <xsl:sort select="string-join(local:key2(.), ' ')"/>
      <xsl:sort select="string-join(local:key3(.), ' ')"/>
      <xsl:sort select="." data-type="number"/>
    </xsl:apply-templates>
  </xsl:copy>
</xsl:template>

<xsl:function name="local:key2"  as="xs:string*">
  <xsl:param name="e" as="node()"/>
  <xsl:for-each select="$e/@*">
    <xsl:sort select="local-name()"/>
    <xsl:sort select="string()"/>
    <xsl:value-of select="local-name()"/>
  </xsl:for-each>
</xsl:function>

<xsl:function name="local:key3"  as="xs:string*">
  <xsl:param name="e" as="node()"/>
  <xsl:for-each select="$e/@*">
    <xsl:sort select="local-name()"/>
    <xsl:sort select="string()"/>
    <xsl:value-of select="string()"/>
  </xsl:for-each>
</xsl:function>

This general approach can also be used in XSLT 1.0 with the EXSLT extension for user-defined functions.

Solution in XSLT 1.0 with EXSLT functions

If your XSLT 1.0 processor supports EXSLT-style user-defined functions, you may be able to do something similar in XSLT 1.0. (My initial attempts failed, but the errors disappeared when I remembered to add the extension-element-prefixes attribute to the stylesheet element.)

<xsl:stylesheet version="1.0"
                extension-element-prefixes="func"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:func="http://exslt.org/functions"
                xmlns:local="http://example.com/nss/dummy">

  <xsl:output encoding="utf-8" 
              method="xml" 
              omit-xml-declaration="yes" 
              indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*">
        <xsl:sort select="local-name()"/>
        <xsl:sort select="."/>
      </xsl:apply-templates>
      <xsl:apply-templates select="node()">
        <xsl:sort select="local-name()"/>
        <xsl:sort select="local:key2(.)"/>
        <xsl:sort select="local:key3(.)"/>
        <xsl:sort select="." data-type="number"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

  <func:function name="local:key2">
    <xsl:param name="e" select="."/>

    <func:result>
      <xsl:for-each select="$e/@*">
        <xsl:sort select="local-name()"/>
        <xsl:sort select="string()"/>
        <xsl:value-of select="concat(local-name(), ' ')"/>
      </xsl:for-each>
    </func:result>
  </func:function>

  <func:function name="local:key3">
    <xsl:param name="e" select="."/>
    <func:result>
      <xsl:for-each select="$e/@*">
        <xsl:sort select="local-name()"/>
        <xsl:sort select="string()"/>
        <xsl:value-of select="concat(string(), ' ')"/>
      </xsl:for-each>
    </func:result>
  </func:function>
</xsl:stylesheet>

When run on your input with xsltproc, this produces the desired output.

You might also be able to do something clever in XSLT 1.0 with the node-set extension.

Two-stage pipeline in unextended XSLT 1.0

But the simplest way I can see to solve this problem in unextended XSLT 1.0 is to pipeline two stylesheets together. The first one adds two attributes to every element, to provide sort keys 2 and 3. (Adjust the named templates to make them do what you want.)

<xsl:stylesheet version="1.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:PJ="http://example.com/PankajJaju">
  <xsl:output encoding="utf-8" 
              method="xml" 
              omit-xml-declaration="yes" 
              indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:if test="self::*">
        <xsl:attribute name="PJ:attribute-names" 
                       namespace="http://example.com/PankajJaju">
          <xsl:call-template name="attribute-name-list"/>
        </xsl:attribute>
        <xsl:attribute name="PJ:attribute-values" 
                       namespace="http://example.com/PankajJaju">
          <xsl:call-template name="attribute-value-list"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template name="attribute-name-list">
    <xsl:for-each select="@*">
      <xsl:sort select="local-name()"/>
      <xsl:sort select="string()"/>
      <xsl:value-of select="concat(local-name(), ' ')"/>
    </xsl:for-each>
  </xsl:template>
  <xsl:template name="attribute-value-list">
    <xsl:for-each select="@*">
      <xsl:sort select="local-name()"/>
      <xsl:sort select="string()"/>
      <xsl:value-of select="concat(string(), ' ')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

I've put the two temporary attributes into a namespace to reduce the likelihood of name collisions.

The second one uses the sort keys to perform the actual sort and suppresses the temporary attributes.

<xsl:stylesheet version="1.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:PJ="http://example.com/PankajJaju">
  <xsl:output encoding="utf-8" 
              method="xml" 
              omit-xml-declaration="yes" 
              indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*">
        <xsl:sort select="local-name()"/>
        <xsl:sort select="."/>
      </xsl:apply-templates>
      <xsl:apply-templates select="node()">
        <xsl:sort select="local-name()"/>
        <xsl:sort select="@PJ:attribute-names"/>
        <xsl:sort select="@PJ:attribute-values"/>
        <xsl:sort select="."/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@PJ:attribute-names | @PJ:attribute-values"/>
</xsl:stylesheet>

These can be pipelined together using whatever technology you prefer. For example, using xsltproc from the bash command line, with pipeline stylesheets 1 and 2 saved as p1.xsl and p2.xsl:

xsltproc p1.xsl input.xml | xsltproc p2.xsl -

This produces the output you say you want.
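
As a cross-check that does not depend on any XSLT processor, the same ordering can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the names `sort_key` and `sort_tree` are invented here, namespaces are ignored, and strings are compared by code point rather than by an XSLT collation. Like the second pipeline stylesheet, it compares node values as text.

```python
import xml.etree.ElementTree as ET

# The sample input from the question.
SOURCE = """<NodeRoot>
    <NodeA class="3">
        <NodeB>
            <NodeC abc="1">103</NodeC>
            <NodeD>103</NodeD>
            <NodeC pqr="2">101</NodeC>
            <NodeC pqr="1">102</NodeC>
            <NodeD>101</NodeD>
        </NodeB>
    </NodeA>
    <NodeA class="1">
        <NodeGroup>
            <NodeC name="z" asc="2">103</NodeC>
            <NodeC name="b">101</NodeC>
            <NodeC name="a">102</NodeC>
        </NodeGroup>
    </NodeA>
</NodeRoot>"""


def sort_key(elem):
    """Hypothetical helper: name, attribute names, attribute values, string value."""
    attrs = sorted(elem.attrib.items())
    names = " ".join(name for name, _ in attrs)
    values = " ".join(value for _, value in attrs)
    return (elem.tag, names, values, "".join(elem.itertext()))


def sort_tree(elem):
    """Sort attributes and child elements in place, then recurse."""
    attrs = sorted(elem.attrib.items())
    elem.attrib.clear()
    elem.attrib.update(attrs)
    # Sort the children before recursing, so keys come from the unsorted input.
    elem[:] = sorted(elem, key=sort_key)
    for child in elem:
        sort_tree(child)


root = ET.fromstring(SOURCE)
sort_tree(root)

# Collect the leaf elements in document order for inspection.
order = [(e.tag, dict(e.attrib), e.text) for e in root.iter() if len(e) == 0]
for item in order:
    print(item)
```

The leaf order printed by this sketch can be compared against the Expected Outcome in the question. The indentation of a re-serialised tree would be uneven, because ElementTree moves each element's trailing whitespace along with it.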

4 Comments

XSLT 2.0 is throwing errors, but XSLT 1.0 worked. Your two-part solution is ingenious. One small issue: I added exclude-result-prefixes="PJ" to suppress the namespace declaration on the root node, but it doesn't work.
The exclude-result-prefixes applies only to literal result elements; it does not affect the namespaces included in the output produced by an xsl:copy instruction.
That's a small issue which doesn't bother me at all. Thanks for the solution.
"You might also be able to do something clever in XSLT 1.0 with the node-set extension." I have created one as per your suggestion, and it returns the expected results. I'm not sure how efficient it is, but it works for the most part.

Based on @C. M. Sperberg-McQueen's suggestion of the node-set extension, and with the help of the example at https://www.xml.com/pub/a/2003/07/16/nodeset.html, I came up with a single stylesheet that merges Sperberg-McQueen's two stylesheets.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" exclude-result-prefixes="exslt" xmlns:PJ="http://example.com/PankajJaju">
    <xsl:output encoding="utf-8" method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*" name="first-pass">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:if test="self::*">
                <xsl:attribute name="PJ:attribute-names" namespace="http://example.com/PankajJaju">
                    <xsl:call-template name="attribute-name-list"/>
                </xsl:attribute>
                <xsl:attribute name="PJ:attribute-values" namespace="http://example.com/PankajJaju">
                    <xsl:call-template name="attribute-value-list"/>
                </xsl:attribute>
            </xsl:if>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template name="attribute-name-list">
        <xsl:for-each select="@*">
            <xsl:sort select="local-name()"/>
            <xsl:sort select="string()"/>
            <xsl:value-of select="concat(local-name(), ' ')"/>
        </xsl:for-each>
    </xsl:template>
    <xsl:template name="attribute-value-list">
        <xsl:for-each select="@*">
            <xsl:sort select="local-name()"/>
            <xsl:sort select="string()"/>
            <xsl:value-of select="concat(string(), ' ')"/>
        </xsl:for-each>
    </xsl:template>

    <xsl:template match="/">
        <xsl:variable name="process-one">
            <xsl:call-template name="first-pass"/>
        </xsl:variable>
        <xsl:apply-templates select="exslt:node-set($process-one)" mode="second-pass"/>
    </xsl:template>

    <xsl:template match="@*|node()" mode="second-pass">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*" mode="second-pass"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="*" mode="second-pass">
        <xsl:copy>
            <xsl:apply-templates select="@*" mode="second-pass">
                <xsl:sort select="local-name()"/>
                <xsl:sort select="."/>
            </xsl:apply-templates>
            <xsl:apply-templates select="node()" mode="second-pass">
                <xsl:sort select="local-name()"/>
                <xsl:sort select="@PJ:attribute-names"/>
                <xsl:sort select="@PJ:attribute-values"/>
                <xsl:sort select="."/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>        
    <xsl:template match="@PJ:attribute-names | @PJ:attribute-values" mode="second-pass"/>
</xsl:stylesheet>
