I have two XML files.

File 1 -

    <?xml version="1.0" encoding="UTF-8"?>
    <Root>
        <School>
            <section id="12" name="Apple"/>
            <section id="50" name="Newton"/>
        </School>
        <Students>
            <roll no="111" name="Smith"/>
            <roll no="122" name="Alan"/>
            <roll no="20" name="Bruce"/>
        </Students>
        <Teachers>
            <Math>
                <emp id="55" name="Karen"/>
                <emp id="2" name="David"/>
            </Math>
            <Science>
                <emp id="1" name="Thomas"/>
            </Science>
        </Teachers>
        <Sports>
            <Indoor>
                <Boardgame>
                    <game id="12" name="Chess"/>
                </Boardgame>
                <Arcade>
                    <game id="3" name="Car Racing"/>
                </Arcade>
            </Indoor>
            <Outdoor>
                <Field>
                    <game id="1" name="Football"/>
                    <game id="100" name="Cricket"/>
                </Field>
                <Court>
                    <game id="2" name="Tennis"/>
                </Court>
            </Outdoor>
        </Sports>
    </Root>

File 2 -

<?xml version="1.0" encoding="UTF-8"?>
<Updates>
    <School>
        <section id="12" name="Orange"/>
    </School>
    <Students>
        <roll no="122" name="Sam"/>
    </Students>
    <Teachers>
        <Math>
            <emp id="300" name="Steve" />
        </Math>
    </Teachers>
    <Sports>
        <Indoor>
            <Boardgame>
                <game id="37" name="Monopoly"/>
            </Boardgame>
            <Boardgame2>
                <game id="36" name="Ludo"/>
            </Boardgame2>
        </Indoor>
        <Outdoor>
            <Field>
                <game id="1" name="Football"/>
                <game id="100" name="Bull Fighting"/>
            </Field>
            <Court>
                <game id="19" name="Badminton"/>
            </Court>
        </Outdoor>
        <Computer>
            <game id="10" name="AOE" />
        </Computer>
    </Sports>
</Updates>

I need to merge the files to get the following output. An entry in file 2 overwrites the entry in file 1 when their id/no values match. New elements from file 2 are added to the output at the proper place in the hierarchy.

Output of Transformation -

<?xml version="1.0" encoding="UTF-8"?>
<Root>
    <School>
        <section id="12" name="Orange"/>
        <section id="50" name="Newton"/>
    </School>
    <Students>
        <roll no="111" name="Smith"/>
        <roll no="122" name="Sam"/>
        <roll no="20" name="Bruce"/>
    </Students>
    <Teachers>
        <Math>
            <emp id="55" name="Karen"/>
            <emp id="2" name="David"/>
            <emp id="300" name="Steve" />
        </Math>
        <Science>
            <emp id="1" name="Thomas"/>
        </Science>
    </Teachers>
    <Sports>
        <Indoor>
            <Boardgame>
                <game id="12" name="Chess"/>
                <game id="37" name="Monopoly"/>
            </Boardgame>
            <Arcade>
                <game id="3" name="Car Racing"/>
            </Arcade>
            <Boardgame2>
                <game id="36" name="Ludo"/>
            </Boardgame2>
        </Indoor>
        <Outdoor>
            <Field>
                <game id="1" name="Football"/>
                <game id="100" name="Bull Fighting"/>
            </Field>
            <Court>
                <game id="2" name="Tennis"/>
                <game id="19" name="Badminton"/>
            </Court>
        </Outdoor>
        <Computer>
            <game id="10" name="AOE" />
        </Computer>     
    </Sports>
</Root>

Below is my XSLT so far. It handles updates but not inserts.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" indent="no"/>

    <xsl:template match="/">
        <xsl:apply-templates select="node()">
            <xsl:with-param name="doc-context" select="document('file2.xml')/node()" />
        </xsl:apply-templates>
    </xsl:template>

    <xsl:template match="node()">
        <xsl:param name="doc-context" />

        <xsl:variable name="id" select="@id" />
        <xsl:variable name="no" select="@no" />

        <xsl:copy>
            <xsl:copy-of select="@*|$doc-context[@id = $id or @no = $no]/@*" />
            <xsl:apply-templates select="node()">
                <xsl:with-param name="doc-context" select="$doc-context/node()" />
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
  • Have you tried anything so far? We can help if you make an attempt at your problem and post the code of that attempt. Thank you. Commented Dec 30, 2015 at 2:40
  • I have included the XSLT that I have worked on so far. It handles updates based on the id or no values, but it does not handle inserts or deletes. Commented Dec 30, 2015 at 5:53
  • Are you able to use XSLT 2.0? 3.0? Commented Dec 30, 2015 at 6:37
  • Also, is ordering important? Commented Dec 30, 2015 at 6:41
  • How does the change file signify deletes? Commented Dec 30, 2015 at 6:42

1 Answer

I will write a full stylesheet later, but for now, here is the method I would use to solve this problem:

  1. Use XSLT 2.0 or 3.0
  2. Start with a basic identity transform
  3. Using empty templates, remove the elements whose @id matches any @id value in the updates file (we will get to how to test this later).
  4. Write a template for the parents of id-able elements, that is to say School, Math, etc. How you do this depends on whether the list of element names is fixed or dynamic.
  5. In the aforementioned templates, start with the normal processing (xsl:copy and xsl:apply-templates on the children), but also add (under the xsl:copy), elements from the updates file which match the path of the focus node.

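You can use xsl:key and the key() function for the tests in steps 3 and 5. But beware of a common trap for newcomers: the two-argument key() function searches only the document that contains the context node. A lookup against the updates file therefore needs either a change of context or, in XSLT 2.0, the three-argument form of key().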
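To make the key() trap concrete, here is a minimal sketch of steps 2 and 3 only. It assumes XSLT 2.0, a hard-coded updates file name (file2.xml), and a hypothetical key named by-id that combines the element name with @id. It ignores @no and the element's position in the hierarchy, so treat it as an illustration of the lookup rather than a full merge.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the updates document is file2.xml, resolved relative to this stylesheet. -->
  <xsl:variable name="updates" select="doc('file2.xml')" />

  <!-- Hypothetical key: index every element carrying an @id by its name plus its id. -->
  <xsl:key name="by-id" match="*[@id]" use="concat(local-name(), '!', @id)" />

  <!-- Step 2: identity transform. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Step 3: drop elements that have a counterpart in the updates document.
       The third argument makes key() search $updates. With only two arguments,
       key() would search the document containing the context node (file 1),
       so every element would match itself and be dropped. -->
  <xsl:template match="*[@id][key('by-id', concat(local-name(), '!', @id), $updates)]" />

</xsl:stylesheet>
```

Step 5 would then re-insert the replacement elements from $updates under the matching parent.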


Update

How about ...

<xsl:transform
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:so="http://stackoverflow.com/questions/34522017"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="2.0"
    exclude-result-prefixes="so xs">

<xsl:output omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
<xsl:param name="updates-file" as="xs:string" />
<xsl:strip-space elements="*" />

<xsl:variable name="updates" select="doc($updates-file)" />

<xsl:function name="so:merge-key" as="xs:string">
  <xsl:param name="ele" as="element()" />
  <!-- Updates and Root are treated as the same element for merging purposes. -->
  <xsl:variable name="ele-name" select="local-name($ele[not(self::Updates)][not(self::Root)])" />
  <xsl:value-of select="concat( $ele-name, '!', $ele/@id, $ele/@no)" /> 
</xsl:function>

<xsl:template match="@*|comment()|processing-instruction()|text()">
  <xsl:copy />
</xsl:template>

<xsl:template match="*">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" />
  </xsl:copy>
</xsl:template>

<xsl:template match="/Root">
  <xsl:apply-templates select="." mode="update">
      <xsl:with-param name="peer-updates" select="$updates/Updates" />
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="*[not(@id|@no)]" mode="update">
  <xsl:param name="peer-updates" as="element()*" />
  <xsl:variable name="this-key" select="so:merge-key(.)" />
  <xsl:variable name="compare-set" select="*" as="element()*" />
  <xsl:variable name="merge-other" select="$peer-updates[so:merge-key(.) eq $this-key]/*" as="element()*" />
  <xsl:copy>
    <!-- Process the fluff. -->
    <xsl:apply-templates select="@*|comment()|processing-instruction()|text()" />

    <!-- Now the unchanged original elements. -->
    <xsl:apply-templates select="*[not( so:merge-key(.) = $merge-other/so:merge-key(.))]" />

    <!-- Now the updated elements. -->
    <xsl:apply-templates select="*[so:merge-key(.) = $merge-other/so:merge-key(.)]" mode="update">
      <xsl:with-param name="peer-updates" select="$merge-other[so:merge-key(.) = $compare-set/so:merge-key(.)]" />
    </xsl:apply-templates>

    <!-- Now new elements. -->
    <xsl:apply-templates select="$merge-other[ not( so:merge-key(.) = $compare-set/so:merge-key(.))]" />
  </xsl:copy>
</xsl:template>

<xsl:template match="*[@id|@no]" mode="update">
  <xsl:param name="peer-updates" as="element()*" />
  <xsl:variable name="this-key" select="so:merge-key(.)" />
  <xsl:variable name="merge-other" select="$peer-updates[so:merge-key(.) eq $this-key]" as="element()?" />
  <xsl:copy-of select="if ($merge-other) then $merge-other else ." />
</xsl:template>

</xsl:transform>

Notes

  1. The stylesheet parameter updates-file specifies the URI of the updates document. Supply its value when you invoke the transformation. A relative URI is resolved against the base URI of the stylesheet.
  2. I went in a different direction from the method outlined above. At first I thought @id would be a unique document-wide key, but your sample documents show that this is not the case. So instead I used a merge paradigm.

Update 2

The OP has asked for a change in the ordering rules. Here is a quick and dirty change to enforce the specified ordering rules. Replace the two sequence constructors headed by the comments "Now the unchanged original elements." and "Now the updated elements." with this one:

<!-- For the original elements, both unchanged and to be updated. -->
<xsl:for-each select="*">
  <xsl:choose>
    <xsl:when test="so:merge-key(.) = $merge-other/so:merge-key(.)">
      <xsl:apply-templates select="." mode="update">
        <xsl:with-param name="peer-updates" select="$merge-other[so:merge-key(.) = $compare-set/so:merge-key(.)]" />
      </xsl:apply-templates>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-templates select="." />
    </xsl:otherwise>
  </xsl:choose>
</xsl:for-each>

In general, xsl:for-each is ugly, and I consider it an XSLT anti-pattern. If this were my production code and I had more time to think about it, I would use a template matching mechanism instead. But for what it is worth, here is a quick and dirty solution anyway.
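One possible template-driven variant, offered as an untested sketch rather than the answer's definitive method: send every original child through mode="update" in document order and let the two existing update templates decide what to emit. It replaces the *[not(@id|@no)] template in the stylesheet above and relies on so:merge-key and the other templates defined there. A child with no peer in the updates file ends up copied unchanged, because its own merge-other set is empty.

```xml
<xsl:template match="*[not(@id|@no)]" mode="update">
  <xsl:param name="peer-updates" as="element()*" />
  <xsl:variable name="this-key" select="so:merge-key(.)" />
  <xsl:variable name="compare-set" select="*" as="element()*" />
  <xsl:variable name="merge-other" select="$peer-updates[so:merge-key(.) eq $this-key]/*" as="element()*" />
  <xsl:copy>
    <!-- Process the fluff. -->
    <xsl:apply-templates select="@*|comment()|processing-instruction()|text()" />

    <!-- Original children, in file 1 order. Each child picks its own peer
         out of $merge-other, so no xsl:for-each or xsl:choose is needed. -->
    <xsl:apply-templates select="*" mode="update">
      <xsl:with-param name="peer-updates" select="$merge-other" />
    </xsl:apply-templates>

    <!-- New elements from the updates file, appended as the last siblings. -->
    <xsl:apply-templates select="$merge-other[not(so:merge-key(.) = $compare-set/so:merge-key(.))]" />
  </xsl:copy>
</xsl:template>
```

One difference to be aware of: inside an unmatched subtree, comments and processing instructions are now emitted before the child elements instead of staying interleaved with them. The sample files contain neither, so the output for them should be unaffected.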

3 Comments

Thanks Sean, this works! But as you indicated earlier in your questions, the ordering does get impacted. In a large XML file the ordering gets messy. Is there a way to keep the same ordering?
What are your ordering rules?
The ordering should remain intact as in file1. Newly added nodes can be added as the last sibling.
