How can I create an XSLT (preferably 1.0) transformation that describes the differences between two files, matching items by an id element? Both input files follow the same format and contain items with several child elements, one of which is an id. The values of elements in items with the same id should be compared. The input format does not use attributes. The result of the transformation should describe the type of each difference with attributes, as in the example below:

Old File:

<document>
    <item>
        <id>1</id>
        <element1>value1</element1>
        <element2>value2</element2>
    </item>
    <item>
        <id>2</id>
        <element1>value3</element1>
        <element2>value4</element2>
    </item>
    <item>
        <id>3</id>
        <element1>value5</element1>
        <element2>value6</element2>
    </item>
</document>

New File:

<document>
    <item>
        <id>1</id>
        <element1>value1</element1>
        <element2>other_value</element2>
    </item>
    <item>
        <id>2</id>
        <element1>value3</element1>
        <element2>value4</element2>
    </item>
    <item>
        <id>4</id>
        <element1>value7</element1>
        <element2>value8</element2>
    </item>
</document>

Result File:

<document>
    <item>
        <id>1</id>
        <element1>value1</element1>
        <element2 diff="changed" old="value2">other_value</element2>
    </item>
    <item>
        <id>2</id>
        <element1>value3</element1>
        <element2>value4</element2>
    </item>
    <item diff="removed">
        <id>3</id>
        <element1>value5</element1>
        <element2>value6</element2>
    </item>
    <item diff="added">
        <id>4</id>
        <element1>value7</element1>
        <element2>value8</element2>
    </item>
</document>

The solution should not be limited to a specific set of child elements.

  • Please also specify whether one item element can have multiple element1 children. Commented May 26, 2022 at 15:42
  • I am not convinced XSLT is the best tool for this. Commented May 26, 2022 at 16:57
  • There may be multiple elements within an item (element1, element2, ...), but the same set of them appears in both input files and they have unique names, i.e. one item cannot have two element1 children. Commented May 26, 2022 at 17:31

2 Answers

1

This is very awkward to do in XSLT, especially in version 1.0.

The following stylesheet will work for your example. It is assumed that if a corresponding item exists in the new file, then both items have exactly the same child elements (though not necessarily with the same values), with unique names.

As I mentioned in the comments, using a dedicated diff tool would probably be a better choice.

XSLT 1.0

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- The old file is the main input; the new file is loaded from new.xml -->
<xsl:param name="new-doc" select="document('new.xml')/document"/>

<xsl:template match="/document">
    <xsl:copy>
        <!-- Items of the old file: unchanged, changed or removed -->
        <xsl:apply-templates select="item"/>
        <!-- Items whose id exists only in the new file -->
        <xsl:apply-templates select="$new-doc/item[not(id=current()/item/id)]" mode="add"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="item">
    <!-- The item with the same id in the new file, if any -->
    <xsl:variable name="new-item" select="$new-doc/item[id=current()/id]" />
    <xsl:choose>
        <xsl:when test="not($new-item)">
            <item diff="removed">
                <xsl:copy-of select="*"/>
            </item>
        </xsl:when>
        <xsl:otherwise>
            <xsl:copy>
                <xsl:apply-templates/>
            </xsl:copy>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

<xsl:template match="item" mode="add">
    <item diff="added">
        <xsl:copy-of select="*"/>
    </item>
</xsl:template>

<xsl:template match="item/*">
    <!-- The child with the same name in the new item that has the same id -->
    <xsl:variable name="new-elem" select="$new-doc/item/*[../id=current()/../id and name()=name(current())]" />
    <xsl:choose>
        <xsl:when test=". = $new-elem">
            <xsl:copy-of select="."/>
        </xsl:when>
        <xsl:otherwise>
            <!-- Values differ: output the new value and keep the old one in an attribute -->
            <xsl:copy>
                <xsl:attribute name="diff">changed</xsl:attribute>
                <xsl:attribute name="old">
                    <xsl:value-of select="." />
                </xsl:attribute>
                <xsl:value-of select="$new-elem" />
            </xsl:copy>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

2 Comments

Works like a charm, thank you! I assume that what you suggest is an XML-aware diff tool that compares the data while ignoring element order and formatting. My goal is not just to have the files compared, though. I need the result of the comparison in an XML format that other tools downstream can process to create an Excel table with those changes highlighted.
I have not explored the field, but searching for "xml diff" should get you some options.
1

I'll start with XSLT 2.0 and leave you to look at how it might be adapted to 1.0.

First, start with grouping:

<xsl:for-each-group select="$doc1/item, $doc2/item" group-by="id">
  ...
</xsl:for-each-group>

Within the body:

  • if count(current-group()) = 1, the ID exists in only one file; you can work out which by testing (root(current-group()) is $doc1)

  • otherwise (the ID is present in both files), it rather depends on the set of possible differences you want to cater for. You've provided an example, but an example is not the same as a specification. If we assume that all the children of item are elements in the form of your example (<E>value</E>) and that each such element appears at most once, then you could do a further grouping of current-group()/*[not(self::id)] grouped by node-name(.), and:

      • if the current-group() has two elements, compare their values using "=" or deep-equal()

      • if it has only one element, report that as a difference.
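
The two levels of grouping described above could be put together roughly as follows. This is only a sketch under several assumptions that are not part of the answer: the old file is the main input, the new file's location is passed in a parameter here called new-uri (defaulting to new.xml), $doc1 and $doc2 are bound to the two document elements, and a child element found on one side only is flagged with diff="added" or diff="removed" in the same way as a whole item. It needs an XSLT 2.0 processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumption: old file is the main input, new file is named by this parameter -->
  <xsl:param name="new-uri" select="'new.xml'"/>
  <xsl:variable name="doc1" select="/document"/>
  <xsl:variable name="doc2" select="document($new-uri)/document"/>

  <xsl:template match="/">
    <document>
      <!-- One group per distinct id across both files -->
      <xsl:for-each-group select="$doc1/item, $doc2/item" group-by="id">
        <xsl:variable name="old" select="current-group()[.. is $doc1]"/>
        <xsl:variable name="new" select="current-group()[.. is $doc2]"/>
        <xsl:choose>
          <xsl:when test="empty($new)">
            <item diff="removed"><xsl:copy-of select="$old/*"/></item>
          </xsl:when>
          <xsl:when test="empty($old)">
            <item diff="added"><xsl:copy-of select="$new/*"/></item>
          </xsl:when>
          <xsl:otherwise>
            <item>
              <!-- Second level: pair up the children of both items by name -->
              <xsl:for-each-group select="$old/*, $new/*" group-by="node-name(.)">
                <xsl:variable name="o" select="current-group()[../.. is $doc1]"/>
                <xsl:variable name="n" select="current-group()[../.. is $doc2]"/>
                <xsl:choose>
                  <xsl:when test="deep-equal($o, $n)">
                    <xsl:copy-of select="$n"/>
                  </xsl:when>
                  <xsl:when test="exists($o) and exists($n)">
                    <xsl:copy>
                      <xsl:attribute name="diff">changed</xsl:attribute>
                      <xsl:attribute name="old" select="$o"/>
                      <xsl:value-of select="$n"/>
                    </xsl:copy>
                  </xsl:when>
                  <xsl:otherwise>
                    <!-- Assumption: child present on one side only -->
                    <xsl:copy>
                      <xsl:attribute name="diff"
                          select="if ($n) then 'added' else 'removed'"/>
                      <xsl:value-of select="."/>
                    </xsl:copy>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each-group>
            </item>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </document>
  </xsl:template>
</xsl:stylesheet>
```

Which file an item comes from is decided here by comparing its parent with $doc1 or $doc2, not by position within the group, so the outcome does not depend on how the processor orders nodes from different documents. Groups are emitted in order of first appearance, so ids found only in the new file come after all ids from the old file, which matches the layout of the expected result in the question.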
