
This is a follow-up to this Stack Overflow question: Remove Duplicate Record from XML file using XLST

When I use the online XSLT Test Tool (http://xslttest.appspot.com), the solution provided for that question works as expected. However, when I run the XSLT from a shell script, I receive the following error:

XPath error : Invalid expression
.[generate-id()=generate-id(key('DistinctEAN', @vchEAN)[1])]
 ^
compilation error: file titles_isbn.xsl line 15 element copy-of
xsl:copy-of : could not compile select expression '.[generate-id()=generate-id(key('DistinctEAN', @vchEAN)[1])]'

I do not understand why the XSLT works in the online XSLT Test Tool but not when it is run from a shell script. Here is my shell script:

#!/bin/sh
echo "Renaissance Duplicate Filter Removal Script Start...."

cd /var/process/renaissance/scripts

xsltproc titles_isbn.xsl /var/process/renaissance/extractedfiles/titles_isbn_test.xml -o /var/process/renaissance/rrin/titles_isbn_nodup.xml

echo "Renaissance Duplicate Filter Removal Script complete"

Here is the titles_isbn.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="DistinctEAN" match="z:row" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="RowsetSchema" use="@vchEAN" />

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

   <xsl:template match="z:row" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="RowsetSchema" >
   <xsl:copy-of select=".[generate-id()=generate-id(key('DistinctEAN', @vchEAN)[1])]"/>
   </xsl:template>
</xsl:stylesheet>

Any help would be much appreciated.

  • Does the XSLT version supported by the two tools differ, and does that stylesheet depend on a newer version than your xsltproc supports? Commented Oct 27, 2015 at 20:27
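One way to check the comment's hypothesis locally: xsltproc --version lists the libxml2 and libxslt library versions, and the standard XSLT function system-property() reports which XSLT version the processor implements. A minimal sketch, assuming xsltproc is on the PATH; the probe files are throwaway names created in a temporary directory.

```shell
#!/bin/sh
# Ask the local xsltproc which XSLT version and vendor it implements.
# Assumes xsltproc (libxslt) is on the PATH; probe.xsl/probe.xml are throwaway files.
set -e
tmp=$(mktemp -d)

cat > "$tmp/probe.xsl" <<'EOF'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- system-property() is a standard XSLT 1.0 function -->
    <xsl:value-of select="system-property('xsl:version')"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="system-property('xsl:vendor')"/>
  </xsl:template>
</xsl:stylesheet>
EOF
echo '<probe/>' > "$tmp/probe.xml"

# Run the probe stylesheet against a trivial input document
result=$(xsltproc "$tmp/probe.xsl" "$tmp/probe.xml")
printf '%s\n' "$result"
rm -rf "$tmp"
```

A libxslt-based xsltproc is expected to report version 1.0, in which case any stylesheet that relies on XPath 2.0 syntax will fail to compile.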

2 Answers


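XPath 1.0 does not allow a predicate directly after the abbreviated step . (the dot); XPath 2.0 does, which would explain the difference if the online tool runs an XSLT 2.0 processor. Use current() instead of the dot, because current()[...] is a valid filter expression in XPath 1.0.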
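The diagnosis can be checked with a small sketch that builds the question's stylesheet twice, once with . and once with current() in front of the predicate, and runs both through xsltproc. The sample rows are made up for illustration, and xsltproc is assumed to be installed.

```shell
#!/bin/sh
# Reproduce the compile error and check the current() fix with xsltproc.
# Assumes xsltproc is installed; the sample rows are hypothetical.
tmp=$(mktemp -d)

cat > "$tmp/titles.xml" <<'EOF'
<rows xmlns:z="RowsetSchema">
  <z:row vchEAN="111" title="A"/>
  <z:row vchEAN="222" title="B"/>
  <z:row vchEAN="111" title="A again"/>
</rows>
EOF

# write_xsl EXPR: print the question's stylesheet with EXPR in front of the predicate
write_xsl() {
  cat <<EOF
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:z="RowsetSchema">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="DistinctEAN" match="z:row" use="@vchEAN"/>
  <xsl:template match="node()|@*">
    <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
  </xsl:template>
  <xsl:template match="z:row">
    <xsl:copy-of select="$1[generate-id()=generate-id(key('DistinctEAN', @vchEAN)[1])]"/>
  </xsl:template>
</xsl:stylesheet>
EOF
}

write_xsl '.' > "$tmp/dot.xsl"
write_xsl 'current()' > "$tmp/current.xsl"

# .[...] is not valid XPath 1.0, so xsltproc should refuse to compile it
if xsltproc "$tmp/dot.xsl" "$tmp/titles.xml" >/dev/null 2>&1; then
  dot_status=ok
else
  dot_status=failed
fi

# current()[...] is a filter expression, so this version compiles and runs
result=$(xsltproc "$tmp/current.xsl" "$tmp/titles.xml")
printf '%s\n' "$dot_status" "$result"
rm -rf "$tmp"
```

The first run should fail with the same kind of "Invalid expression" compilation error as in the question, while the second keeps only the first row for each vchEAN.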

As an alternative, simply put the negated condition in the match pattern of an empty template:

<xsl:template match="z:row[not(generate-id() = generate-id(key('DistinctEAN', @vchEAN)[1]))]" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="RowsetSchema"/>
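Combined with the identity template from the question, the empty-template alternative gives a complete stylesheet with no other z:row template. A sketch with hypothetical sample data; it declares the z prefix once on xsl:stylesheet instead of on each element, which is equivalent.

```shell
#!/bin/sh
# Identity transform plus an empty template that swallows duplicate rows.
# Assumes xsltproc is installed; the sample data is hypothetical.
set -e
tmp=$(mktemp -d)

cat > "$tmp/titles.xml" <<'EOF'
<rows xmlns:z="RowsetSchema">
  <z:row vchEAN="111" title="A"/>
  <z:row vchEAN="222" title="B"/>
  <z:row vchEAN="111" title="A again"/>
</rows>
EOF

cat > "$tmp/nodup.xsl" <<'EOF'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:z="RowsetSchema">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="DistinctEAN" match="z:row" use="@vchEAN"/>

  <!-- Identity transform: copy everything by default -->
  <xsl:template match="node()|@*">
    <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
  </xsl:template>

  <!-- Empty template: a z:row that is not the first with its EAN produces no output -->
  <xsl:template match="z:row[not(generate-id() = generate-id(key('DistinctEAN', @vchEAN)[1]))]"/>
</xsl:stylesheet>
EOF

result=$(xsltproc "$tmp/nodup.xsl" "$tmp/titles.xml")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The z:row[...] pattern has a default priority of 0.5, higher than the -0.5 of the identity template's node() pattern, so duplicates are handled by the empty template and dropped, while first occurrences fall through to the identity template and are copied unchanged.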


2 Comments

Thanks for the recommendation; your solution works great on my smaller files. Unfortunately, my large file (1.3 million records) fails with an out-of-memory error, so I am looking at alternatives for processing that file.
XSLT (1.0 and 2.0 at least) works on a complete in-memory tree representation of the XML input document, and I think implementers of XSLT processors say that such a tree needs four or five times the size of the XML input. I don't know how much memory your machine has or whether xsltproc lets you allocate more than it usually does; raising the memory limit is what helps with many Java-based XSLT processors. I think it is better to ask a new question about the memory problem or, if you can, try an approach like streaming: saxonica.com/html/documentation/sourcedocs/streaming (requires a license).
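If the file's layout allows it, one way to avoid building an in-memory tree at all is to leave XSLT and filter the file line by line. This is a swapped-in technique, not part of either answer, and it only works under strong assumptions: every z:row element sits on a single line and its EAN is written as vchEAN="..." with double quotes. A sketch with awk, whose memory use grows with the number of distinct EANs rather than with the size of the document:

```shell
#!/bin/sh
# Line-oriented de-duplication that never builds an XML tree.
# ASSUMPTIONS: each <z:row .../> is on one line and carries vchEAN="..." in
# double quotes; every other line is passed through untouched.
tmp=$(mktemp -d)

cat > "$tmp/titles.xml" <<'EOF'
<xml xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="RowsetSchema">
<rs:data>
<z:row vchEAN="111" title="A"/>
<z:row vchEAN="222" title="B"/>
<z:row vchEAN="111" title="A again"/>
</rs:data>
</xml>
EOF

awk '
  # Only z:row lines that carry a vchEAN attribute are candidates for removal
  /<z:row[ \t]/ && match($0, /vchEAN="[^"]*"/) {
    ean = substr($0, RSTART, RLENGTH)
    if (seen[ean]++) next   # a row with this EAN was already printed: skip
  }
  { print }                 # all other lines are copied unchanged
' "$tmp/titles.xml" > "$tmp/titles_nodup.xml"

result=$(cat "$tmp/titles_nodup.xml")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Because this filter is not an XML parser, it matches the literal prefix z: and knows nothing about namespaces, comments or CDATA sections, so check the real file's formatting before relying on it.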

Consider standard Muenchian grouping to remove the duplicate records; it relies only on key() and generate-id(), so it works with any conformant XSLT 1.0 processor.

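Since I do not know your XML document structure, replace the bracketed placeholders with your grouping (parent) node and matching (record) node; if those names are prefixed, such as z:row, declare the prefix on xsl:stylesheet: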

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

  <!-- Index every record by its EAN; [MATCHING NODE] is the record element -->
  <xsl:key name="DistinctEAN" match="[MATCHING NODE]" use="@vchEAN" />

  <!-- IdentityTransform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- [GROUPING NODE] is the parent element of the records -->
  <xsl:template match="[GROUPING NODE]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Keep only the first record for each distinct @vchEAN -->
      <xsl:for-each select="[MATCHING NODE][generate-id()
                           = generate-id(key('DistinctEAN', @vchEAN)[1])]">
        <!-- copy-of "." keeps the record's attributes as well as its children -->
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
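For the rowset in the question, the placeholders could be filled in roughly as follows. The parent element name rs:data is an assumption (it is typical of ADO-persisted recordsets, which use the urn:schemas-microsoft-com:rowset namespace, but it does not appear in the question), so adjust the match patterns to the real structure. Copying each record with copy-of select="." keeps its attributes, which is where vchEAN lives.

```shell
#!/bin/sh
# Muenchian grouping with the placeholders filled in.
# HYPOTHETICAL structure: z:row records inside an rs:data parent; adjust to your file.
# Assumes xsltproc is installed.
set -e
tmp=$(mktemp -d)

cat > "$tmp/titles.xml" <<'EOF'
<xml xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="RowsetSchema">
  <rs:data>
    <z:row vchEAN="111" title="A"/>
    <z:row vchEAN="222" title="B"/>
    <z:row vchEAN="111" title="A again"/>
  </rs:data>
</xml>
EOF

cat > "$tmp/muenchian.xsl" <<'EOF'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="RowsetSchema">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="DistinctEAN" match="z:row" use="@vchEAN"/>

  <!-- Identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- Rebuild the (hypothetical) rs:data parent with one z:row per distinct EAN -->
  <xsl:template match="rs:data">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="z:row[generate-id() = generate-id(key('DistinctEAN', @vchEAN)[1])]">
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
EOF

result=$(xsltproc "$tmp/muenchian.xsl" "$tmp/titles.xml")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Like any XSLT 1.0 approach, this still builds the whole input tree (plus the key index) in memory, so on its own it does not remove the out-of-memory problem with very large files.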

1 Comment

My final file is very large (1.3 million records). When I run my current XSLT process (as shown above, with Martin's fix), Linux kills the process because it runs out of memory. My understanding is that XSLT builds a DOM-like tree of the whole input, so the memory footprint will be a problem even with Muenchian grouping. Would you agree that even Muenchian grouping would not solve the memory issue for an input XML file of this size (1.3 million records)?
