XSLT not compiling in Shell script

Question

This is a follow-up to this Stack Overflow question: Remove Duplicate Record from XML file using XLST

When using the online XSLT Test Tool (http://xslttest.appspot.com), the solution provided for that question works as expected. However, when I run the same XSLT from a shell script, I receive the following error:

XPath error : Invalid expression
.[generate-id()=generate-id(key('DistinctEAN', @vchEAN)[1])]
 ^
compilation error: file titles_isbn.xsl line 15 element copy-of
xsl:copy-of : could not compile select expression '.[generate-id()=generate-id(key('DistinctEAN', @vchEAN)[1])]'

I do not understand why the XSLT works in the online XSLT Test Tool but fails when it is run from a shell script. Here is my shell script:

#!/bin/sh
echo "Renaissance Duplicate Filter Removal Script Start...."

cd /var/process/renaissance/scripts

xsltproc titles_isbn.xsl /var/process/renaissance/extractedfiles/titles_isbn_test.xml -o /var/process/renaissance/rrin/titles_isbn_nodup.xml

echo "Renaissance Duplicate Filter Removal Script complete"

Here is the titles_isbn.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="DistinctEAN" match="z:row" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="RowsetSchema" use="@vchEAN" />

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

   <xsl:template match="z:row" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="RowsetSchema" >
   <xsl:copy-of select=".[generate-id()=generate-id(key('DistinctEAN', @vchEAN)[1])]"/>
   </xsl:template>
</xsl:stylesheet>

Any help would be much appreciated.

Do the tools support different versions of XSLT? Does that stylesheet depend on a newer version than your xsltproc supports?
– Etan Reisner, commented Oct 27, 2015 at 20:27

Martin Honnen · Accepted Answer · 2015-10-27 21:21:44Z

1

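I think XPath 1.0 has a quirk: it does not allow a predicate directly after the dot, because the grammar treats . as an abbreviated step that cannot take predicates. So use current() instead of the dot (self::node() also works).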
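A minimal sketch of that fix applied to the stylesheet from the question. The only functional change is current() in place of the dot. Declaring the z prefix once on the root element, and dropping the unused rs prefix, are stylistic choices of this sketch and not part of the fix.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="RowsetSchema">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index every z:row by its vchEAN attribute. -->
  <xsl:key name="DistinctEAN" match="z:row" use="@vchEAN"/>

  <!-- Identity transform: copy everything that no other template handles. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy a row only if it is the first row in its vchEAN group.
       current() is a function call, so XPath 1.0 allows a predicate on it;
       self::node()[...] would work as well. -->
  <xsl:template match="z:row">
    <xsl:copy-of select="current()[generate-id() = generate-id(key('DistinctEAN', @vchEAN)[1])]"/>
  </xsl:template>
</xsl:stylesheet>
```

Note that a z:row without a vchEAN attribute is never added to the key, so the comparison fails for it and the row is dropped from the output.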

As an alternative, simply put the negated condition in the match pattern of an empty template:

<xsl:template match="z:row[not(generate-id() = generate-id(key('DistinctEAN', @vchEAN)[1]))]" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="RowsetSchema"/>

edited Oct 27, 2015 at 21:21

answered Oct 27, 2015 at 20:31


2 Comments

R Ryff Over a year ago

Thanks for the recommendation, your solution works great on my smaller files. Unfortunately, my large file (1.3 million records) fails with an out-of-memory error, so I am looking at alternatives for processing that file.

Martin Honnen Over a year ago

XSLT (1.0 and 2.0 at least) works on a complete in-memory tree representation of the XML input document. I think implementers of XSLT processors say that such a tree needs four or five times the size of the XML input. I don't know how much memory your machine has or whether xsltproc lets you allocate more than it usually does, which is what helps with many Java-based XSLT processors. I think it is better to ask a new question about the memory problem or, if you can, to try approaches like saxonica.com/html/documentation/sourcedocs/streaming (requires a license).

Parfait · Accepted Answer · 2015-10-27 20:59:00Z

1

Consider standard Muenchian grouping to remove duplicate records; it works on most XSLT 1.0 processors.

Since I do not know your XML document structure, replace the placeholders with the names of the grouping and matching nodes:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

<xsl:key name="DistinctEAN" match="[ENTER MATCHING NODE]" use="@vchEAN" />

  <!-- IdentityTransform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

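  <xsl:template match="[ENTER GROUPING NODE]">
    <[ENTER GROUPING NODE]>
    <!-- Visit only the first matching node of each vchEAN group -->
    <xsl:for-each select="[ENTER MATCHING NODE][generate-id()
                         = generate-id(key('DistinctEAN', @vchEAN)[1])]">
      <[ENTER MATCHING NODE]>
        <!-- @*|node() copies attributes as well as children; * alone would drop attributes -->
        <xsl:copy-of select="@*|node()"/>
      </[ENTER MATCHING NODE]>
    </xsl:for-each>
    </[ENTER GROUPING NODE]>
  </xsl:template>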
  </xsl:template>

</xsl:stylesheet>
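As an illustration only, here is how the placeholders might be filled in for the data in the question. It assumes that the z:row elements are direct children of an rs:data element, as in a typical ADO rowset export. The question does not show the input file, so treat rs:data as a guess; the namespace URIs are taken from the question's stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rs="urn:schemas-microsoft-com:rowset"
    xmlns:z="RowsetSchema">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="DistinctEAN" match="z:row" use="@vchEAN"/>

  <!-- Identity transform for everything outside the grouping node. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- ASSUMPTION: rs:data is the parent of the z:row elements. -->
  <xsl:template match="rs:data">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Keep only the first z:row of each vchEAN group, attributes included. -->
      <xsl:copy-of select="z:row[generate-id() = generate-id(key('DistinctEAN', @vchEAN)[1])]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Using xsl:copy and xsl:copy-of instead of literal result elements keeps the source namespaces and the row attributes intact. Any children of rs:data other than z:row are not copied by this sketch.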

answered Oct 27, 2015 at 20:59


1 Comment

R Ryff Over a year ago

My final file to process is very large (1.3 million records). When I run my current XSLT process (as shown above, with Martin's recommended fix), the Linux OS kills the process with an out-of-memory error. It is my understanding that XSLT builds a DOM, so the memory footprint is going to be a problem even if I use Muenchian grouping. Would you agree that even Muenchian grouping would not solve the memory issue, given the size of the input XML file (1.3 million records)?
