Correct way to retrieve resulting text as string with Saxon XSLT library

Question

I'm using an XSLT 3 template with <xsl:output method="text"/>, which extracts some lines of text from an XML source document. The template is very particular, producing the individual lines and even the newlines (LF) in the right places.

Invoking the Saxon HE 12.2 JAR with Java 17 from the command line, I verify that the output text is precisely what I'm looking for, suitable for a .txt file.

The next step is to do the same thing programmatically, so I followed the documentation for using the s9api for transformations. Since I had used <xsl:output method="text"/>, I assumed that the XSLT processor would output only text. Instead it appears that transformer.applyTemplates(new StreamSource(xmlInputStream)) produces an XdmValue, which is itself a sequence of XdmItems.

Investigating further, it seems that each XdmItem wraps an XdmNode of kind TEXT! (I see that this mirrors the DOM's text nodes.) There is a text node for each output of the stylesheet, including a separate node for each newline in the output, e.g. from <xsl:text>
</xsl:text> in the template.

As I mentioned I had assumed that <xsl:output method="text"/> would have made the transformer skip the XML world altogether and simply output the text to a text buffer. I imagined some sort of produceText(String) method, similar to Hadoop MapReduce emitting values, which would be collected immediately to a buffer without the need to wrap them each in any sort of node. But I guess the XML foundation still presents itself to some extent, even in "text" output mode.

To me these nodes seem like needless overhead, as <xsl:output method="text"/> plainly indicates I don't need XML output at all. Maybe for historical reasons it's unavoidable. In any case, I understand that I can extract the text using this:

String text = xdmValue.stream().map(XdmItem::getStringValue).collect(joining());

My question is simply: is this the most efficient way to extract XSLT text output using Saxon, or is there a simpler, more direct way that skips the intermediate overhead of XdmNode items?

Michael Kay · Accepted Answer · 2023-06-17 07:51:37Z

2

If you're getting an XdmValue back, it means you're using an API method that delivers the transformation result as an XDM value rather than serializing it (which means output method="text" is ignored, because that's only used to control serialization).

Use a Serializer as the Destination for the transformation, and initialise the Serializer to write to a StringWriter; on completion, call toString() on the StringWriter to get the results as a string.
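A minimal sketch of that approach, assuming Saxon-HE 12 is on the classpath. The class name, the sample stylesheet, and the sample input are made up for illustration; only the s9api calls (Processor.newSerializer(Writer) and Xslt30Transformer.applyTemplates(Source, Destination)) come from the discussion here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.Xslt30Transformer;

public class TextOutputSketch {

    // Hypothetical stylesheet: emits each <line> followed by a line feed.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='3.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//line'>"
        + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical input document.
    static final String SAMPLE_XML = "<doc><line>alpha</line><line>beta</line></doc>";

    /** Runs the stylesheet and returns the serialized output as a String. */
    public static String transformToString(String xslt, String xml) throws SaxonApiException {
        Processor processor = new Processor(false); // false: no licensed features needed
        Xslt30Transformer transformer = processor.newXsltCompiler()
            .compile(new StreamSource(new StringReader(xslt)))
            .load30();

        // The Serializer is the Destination; it writes into the StringWriter.
        StringWriter writer = new StringWriter();
        Serializer serializer = processor.newSerializer(writer);
        transformer.applyTemplates(new StreamSource(new StringReader(xml)), serializer);

        // The text lives in the StringWriter, not in the Serializer.
        return writer.toString();
    }

    public static void main(String[] args) throws SaxonApiException {
        System.out.print(transformToString(SAMPLE_XSLT, SAMPLE_XML));
    }
}
```

Because the result is serialized, the stylesheet's xsl:output declaration takes effect here, unlike in the XdmValue-returning overload.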

Incidentally you ask for "a simpler, more direct way". As far as the spec is concerned, the transformation produces a result tree (of XDM nodes), and then passes this to a serializer if requested. But yes, if the output is going to a serializer then Saxon internally will skip the construction of the result tree and write the text "nodes" to the serializer as they are generated.

answered Jun 17, 2023 at 7:51

Michael Kay


4 Comments

Garret Wilson Over a year ago

Thanks for the wonderful explanation. Yes I had assumed that to some extent the XML-ness was inherent to the model, but I wanted to make sure I wasn't doing something silly that would unnecessarily add overhead, and it turns out I was. xsltTransformer.applyTemplates(new StreamSource(xmlInputStream), xslProcessor.newSerializer(stringWriter)) works just fine. I marked Martin's answer as correct because he was first and gave a link to the API, but I appreciate the added info you gave.

Garret Wilson Over a year ago

Note that the documentation says, "The XsltTransformer is geared towards the traditional way of running an XSLT transformation. … The output of the transformation is specified as a Destination object …. The Xslt30Transformer class … provides new ways of executing stylesheet code …. … [Y]ou can return the results in raw form…. … It is still possible to wrap the output in a result tree and send it to a Destination (which might be a Serializer), but this is no longer mandatory." From this I inferred that returning a value without a Serializer would provide me a more direct "raw" form.

Garret Wilson Over a year ago

I suppose the mistake in my inference was assuming that the "raw" form was more direct and had less overhead. I suppose the XDM nodes make up a more "raw" conceptualization of the data, but the transformer has to go to more work and create more data structure to create this "raw" representation. So perhaps here "raw" means the representation is closer to the conceptual transformation model, not that it represents the data directly from the transformer in an unprocessed form. To me "raw" means more the latter, which is what tripped me up.

Michael Kay Over a year ago

In XSLT 3.0 there are actually three forms of output. (a) If the stylesheet simply does <xsl:template match="/"><xsl:sequence select="count(//*)"/></xsl:template> then the raw result is an integer, say 42. (b) If you run the stylesheet in the same way as you would for XSLT 1.0 or 2.0, then a result tree is generated; this consists of a document node that owns a text node whose string value is "42". (c) You can pass the result tree to a serializer.
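The three forms can be compared side by side with a rough sketch like the following, assuming Saxon-HE 12 on the classpath. The class name, method names, and the three-element sample input are invented for illustration, and the xsl:output declaration is added here so that form (c) serializes as plain text; using XdmDestination to capture the result tree is one way to obtain form (b), not necessarily the only one.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmDestination;
import net.sf.saxon.s9api.XdmNode;
import net.sf.saxon.s9api.XdmValue;
import net.sf.saxon.s9api.Xslt30Transformer;

public class OutputFormsSketch {

    // The template from the comment above, plus an xsl:output added for form (c).
    static final String XSLT =
        "<xsl:stylesheet version='3.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:sequence select='count(//*)'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical input with three elements.
    static final String XML = "<a><b/><c/></a>";

    private static final Processor PROCESSOR = new Processor(false);

    // A fresh transformer and a fresh input per run keeps the three runs independent.
    private static Xslt30Transformer newTransformer() throws SaxonApiException {
        return PROCESSOR.newXsltCompiler()
            .compile(new StreamSource(new StringReader(XSLT)))
            .load30();
    }

    private static StreamSource input() {
        return new StreamSource(new StringReader(XML));
    }

    /** (a) Raw result: the sequence returned by the template, here one atomic value. */
    public static XdmValue raw() throws SaxonApiException {
        return newTransformer().applyTemplates(input());
    }

    /** (b) Result tree: a document node wrapping the output as a text node. */
    public static XdmNode tree() throws SaxonApiException {
        XdmDestination destination = new XdmDestination();
        newTransformer().applyTemplates(input(), destination);
        return destination.getXdmNode();
    }

    /** (c) Serialized result: characters written through a Serializer. */
    public static String serialized() throws SaxonApiException {
        StringWriter writer = new StringWriter();
        newTransformer().applyTemplates(input(), PROCESSOR.newSerializer(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws SaxonApiException {
        XdmValue raw = raw();
        System.out.println("raw is atomic: " + raw.itemAt(0).isAtomicValue());
        System.out.println("tree node kind: " + tree().getNodeKind());
        System.out.println("serialized: " + serialized());
    }
}
```

Only form (c) consults xsl:output; forms (a) and (b) hand back XDM values and leave any serialization to the caller.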

Martin Honnen · Accepted Answer · 2023-06-17 05:57:03Z

1

There is an overload of the applyTemplates method (https://www.saxonica.com/html/documentation12/javadoc/net/sf/saxon/s9api/Xslt30Transformer.html#applyTemplates(net.sf.saxon.s9api.XdmValue,net.sf.saxon.s9api.Destination)) that writes to a destination such as a Serializer (over a stream, file, or writer). I would suggest using it if you want Saxon to serialize the transformation result based on your xsl:output declarations.

edited Jun 17, 2023 at 5:57

answered Jun 16, 2023 at 23:02

Martin Honnen


1 Comment

Garret Wilson Over a year ago

Awesome. xsltTransformer.applyTemplates(new StreamSource(xmlInputStream), xslProcessor.newSerializer(stringWriter)) works perfectly. I had initially skipped the serializer code because the example was for writing to a file, and from the documentation it sounded like the internal "raw" form was more direct than using a Destination.
