In Java 17 I'm using XPath to extract data from XML by joining all the <bar>s under <foo>. I'm using Saxon 12, but I'm doing it through the JAXP API. I create an XPathExpression and then invoke it like this:

(String)xpathExpression.evaluate(context, XPathConstants.STRING)

I was hoping that this would give me a null if there was no match. But apparently this is not the case. Let's start with this XPath expression (simplified from what I'm using):

/foo/string-join(bar, codepoints-to-string(10))

I wanted this to join all the /foo/bar strings together, separated by newlines, which it does if there is a /foo. But if there is no /foo, then instead of returning null it seems to return an empty string.

My first question would be how to detect that this XPath expression did not match /foo/. I had assumed that XPathExpression.evaluate() would return null if there was no match. (Reading the API now I guess that was just an assumption I made.)

But let's say that I'm OK with returning an empty string, and I can detect if the returned string is empty and consider that a non-match (even though semantically that is not ideal). The problem is that I want the value to end with a newline as well, so my expression looks like this:

concat(/foo/string-join(bar, codepoints-to-string(10)), codepoints-to-string(10))

This is worse: now, if there is no /foo, it returns a string containing a single newline (\n), because it appends the newline to the part that did not match, which it treats as the empty string.

I would prefer to find a way for this expression to return null in JAXP if /foo does not exist. But if that can't easily be done, I'd prefer to at least get an empty string if /foo does not exist, i.e. concat() should only append text if the inner match is successful. I have a feeling I'll have to construct some elaborate workaround, but maybe an XPath expert knows a trick or two.

1 Answer

When you use the JAXP interface with XPath 2.0, you run into the problem that the JAXP specification doesn't say what happens when the expression returns values outside the XPath 1.0 type system. So Saxon does its best to interpret the intent.

If there is no foo element then the XPath expression returns an empty sequence. JAXP says that the raw result is converted to the required return type using XPath conversion rules. Now, in XPath 2.0 the string() function applied to an empty sequence returns a zero-length string, while the xs:string() constructor returns an empty sequence, which one might (perhaps) interpret as equivalent to a Java null. But Saxon chooses the string() conversion and returns a zero-length string.
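
The practical consequence is that a STRING result cannot tell "no match" apart from "matched, but empty", so the existence question has to be asked separately. Here is a minimal sketch of that idea, assuming the JDK's built-in XPath 1.0 engine rather than Saxon (so it sticks to XPath 1.0 expressions). The class and method names (StringOrNull, asString, asStringOrNull) are made up for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class StringOrNull {

    /** Parses an XML string into a W3C DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Plain STRING evaluation: a non-match and an empty value both come back as "". */
    static String asString(Document doc, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate(expr, doc, XPathConstants.STRING);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    /** Hypothetical helper: returns null unless existsExpr selects at least one node. */
    static String asStringOrNull(Document doc, String existsExpr, String valueExpr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // A node-set converts to boolean true only when it is non-empty.
            boolean exists = (Boolean) xpath.evaluate(existsExpr, doc, XPathConstants.BOOLEAN);
            return exists ? (String) xpath.evaluate(valueExpr, doc, XPathConstants.STRING) : null;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath", e);
        }
    }

    public static void main(String[] args) {
        Document noFoo = parse("<other/>");
        Document withFoo = parse("<foo><bar>a</bar></foo>");
        System.out.println(asString(noFoo, "/foo/bar").isEmpty());       // true
        System.out.println(asStringOrNull(noFoo, "/foo", "/foo/bar"));   // null
        System.out.println(asStringOrNull(withFoo, "/foo", "/foo/bar")); // a
    }
}
```

The cost is a second evaluation for the existence test, but the null then comes from your own code instead of depending on how a particular engine maps an empty sequence to a Java string.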

My advice would be to switch to the s9api interface which gives you full access to the XPath 2.0 type system. I would probably write an expression that returns a sequence of strings, and write the code to convert this into a single string in Java rather than in XPath.
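
The "join in Java rather than in XPath" idea does not strictly need s9api. As a rough sketch, the same shape can be had through JAXP by asking for a NODESET and building the string yourself. This assumes a DOM source and the JDK's built-in XPath engine; it is not s9api code, and joinBars is a hypothetical helper name:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class JoinInJava {

    /**
     * Hypothetical helper: returns every /foo/bar value followed by a newline,
     * or null when the document has no /foo element at all.
     */
    static String joinBars(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // An empty node list is an unambiguous "no match" signal.
            NodeList foos = (NodeList) xpath.evaluate("/foo", doc, XPathConstants.NODESET);
            if (foos.getLength() == 0) {
                return null;
            }

            // Do the joining in Java, so the trailing newline is only added per match.
            NodeList bars = (NodeList) xpath.evaluate("/foo/bar", doc, XPathConstants.NODESET);
            StringBuilder joined = new StringBuilder();
            for (int i = 0; i < bars.getLength(); i++) {
                joined.append(bars.item(i).getTextContent()).append('\n');
            }
            return joined.toString();
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath against XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(joinBars("<foo><bar>a</bar><bar>b</bar></foo>")); // a and b, one per line
        System.out.println(joinBars("<other/>"));                          // null
    }
}
```

With this shape there are three distinguishable outcomes: null (no /foo), an empty string (a /foo with no bar children), and the newline-terminated values.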

But if you want to stick with JAXP, you could use the XPath expression

string-join(/foo/bar ! (. || '\n'))

(Note, the \n is converted to a newline by the Java compiler, not by the XPath engine).
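
To make that point concrete, here is a small sketch of what the XPath engine actually receives. It assumes the JDK's built-in XPath 1.0 engine (hence concat() rather than ||), and eval is a hypothetical helper:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class NewlineLiteral {

    /** Hypothetical helper: evaluates an expression as a string against an empty document. */
    static String eval(String expr) {
        try {
            Document empty = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, empty, XPathConstants.STRING);
        } catch (ParserConfigurationException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // The Java compiler turns \n into a line feed, so the XPath literal
        // contains one real newline character.
        String real = eval("concat('a', '\n')");

        // Here Java passes a backslash and an 'n' through. XPath string literals
        // have no backslash escapes, so both characters stay in the result.
        String escaped = eval("concat('a', '\\n')");

        System.out.println(real.length());    // 2
        System.out.println(escaped.length()); // 3
    }
}
```

So if the expression comes from a configuration file or user input rather than a Java string literal, a typed \n would stay as two characters, and codepoints-to-string(10) or a literal line break would be needed instead.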

10 Comments

Thanks again for the quick answer and, as usual, excellent information. I have no problem with there being an s9api, and I am sure (as it has the benefit of hindsight) that it is more elegant and faster than the JAXP interface. Nevertheless, in my case I am writing a general platform that will allow pluggable processing, and the processors may want to manipulate the DOM. For an industry-wide platform I prefer to stick with the DOM for interoperability and a low learning curve, so I intend to parse the source documents to the W3C DOM, even if I use Saxon for XPath/XSLT.
"(Note, the \n is converted to a newline by the Java compiler, not by the XPath engine)." We can pass literal newlines to XPath? 😮 So I don't have to use codepoints-to-string(10)? I'll try that.
Yes, newline is a valid character in an XPath string literal.
I assume you're aware that Saxon will work with DOM whether you use the JAXP or the s9api interface? Saxon's TinyTree is 5-10 times faster, and considerably smaller, but if you want to use DOM then you can.
"I assume you're aware that Saxon will work with DOM whether you use the JAXP or the s9api interface?" Ah, no I didn't, thanks. But let me first try to wrap my head around this || '\n' stuff. I'm not quite getting it to work yet in my (more complex) XPath expression.