
In Oxygen XML Editor 27.0, using the "XPath/XQuery Builder" (which, as far as I know, makes use of Saxon as XPath/XQuery processor), when I execute the XPath 2.0 query encode-for-uri('§'), I get %C2%A7 as a result. Where does the %C2 come from?

When I encode other "special characters" such as $, ( or |, I get only a single escape containing the character's hexadecimal ASCII code (one escape, not two; for instance: | => %7C).

Why is this different with §?

  • The spec is w3.org/TR/xpath-functions-31/#func-encode-for-uri, which refers to ietf.org/rfc/rfc3986.txt, which I think says to first get the UTF-8 encoding (which for § is the two bytes C2 and A7) and then percent-encode each byte. Do you have any other XPath 2 or 3 implementation giving you a different result? Which result do you expect? Commented Mar 3 at 18:22
  • Thank you, @MartinHonnen - I wasn't aware! No, I have only seen this with the described implementation and mentioned it in case it would be helpful for finding an explanation. But you pointed out the answer already. :) Commented Mar 3 at 18:27

1 Answer


From fn:encode-for-uri:


Like the fn:escape-html-uri and fn:iri-to-uri functions, this function replaces each special character with an escape sequence in the form %xx, where xx is two hexadecimal digits (in uppercase) that represent the character in UTF-8. For example, édition.html is changed to %C3%A9dition.html, with the é escaped as %C3%A9.

Hence § (U+00A7, SECTION SIGN), whose UTF-8 encoding is the two bytes C2 A7, is encoded as %C2%A7: one %xx escape per UTF-8 byte.
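To reproduce the result outside Oxygen, here is a minimal Python sketch. It is not Saxon's implementation; `encode_for_uri` is a hypothetical helper name, and it assumes that Python's `urllib.parse.quote` with `safe=""` is a close enough stand-in for fn:encode-for-uri (both leave only letters, digits, `-`, `_`, `.` and `~` unescaped).

```python
from urllib.parse import quote


def encode_for_uri(text: str) -> str:
    """Approximate fn:encode-for-uri (assumption: not the Saxon code).

    The string is first encoded as UTF-8, then every byte that is not an
    RFC 3986 unreserved character is written as %XX in uppercase hex.
    """
    return quote(text, safe="", encoding="utf-8")


# The section sign occupies two bytes in UTF-8, so it yields two escapes.
section_sign_bytes = "§".encode("utf-8")
print(section_sign_bytes.hex(" ").upper())
print(encode_for_uri("§"))

# ASCII characters occupy a single byte, so they yield a single escape.
print(encode_for_uri("|"))
print(encode_for_uri("édition.html"))
```

The first two lines printed show the same two bytes, once as raw hex and once as percent escapes, which is where the extra %C2 comes from.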


1 Comment

And the reason it's different for ASCII characters like | => %7C is that for ASCII characters, the UTF-8 encoding is one byte (in fact, it's the same as the ASCII encoding).
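The one-byte versus two-byte difference can be worked out by hand from the UTF-8 bit layout. The sketch below is only an illustration covering code points up to U+07FF; `utf8_bytes` and `percent_escape` are hypothetical helper names, and `percent_escape` escapes every byte, unlike fn:encode-for-uri, which leaves unreserved characters alone.

```python
def utf8_bytes(code_point: int) -> bytes:
    """Hand-rolled UTF-8 encoder for code points below U+0800 (illustration only)."""
    if code_point < 0x80:
        # ASCII range: one byte, identical to the ASCII code.
        return bytes([code_point])
    if code_point < 0x800:
        # Two-byte form: 110xxxxx 10xxxxxx
        lead = 0b11000000 | (code_point >> 6)            # top 5 bits
        cont = 0b10000000 | (code_point & 0b00111111)    # low 6 bits
        return bytes([lead, cont])
    raise ValueError("this sketch only handles code points below U+0800")


def percent_escape(data: bytes) -> str:
    """Write every byte as %XX with uppercase hex digits."""
    return "".join(f"%{byte:02X}" for byte in data)


# U+00A7 is binary 000 1010 0111: the top 5 bits 00010 give the lead byte
# 0xC2, and the low 6 bits 100111 give the continuation byte 0xA7.
print(percent_escape(utf8_bytes(ord("§"))))

# U+007C is below 0x80, so it stays a single byte.
print(percent_escape(utf8_bytes(ord("|"))))
```

The C2 is therefore the UTF-8 lead byte: the `110` marker bits followed by the upper bits of the code point.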
