Why does fn:encode-for-uri('§') result in %C2%A7 rather than just %A7?

Question

In Oxygen XML Editor 27.0, using the "XPath/XQuery Builder" (which, as far as I know, uses Saxon as its XPath/XQuery processor), when I execute the XPath 2.0 query encode-for-uri('§'), I get %C2%A7 as the result. Where does the %C2 come from?

When I encode other "special characters" such as $, ( and |, I get only the corresponding hexadecimal ASCII code (i.e., one escape sequence rather than two; for instance, | => %7C).

Why is this different with §?

The spec is w3.org/TR/xpath-functions-31/#func-encode-for-uri, which refers to RFC 3986 (ietf.org/rfc/rfc3986.txt), which I think says to first take the UTF-8 encoding (for §, the two bytes C2 and A7). Do you have any other XPath 2 or 3 implementation that gives you a different result? Which result do you expect?
– Martin Honnen, Commented Mar 3 at 18:22
Thank you, @MartinHonnen, I wasn't aware! No, I have only seen this with the implementation described above and mentioned it in case it helped find an explanation. But you have already pointed out the answer. :)
– Philipp Koch, Commented Mar 3 at 18:27

JosefZ · Accepted Answer · 2025-03-03 18:29:20Z

From fn:encode-for-uri:

…
Like the fn:escape-html-uri and fn:iri-to-uri functions, this function replaces each special character with an escape sequence in the form %xx, where xx is two hexadecimal digits (in uppercase) that represent the character in UTF-8. For example, édition.html is changed to %C3%A9dition.html, with the é escaped as %C3%A9.
…

Hence, § (U+00A7, Section Sign), whose UTF-8 encoding is the two bytes C2 A7, is encoded as %C2%A7 …
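To make the rule quoted above concrete, here is a minimal Python sketch of the same escaping logic. It is not Saxon's implementation; it assumes only what the spec text says: every character except A-Z, a-z, 0-9, hyphen, underscore, full stop and tilde is converted to UTF-8, and each resulting byte becomes one %XX escape. The function name encode_for_uri is hypothetical.

```python
# Hypothetical Python emulation of XPath fn:encode-for-uri (illustration only).

# Characters that fn:encode-for-uri leaves unescaped.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)


def encode_for_uri(s: str) -> str:
    out = []
    for ch in s:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # One uppercase %XX escape per UTF-8 byte, so a non-ASCII
            # character such as § yields two escapes.
            for byte in ch.encode("utf-8"):
                out.append(f"%{byte:02X}")
    return "".join(out)


if __name__ == "__main__":
    for sample in ["§", "|", "$", "(", "édition.html"]:
        print(f"{sample!r} -> {encode_for_uri(sample)}")
```

With these inputs the sketch produces %C2%A7 for §, %7C for | and %C3%A9dition.html for édition.html, matching both the observed Saxon result and the example in the spec. In plain Python, urllib.parse.quote(s, safe='') gives the same output for these samples.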

Michael Kay Mar 3 at 22:48

And the reason it's different for ASCII characters such as | (=> %7C) is that, for ASCII characters, the UTF-8 encoding is a single byte (in fact, it is the same as the ASCII encoding).
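The leading C2 falls out of the UTF-8 bit layout. Code points below U+0080 are stored as one byte 0xxxxxxx, while U+0080 to U+07FF use two bytes, 110yyyyy 10xxxxxx. For § (U+00A7, binary 000 1010 0111) that gives 110 00010 = C2 and 10 100111 = A7. The hypothetical utf8_bytes function below is a sketch covering only these 1- and 2-byte forms, checked against Python's own UTF-8 codec.

```python
# Sketch of the 1- and 2-byte UTF-8 forms (code points U+0000..U+07FF only).

def utf8_bytes(cp: int) -> bytes:
    if cp < 0x80:
        # ASCII range: a single byte identical to the ASCII code (0xxxxxxx).
        return bytes([cp])
    if cp < 0x800:
        # Two-byte form 110yyyyy 10xxxxxx: high 5 bits, then low 6 bits.
        lead = 0xC0 | (cp >> 6)
        trail = 0x80 | (cp & 0x3F)
        return bytes([lead, trail])
    raise ValueError("this sketch only handles U+0000..U+07FF")


if __name__ == "__main__":
    for ch in ["|", "§"]:
        encoded = utf8_bytes(ord(ch))
        assert encoded == ch.encode("utf-8")  # agrees with Python's codec
        print(f"U+{ord(ch):04X} {ch!r}: {encoded.hex(' ').upper()}")
```

Run as a script, it prints one byte (7C) for | and two bytes (C2 A7) for §, which is exactly why encode-for-uri emits one escape for the former and two for the latter.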