This is my test input:
<license>
<p>some text (http://creativecommons.org/licenses/by/3.0/) some text.</p>
</license>
Desired output:
<license xlink:href="http://creativecommons.org/licenses/by/3.0/">
<p>some text (http://creativecommons.org/licenses/by/3.0/) some text.</p>
</license>
Basically, for each license element that does not already have an xlink:href attribute, I am trying to find the URL in the text of its child <p>
and copy it up to an xlink:href attribute on the parent license element.
Here is my XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xs"
    version="3.0">
    <xsl:output method="html" encoding="UTF-8" indent="yes" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="license">
        <xsl:copy>
            <xsl:attribute name="xlink:href">
                <xsl:value-of select='replace(p,"[\s\S]*" ,"(\b(?:(?:https?|ftp):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$]))")'/>
            </xsl:attribute>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="p/@xlink:href"/>
</xsl:stylesheet>
The regex I am using does not work with Saxon, apparently because of characters like "?".
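As far as I can tell, the regex characters are only part of the problem. replace() takes its arguments in the order replace($input, $pattern, $replacement), so the call above uses "[\s\S]*" as the pattern and the URL regex as the replacement string. "[\s\S]*" can match the empty string, which replace() rejects (error FORX0003), and XPath regular expressions support neither \b nor the escape \/. The raw & characters inside the select attribute would also have to be written &amp; for the stylesheet to be well-formed XML. Below is a minimal sketch using replace() correctly. It assumes a much simpler URL pattern is good enough for this data ((https?|ftp):// followed by characters that are not whitespace or parentheses), and it assumes the result is XML, so it uses method="xml" rather than "html". It only touches license elements that have no xlink:href yet and whose p contains something URL-like:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xs"
    version="3.0">

    <!-- Assumption: the output is XML, so method="xml" instead of "html" -->
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Identity template: copy everything else unchanged -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- license without xlink:href whose p child contains a URL-like string.
         The URL pattern here is a simplified assumption, not a full URL grammar. -->
    <xsl:template match="license[not(@xlink:href)][p[matches(., '(https?|ftp)://')]]">
        <!-- Text of the first p that contains a URL -->
        <xsl:variable name="text" select="string(p[matches(., '(https?|ftp)://')][1])"/>
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <!-- replace($input, $pattern, $replacement, $flags):
                 the pattern matches the whole string, '$1' keeps only the
                 captured URL, and flag 's' lets '.' match newlines too -->
            <xsl:attribute name="xlink:href"
                select="replace($text, '^.*?((https?|ftp)://[^\s()]+).*$', '$1', 's')"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Because the xlink prefix must be bound in well-formed output, the serializer will add an xmlns:xlink declaration to the license element. The [^\s()] class is what stops the match at the closing parenthesis in the sample text.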
Comments: Why replace() here? Use xsl:analyze-string instead of replace; see w3.org/TR/xslt20/#analyze-string. Also, the URL in the sample sits inside (); can we assume that that is always the case?
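Following the comment's suggestion, here is a sketch with xsl:analyze-string instead of replace(). It collects every URL-like substring from the p children and uses the first one. The simplified URL pattern and the XML output method are assumptions, not requirements stated in the question. Since xsl:analyze-string ignores non-matching text when there is no xsl:non-matching-substring, a license without any URL is copied without an xlink:href instead of getting an empty one:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xs"
    version="3.0">

    <!-- Assumption: the output is XML -->
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Identity template -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Only license elements that do not already have xlink:href -->
    <xsl:template match="license[not(@xlink:href)]">
        <!-- Every URL-like substring in the p children, in document order.
             The regex is a simplified assumption. -->
        <xsl:variable name="urls" as="xs:string*">
            <xsl:analyze-string select="string-join(p, ' ')" regex="(https?|ftp)://[^\s()]+">
                <xsl:matching-substring>
                    <!-- The context item here is the matched substring -->
                    <xsl:sequence select="."/>
                </xsl:matching-substring>
            </xsl:analyze-string>
        </xsl:variable>
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <!-- Add the attribute only if a URL was found -->
            <xsl:if test="exists($urls)">
                <xsl:attribute name="xlink:href" select="$urls[1]"/>
            </xsl:if>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

This version does not rely on the URL being inside parentheses; the [^\s()] class merely ends the match at a parenthesis when one is there. Note that the regex attribute of xsl:analyze-string is an attribute value template, so any { or } in a more elaborate pattern would have to be doubled.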