This is my test input:

<license>
     <p>some text (http://creativecommons.org/licenses/by/3.0/) some text.</p>
</license>

Desired output:

<license xlink:href="http://creativecommons.org/licenses/by/3.0/">
     <p>some text (http://creativecommons.org/licenses/by/3.0/) some text.</p>
</license>

Basically, where the license element has no xlink:href="http://******" attribute, I am trying to look in its child <license-p> (<p> in the sample above) for a URL and copy that URL up into an xlink:href attribute on the parent license element.

Here is my XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xlink="http://www.w3.org/1999/xlink"

exclude-result-prefixes="xs"
version="3.0"> 
    <xsl:output method="html" encoding="UTF-8" indent="yes" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="license">
          <xsl:copy>
            <xsl:attribute name="xlink:href">                    
                <xsl:value-of select='replace(p,"[\s\S]*" ,"(\b(?:(?:https?|ftp):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&amp;@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&amp;@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&amp;@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&amp;@#\/%=~_|$]))")'/>
            </xsl:attribute> 
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="p/@xlink:href"/>   
</xsl:stylesheet>

The regex I am using is not working in Saxon, apparently owing to characters such as "?".

  • What is the purpose of the replace() here? Commented Jul 22, 2015 at 2:40
  • I am allowed to use three functions with regular expressions: matches(), replace() and tokenize(). The purpose of replace() is to extract the URI from the whole text by replacing the entire text content with the URI. matches() returns true or false, and tokenize() splits a string based on a regular expression. I can also use analyze-string() instead of replace(). Commented Jul 22, 2015 at 8:35
  • If you want to extract a certain substring matching a regular expression then you should consider using xsl:analyze-string instead of replace, see w3.org/TR/xslt20/#analyze-string Commented Jul 22, 2015 at 8:36
  • Also in the sample the URI is wrapped into (), can we assume that that is always the case? Commented Jul 22, 2015 at 8:40
  • No, I am afraid we cannot assume that the URI will always be wrapped into () Commented Jul 22, 2015 at 8:57
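
For what it's worth, replace() can do the extraction described in the comments, provided the pattern is the second argument and the replacement the third (the call in the question has them the other way round). As far as I know, XPath regular expressions also have no \b word boundary, and \/ is not a recognised escape, which may be what Saxon is objecting to. The following is only a minimal sketch, assuming an XSLT 2.0 or later processor; the simplified URL pattern and the "s" flag are my own assumptions, not the asker's regex:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- Outputs the first URL found in the first p child of license. -->
    <xsl:template match="/license">
        <!-- The whole string is matched and replaced by group 1 (the URL).
             Flag "s" lets "." match newlines as well. -->
        <xsl:value-of
            select="replace(p[1], '^.*?((https?|ftp)://[^\s\(\)]+).*$', '$1', 's')"/>
    </xsl:template>
</xsl:stylesheet>
```

Run against the test input in the question, this should output the Creative Commons URL. One caveat: when the text contains no URL, replace() returns the input unchanged, so a matches() test with the same pattern would be needed as a guard. The pattern also stops only at whitespace or parentheses, so trailing punctuation such as a full stop would be captured as part of the URL.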

1 Answer

OK folks, I know this regex is far from perfect, but the following works for me:

<xsl:analyze-string 
    select="$elValue"
    regex="((https?|ftp|gopher|telnet|file):(()|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&amp;]*\w*.\w*\W\w*\W\w*\W\d.\d\W)">                    
        <xsl:matching-substring>
            <xsl:value-of select="regex-group(1)"/>                       
        </xsl:matching-substring>
</xsl:analyze-string>
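
For context, here is one way the xsl:analyze-string approach might be combined with the identity transform from the question. This is a rough sketch under stated assumptions rather than the code actually used above: it assumes XSLT 2.0 or later (for example Saxon), swaps in a much simpler URL pattern of my own, and checks both p and license-p children because the question mentions both names.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xs">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <!-- Identity template: copy everything unchanged by default. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Only license elements that lack xlink:href are handled here. -->
    <xsl:template match="license[not(@xlink:href)]">
        <!-- Collect every URL found in the child paragraphs.
             The pattern is a simplified assumption: scheme, "://",
             then anything up to whitespace or a parenthesis. -->
        <xsl:variable name="urls" as="xs:string*">
            <xsl:analyze-string select="string-join((p | license-p), ' ')"
                regex="(https?|ftp)://[^\s\(\)]+">
                <xsl:matching-substring>
                    <xsl:sequence select="."/>
                </xsl:matching-substring>
            </xsl:analyze-string>
        </xsl:variable>
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <!-- Add the attribute only when at least one URL was found. -->
            <xsl:if test="exists($urls)">
                <xsl:attribute name="xlink:href" select="$urls[1]"/>
            </xsl:if>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Collecting the matches into a variable first means the attribute is simply omitted when no URL is present, instead of being written out empty. Taking $urls[1] is an arbitrary choice for text that contains more than one URL.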