I'm converting an embedded text pattern to an element.

Everything I try adds the escaped text &lt;hit&gt; to the output rather than an actual <hit> element.

Here is my input...

<?xml version="1.0" encoding="UTF-8"?>
<response>
   <lst>
      <lst id="7c5d14cd1225d94ff8dd9cf06eb67f05">
         <arr name="note">
            <str>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus tellus risus,
iaculis id convallis sagittis, convallis in purus. Suspendisse imperdiet, enim
aliquam varius gravida, eros sapien mattis metus, vel suscipit lacus lacus eget
sem. Nulla ullamcorper enim quis dolor pellentesque consectetur quis quis eros.
Pellentesque et urna @@pre@@corn@@post@@, vel gravida urna. Nulla facilisi. Sed condimentum
purus non magna tristique molestie. Aliquam vulputate lobortis cursus. Morbi
tincidunt lobortis feugiat. Praesent in sapien diam. Aliquam bibendum elit ut
massa tristique tincidunt eu vel nisl. Aenean @@pre@@corn@@post@@ nulla id urna laoreet tempus.
Etiam ultricies lacus a arcu ornare iaculis. In eget tempus nisi.
        </str>
         </arr>
      </lst>
      <lst id="0e18352acb70ff7c4441dfd201dd7cd1">
         <arr name="note">
            <str>
Proin vitae eleifend enim. Nullam quis mauris ipsum. Proin sem dolor, placerat
nec ornare et, pulvinar a arcu. Cras varius venenatis sapien, eu sagittis sem
laoreet nec. Sed congue elit et magna tincidunt in ultrices quam rhoncus.
Maecenas ullamcorper pellentesque lobortis. Nunc blandit semper neque, vel
rhoncus tortor lacinia nec. Praesent sed feugiat @@pre@@corn@@post@@. Integer vel arcu leo,
sit amet volutpat diam. Cras posuere tristique est, ut tristique nibh
sollicitudin et. Pellentesque vitae justo sapien, non imperdiet velit.
        </str>
         </arr>
      </lst>
   </lst>
</response>

My result should be...

<?xml version="1.0" encoding="UTF-8"?>
<response>
   <lst>
      <lst id="7c5d14cd1225d94ff8dd9cf06eb67f05">
         <arr name="note">
            <str>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus tellus risus,
iaculis id convallis sagittis, convallis in purus. Suspendisse imperdiet, enim
aliquam varius gravida, eros sapien mattis metus, vel suscipit lacus lacus eget
sem. Nulla ullamcorper enim quis dolor pellentesque consectetur quis quis eros.
Pellentesque et urna <hit>corn</hit>, vel gravida urna. Nulla facilisi. Sed condimentum
purus non magna tristique molestie. Aliquam vulputate lobortis cursus. Morbi
tincidunt lobortis feugiat. Praesent in sapien diam. Aliquam bibendum elit ut
massa tristique tincidunt eu vel nisl. Aenean <hit>corn</hit> nulla id urna laoreet tempus.
Etiam ultricies lacus a arcu ornare iaculis. In eget tempus nisi.
        </str>
         </arr>
      </lst>
      <lst id="0e18352acb70ff7c4441dfd201dd7cd1">
         <arr name="note">
            <str>
Proin vitae eleifend enim. Nullam quis mauris ipsum. Proin sem dolor, placerat
nec ornare et, pulvinar a arcu. Cras varius venenatis sapien, eu sagittis sem
laoreet nec. Sed congue elit et magna tincidunt in ultrices quam rhoncus.
Maecenas ullamcorper pellentesque lobortis. Nunc blandit semper neque, vel
rhoncus tortor lacinia nec. Praesent sed feugiat <hit>corn</hit>. Integer vel arcu leo,
sit amet volutpat diam. Cras posuere tristique est, ut tristique nibh
sollicitudin et. Pellentesque vitae justo sapien, non imperdiet velit.
        </str>
         </arr>
      </lst>
   </lst>
</response>

1 Answer

In XSLT 2.0 you would use xsl:analyze-string, which lets you build real elements for each regular expression match, for example:

<xsl:template match="arr/str">
  <xsl:copy>
    <xsl:analyze-string select="." regex="@@pre@@(.+)@@post@@">
      <xsl:matching-substring>
        <hit>
          <xsl:value-of select="regex-group(1)"/>
        </hit>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:copy>
</xsl:template>

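[edit] I have changed the regular expression to collect anything that is not a @ character. The original (.+) is greedy, so with several markers on one line it matches everything from the first @@pre@@ to the last @@post@@. Here is a full stylesheet: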

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="arr/str">
  <xsl:copy>
    <xsl:analyze-string select="." regex="@@pre@@([^@]+)@@post@@">
      <xsl:matching-substring>
        <hit>
          <xsl:value-of select="regex-group(1)"/>
        </hit>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

That way each @@pre@@...@@post@@ pair should become its own hit element, even when several occur on one line. If it still does not work as you want, please post more detailed input data.
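One limitation of [^@]+ is that it stops matching if the marked text itself contains a @ character. A possible alternative, sketched here and not tested against your real data, is a reluctant quantifier, which the XPath 2.0 regular expression syntax supports. The flags="s" attribute is an assumption on my part: it lets the dot match line breaks, in case a marked phrase wraps onto the next line.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Identity template: copies everything not matched below. -->
<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="arr/str">
  <xsl:copy>
    <!-- .+? is reluctant, so each match ends at the nearest @@post@@.
         flags="s" (an assumption about the data) lets . match line
         breaks, in case a marked phrase wraps onto the next line. -->
    <xsl:analyze-string select="." regex="@@pre@@(.+?)@@post@@" flags="s">
      <xsl:matching-substring>
        <hit>
          <xsl:value-of select="regex-group(1)"/>
        </hit>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

For the sample input in the question, both patterns should give the same result. The difference only shows up when the text between the markers contains a @ or a line break.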

4 Comments

That does not work for the case where there is more than one hit in a single line.
I have edited the code to address the problem you have raised.
One last thing. In my real use case, one instance of the content is "@@pre@@Corn@@post@@", but the output lowers the case to "<hit>corn</hit>". Is that solvable as well?
I can't see why that would happen. For any match of the regular expression pattern @@pre@@([^@]+)@@post@@, the code simply outputs what was captured by ([^@]+), so Corn should be output as Corn. Which XSLT 2.0 processor are you using? I tested with Saxon 9.5, and the text Cras @@pre@@Corn@@post@@ posuere is transformed into Cras <hit>Corn</hit> posuere.
