9

How can I preserve entity references when transforming XML with XSLT (2.0)? With all of the processors I've tried, the entity gets resolved by default. I can use xsl:character-map to handle the character entities, but what about text entities?

For example, this XML:

<!DOCTYPE doc [
<!ENTITY so "stackoverflow">
<!ENTITY question "How can I preserve the entity reference when transforming with XSLT??">
]>
<doc>
  <text>Hello &so;!</text>
  <text>&question;</text>
</doc>

transformed with the following XSLT:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

produces the following output:

<doc>
   <text>Hello stackoverflow!</text>
   <text>How can I preserve the entity reference when transforming with XSLT??</text>
</doc>

The output should look like the input (minus the doctype declaration for now):

<doc>
  <text>Hello &so;!</text>
  <text>&question;</text>
</doc>

I'm hoping that I don't have to pre-process the input by replacing all ampersands with &amp; (like &amp;question;) and then post-process the output by replacing all &amp; with &.
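For reference, the pre/post-processing workaround I'd like to avoid could be sketched in Java roughly as follows. This is only a sketch: the class and method names (`EntityRefGuard`, `protect`, `restore`) are made up for illustration, and the regex is applied blindly to the whole string, so it also rewrites references inside the DOCTYPE internal subset, comments and CDATA sections.

```java
import java.util.regex.Pattern;

// Hypothetical helper: hides named entity references from the XML parser
// before the transformation and brings them back afterwards.
class EntityRefGuard {
    // Simplified XML name (ASCII only); real XML names allow more characters.
    private static final String NAME = "([A-Za-z_:][A-Za-z0-9_:.\\-]*)";
    private static final Pattern REF = Pattern.compile("&" + NAME + ";");
    private static final Pattern GUARDED = Pattern.compile("&amp;" + NAME + ";");

    // Pre-process: "&so;" becomes "&amp;so;", so the parser sees plain text.
    static String protect(String xml) {
        return REF.matcher(xml).replaceAll("&amp;$1;");
    }

    // Post-process: "&amp;so;" in the serialized output becomes "&so;" again.
    static String restore(String xml) {
        return GUARDED.matcher(xml).replaceAll("&$1;");
    }
}
```

The idea would be to call `protect` on the source text before handing it to the XSLT processor and `restore` on the serialized result.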

Maybe this is processor specific? I'm using Saxon 9.

Thanks!

1
  • Good question, +1. The requested processing is almost impossible to do with XSLT and I wouldn't recommend using my answer frequently. Commented May 13, 2011 at 3:12

5 Answers 5

5

If you know what entities will be used and how they are defined, you can do the following (quite primitive and error-prone, but still better than nothing):

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:my="my:my">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:character-map name="mapEntities">
  <xsl:output-character character="&amp;" string="&amp;"/>
 </xsl:character-map>

 <xsl:variable name="vEntities" select=
 "'stackoverflow',
 'How can I preserve the entity reference when transforming with XSLT\?\?'
 "/>

 <xsl:variable name="vReplacements" select=
 "'&amp;so;', '&amp;question;'"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/">
  <xsl:text disable-output-escaping="yes"><![CDATA[<!DOCTYPE doc [ <!ENTITY so "stackoverflow">
<!ENTITY question
"How can I preserve the entity reference when transforming with XSLT??"> ]>
]]>
  </xsl:text>

  <xsl:apply-templates/>
 </xsl:template>

 <xsl:template match="text()">
  <xsl:value-of select=
  "my:multiReplace(.,
                   $vEntities,
                   $vReplacements,
                   count($vEntities)
                   )
  " disable-output-escaping="yes"/>
 </xsl:template>

 <xsl:function name="my:multiReplace">
  <xsl:param name="pText" as="xs:string"/>
  <xsl:param name="pEnts" as="xs:string*"/>
  <xsl:param name="pReps" as="xs:string*"/>
  <xsl:param name="pCount" as="xs:integer"/>

  <xsl:sequence select=
  "if($pCount > 0)
     then
      my:multiReplace(replace($pText,
                              $pEnts[1],
                              $pReps[1]
                              ),
                      subsequence($pEnts,2),
                      subsequence($pReps,2),
                      $pCount -1
                      )
      else
       $pText
  "/>
 </xsl:function>
</xsl:stylesheet>

when applied on the provided XML document:

<!DOCTYPE doc [ <!ENTITY so "stackoverflow">
<!ENTITY question
"How can I preserve the entity reference when transforming with XSLT??"> ]>
<doc>
    <text>Hello &so;!</text>
    <text>&question;</text>
</doc>

the wanted result is produced:

<!DOCTYPE doc [ <!ENTITY so "stackoverflow">
<!ENTITY question
"How can I preserve the entity reference when transforming with XSLT??"> ]>

  <doc>
      <text>Hello &so;!</text>
      <text>&question;</text>
</doc>

Do note:

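  1. Special regex characters in the search strings (the entity values passed as the pattern to replace()) must be escaped, as with \?\? above. In the replacement strings, a literal $ or \ must be escaped as well.

  2. We had to resort to DOE (disable-output-escaping), which isn't recommended because it violates the principles of the XSLT architecture and processing model. In other words, this solution is a nasty hack.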
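To make the escaping rule concrete, here is the same "replace known entity values with their references" logic sketched in Java. It is not part of the stylesheet above. `KnownEntityRestorer` and `restore` are hypothetical names, and the sketch assumes the entity values do not overlap each other.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring my:multiReplace from the stylesheet.
class KnownEntityRestorer {
    // entities maps an entity name (e.g. "so") to its replacement text.
    static String restore(String text, Map<String, String> entities) {
        String result = text;
        for (Map.Entry<String, String> e : entities.entrySet()) {
            // Pattern.quote escapes regex metacharacters such as "??".
            String pattern = Pattern.quote(e.getValue());
            // quoteReplacement escapes "$" and "\" in the replacement string.
            String replacement = Matcher.quoteReplacement("&" + e.getKey() + ";");
            result = result.replaceAll(pattern, replacement);
        }
        return result;
    }
}
```

With a map of `so` to `stackoverflow`, `restore("Hello stackoverflow!", map)` gives back `Hello &so;!`.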


1 Comment

Thank you very much Dimitre. I was afraid of this. Unfortunately I won't know what entities are being used. I think I will stick to OmniMark for this project. Your answer is very helpful though and I appreciate the time. +1 and answer accepted
3

This can be an especially troublesome issue if you are using something like S1000D. It uses entities and @boardno attributes to link to figures. It's a throwback to its SGML roots.

Because of this automatic entity-expanding behavior, which is correct but undesirable, I often have to drop back to tools like sed, awk and batch scripts to manage certain data analysis tasks when using S1000D as input.

IMHO, this would be a great change proposal for one of the upcoming XSLT specifications: that a compliant processor accept a runtime parameter that turns entity expansion on and off.

1 Comment

I mostly deal with ATA iSpec 2200 and have worked with S1000D some so I know exactly what you mean.
1

If you use a Java implementation of an XSLT 2.0 processor (like Saxon 9 Java), you might want to check whether http://andrewjwelch.com/lexev/ helps. It lets you preprocess your XML so that entity and character references are marked up as XML elements, which you can then transform as necessary.

Comments

1

I use this solution and it works well:

<xsl:variable name="prolog" select="substring-before(unparsed-text(document-uri(.)),'&lt;root')"/>

<xsl:template match="/">
    <xsl:value-of select="$prolog" disable-output-escaping="yes"/>
  <xsl:apply-templates/>
</xsl:template>

1 Comment

I haven't tried this, but it looks like it would only preserve the prolog; entity references would still be expanded. I could see using xsl:analyze-string to analyze the prolog and build up a structure (or map in 3.0) of key/value pairs and then replacing them during processing. I may try that one day (+1 for the idea). To actually solve this issue/question, I ended up writing an Omnimark program similar to the "lexev" java program mentioned in another answer.
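The "build up a map of key/value pairs from the prolog" idea from this comment could be sketched like this in Java rather than with xsl:analyze-string. This is a rough sketch under assumptions: `EntityDeclarations` and `parse` are hypothetical names, and the regex only handles internal general entities with a quoted literal value (no SYSTEM/PUBLIC or parameter entities).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects <!ENTITY name "value"> declarations
// from the text of a document prolog.
class EntityDeclarations {
    private static final Pattern ENTITY_DECL = Pattern.compile(
        "<!ENTITY\\s+([\\w.:-]+)\\s+(?:\"([^\"]*)\"|'([^']*)')\\s*>");

    static Map<String, String> parse(String prolog) {
        Map<String, String> entities = new LinkedHashMap<>();
        Matcher m = ENTITY_DECL.matcher(prolog);
        while (m.find()) {
            // Group 2 holds a double-quoted value, group 3 a single-quoted one.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            // In XML the first declaration of an entity name wins.
            entities.putIfAbsent(m.group(1), value);
        }
        return entities;
    }
}
```

The resulting map (entity name to replacement text) could then drive whatever replaces the expanded text with references during or after the transformation.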
0

You can keep EntityReference nodes in the document by using a DOM LS parser with the "entities" parameter set to true. See http://docs.oracle.com/javase/6/docs/api/org/w3c/dom/DOMConfiguration.html

The specification says the default value is true, but depending on the parser it could be false, so be aware of that.

To load Xerces:

DOMImplementationLS domImpl = new org.apache.xerces.dom.CoreDOMImplementationImpl();

You can also use the registry as below, but personally I would rather hardcode the implementation I want, as above:

DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS domImpl = (DOMImplementationLS) registry.getDOMImplementation("XML 3.0 LS 3.0"); 

Then, to load your document:

// XML parser with XSD schema
LSParser parser = domImpl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, "http://www.w3.org/2001/XMLSchema");
DOMConfiguration config = parser.getDomConfig();
// Keep EntityReference nodes instead of expanding them
config.setParameter("entities", true);
LSInput input = domImpl.createLSInput();
// xmlStream is a placeholder for your own java.io.InputStream
input.setByteStream(xmlStream);
Document lDoc = parser.parse(input);

Your XML entities are then left unexpanded in the DOM.

Because Saxon does not handle unexpanded entities (it fails with an 'Unsupported node type in DOM! 5' error; node type 5 is ENTITY_REFERENCE_NODE), you cannot use net.sf.saxon.xpath.XPathFactoryImpl. You have to use the default XPathFactory instead, obtained with XPathFactory.newInstance().
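Putting the pieces together, a self-contained check might look like the sketch below. It assumes the DOM LS implementation found through the registry (typically the JDK's built-in one) honors the "entities" parameter. `EntityRefDemo` and its methods are hypothetical names, and the input is read from a string rather than a stream.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical demo: parse with "entities" = true and count the
// EntityReference nodes that survive in the DOM.
class EntityRefDemo {
    static Document parseKeepingEntityRefs(String xml) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS ls =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
            // No schema type is requested here (null).
            LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            parser.getDomConfig().setParameter("entities", Boolean.TRUE);
            LSInput input = ls.createLSInput();
            input.setStringData(xml);
            return parser.parse(input);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM LS implementation available", e);
        }
    }

    // Recursively counts nodes of type ENTITY_REFERENCE_NODE (value 5).
    static int countEntityRefs(Node node) {
        int count = node.getNodeType() == Node.ENTITY_REFERENCE_NODE ? 1 : 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countEntityRefs(c);
        }
        return count;
    }
}
```

If the count is zero for a document that clearly contains entity references, the parser in use is expanding them despite the setting.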

3 Comments

I tried using this approach but when using the document as a DOM source you'd get [Fatal Error] :xxx:yyy: Character reference "&#
Could you give more details, with source code and XML inputs?
I don't have it anymore. I actually found an alternative way of handling my requirement, which kept the entity data inside an attribute that does not get translated.
