
I ran into a strange problem working with XSLT and XML. I am creating an MVC application that reads an XSLT file containing templates and variables and processes their contents. After processing, I saw that many changes had been made to nodes that should not have been touched.
At one point in the XSLT file, I have a variable whose content is

 <xsl:choose>
      <xsl:when test="@resCurrPage = 1">1</xsl:when>
      <xsl:when test="@resCurrPage > 4">3</xsl:when>
      <xsl:otherwise>2</xsl:otherwise>
 </xsl:choose>

but when processing is finished, the second <xsl:when> has been transformed into
<xsl:when test="@resCurrPage &gt; 4">3</xsl:when>.
I found that the conversion from > to &gt; seems to happen when the

XmlDocument xDoc = new XmlDocument();
xDoc.LoadXml(templateFile);
XmlNodeList nodeList = xDoc["xsl:stylesheet"].ChildNodes;

is called (templateFile is a string containing the XSLT file).
Why is this transformation happening, and how can I avoid it?
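As far as I can tell, LoadXml itself does not rewrite anything: the parsed attribute value still contains a plain >, and the &gt; only appears when the DOM is serialized again (OuterXml, Save). The sketch below checks this. The EscapeDemo class and its trimmed-down stylesheet are hypothetical stand-ins for the real templateFile.

```csharp
using System;
using System.Xml;

// Hypothetical demo: a minimal stylesheet with the same kind of xsl:when test.
public static class EscapeDemo
{
    public const string XslNs = "http://www.w3.org/1999/XSL/Transform";

    public const string Xslt =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"" + XslNs + "\">" +
        "<xsl:template match=\"/\"><xsl:choose>" +
        "<xsl:when test=\"@resCurrPage > 4\">3</xsl:when>" +
        "</xsl:choose></xsl:template>" +
        "</xsl:stylesheet>";

    // Returns the attribute value as the DOM stores it after LoadXml.
    public static string ParsedTestValue()
    {
        var xDoc = new XmlDocument();
        xDoc.LoadXml(Xslt);
        var ns = new XmlNamespaceManager(xDoc.NameTable);
        ns.AddNamespace("xsl", XslNs);
        var whenNode = (XmlElement)xDoc.SelectSingleNode("//xsl:when", ns);
        return whenNode.GetAttribute("test");
    }

    // Returns the re-serialized markup; this is where > becomes &gt;.
    public static string SerializedDocument()
    {
        var xDoc = new XmlDocument();
        xDoc.LoadXml(Xslt);
        return xDoc.OuterXml;
    }

    public static void Main()
    {
        Console.WriteLine(ParsedTestValue());    // @resCurrPage > 4
        Console.WriteLine(SerializedDocument()); // ...test="@resCurrPage &gt; 4"...
    }
}
```

So the value the XSLT processor sees never changes; only its textual representation does when the file is written back out.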

  • Because in XML attributes certain characters must be encoded and some are recommended to be encoded. > is one of those. Commented May 11, 2015 at 8:50
  • Can I somehow specify that they should not be transformed? Commented May 11, 2015 at 8:51
  • Not really. Is there a specific reason why it shouldn't be transformed? Commented May 11, 2015 at 8:54
  • I would like to leave untouched the nodes that don't need to be transformed, because after I save the new XSLT file it would be much easier to compare it with the original. Currently I have modified just 12 nodes, yet the comparison shows 160 differences. Commented May 11, 2015 at 8:57
  • That won't affect a comparison that isn't broken: <el att=">"> and <el att="&gt;"> are exactly the same XML, so if a comparison considers them different, the comparison is buggy. Those bugs are going to cause some other problem anyway, so it needs to be fixed. Commented May 11, 2015 at 9:34

2 Answers


Based on your question title: the XML specification says that the < character must always be escaped unless it marks the beginning of a tag (CDATA sections, comments and processing instructions aside).

Based on your question: the > character doesn't need to be escaped in attribute values or text content, but it may be. The exception is the ]]> sequence in text content, where the > must be escaped.

Unfortunately, there is no way to tell XmlDocument not to escape the value, and it is completely legal for it to do so; it could escape everything if it wanted. This makes comparing XML documents non-trivial, because you have to actually take the structure into account. Since XML is structural, that is needed anyway: a simple textual diff will never work reliably, because indentation, newlines and other whitespace can differ without affecting the XML structure or content, yet a textual diff tool will flag every one of those differences.
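A minimal sketch of such a structural comparison, assuming LINQ to XML is available: both strings are parsed with XDocument.Parse (which, with the default options, does not keep insignificant whitespace) and then compared with XNode.DeepEquals. The XmlCompare class and the sample strings are hypothetical; DeepEquals has its own definition of equality, so check its documentation before relying on it for a real diff.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: compares two XML strings by structure, not by text.
public static class XmlCompare
{
    // Same content as the question's snippet, written without whitespace.
    public const string Original =
        "<xsl:choose xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:when test=\"@resCurrPage = 1\">1</xsl:when>" +
        "<xsl:when test=\"@resCurrPage > 4\">3</xsl:when>" +
        "</xsl:choose>";

    // Same XML as re-saved: &gt;, single quotes and indentation differ.
    public const string Saved =
        "<xsl:choose xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>\n" +
        "  <xsl:when test='@resCurrPage = 1'>1</xsl:when>\n" +
        "  <xsl:when test='@resCurrPage &gt; 4'>3</xsl:when>\n" +
        "</xsl:choose>";

    public static bool SameXml(string a, string b)
    {
        // Parsing resolves &gt; to > and drops whitespace-only text between elements.
        XDocument docA = XDocument.Parse(a);
        XDocument docB = XDocument.Parse(b);
        return XNode.DeepEquals(docA, docB);
    }

    public static void Main()
    {
        Console.WriteLine(SameXml(Original, Saved)); // True
    }
}
```

A textual diff of Original and Saved reports differences on nearly every line, while the structural comparison reports none.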


1 Comment

Sad, but true. The actual comparison is made with Total Commander's comparison tool, not in code. I also noticed that " is transformed into &quot; and ' into ". I hope there will be solutions for these problems in the future too. Thank you

XML has a set of special characters that get escaped: ', ", &, <, and >. This is because when you write a node as

<xsl:when test="@resCurrPage > 4">3</xsl:when>

the <xsl:when test="@resCurrPage > part becomes a node, and 4">3 becomes data that again contains >. So it needs to be transformed into an escape sequence.

The XML escape character list is:

' is replaced with &apos;
" is replaced with &quot;
& is replaced with &amp;
< is replaced with &lt;
> is replaced with &gt;
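For illustration, the five replacements listed above can be written as a small helper. XmlEscaper.EscapeAll is a hypothetical name, not a library API. It escapes all five characters, which is always legal, although strictly only & and < (plus the quote character delimiting an attribute value) have to be escaped.

```csharp
using System;
using System.Text;

// Hypothetical helper applying the five predefined XML entity replacements.
public static class XmlEscaper
{
    public static string EscapeAll(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            switch (c)
            {
                case '\'': sb.Append("&apos;"); break;
                case '"':  sb.Append("&quot;"); break;
                case '&':  sb.Append("&amp;");  break;
                case '<':  sb.Append("&lt;");   break;
                case '>':  sb.Append("&gt;");   break;
                default:   sb.Append(c);        break; // everything else is copied as-is
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(EscapeAll("@resCurrPage > 4")); // @resCurrPage &gt; 4
    }
}
```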

1 Comment

Actually that's incorrect. Inside quoted values > does not end the tag. It can be inside the quotes, but many implementations will still encode it.
