I need to clean up a large XML file after localization. Segments that did not need translation were replaced with a placeholder, and in the output the placeholder was replaced with nothing. The surrounding tags remained, however, because writing regexes for every tag that might surround the now-missing content turned out to be too complex and messy.

When a transformation scenario is applied, the output contains many blank tables, lines, etc. that are remnants of the deleted content. I need all of those empty elements, and their empty children, to be removed regardless of whether they have attributes. I found the following solution, but it is described as working only for elements without attributes, and it does not handle elements whose children are also empty. What adjustments are needed for it to remove all empty elements, even those with attributes that contain values? Any ideas would be appreciated.

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="*[descendant::text() or descendant-or-self::*/@*[string()]]">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="@*[string()]">
    <xsl:copy/>
</xsl:template>

</xsl:stylesheet>

1 Answer

Do you need whitespace to be stripped from the string values?

Here is an example with normalize-space: https://xsltfiddle.liberty-development.net/nbUY4kx/4

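<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- Copy a node only if its string value contains non-whitespace characters.
         An element's string value is the concatenated text of all its descendants,
         so attributes alone do not keep an otherwise empty element alive. -->
    <xsl:template match="@*|node()">
        <xsl:if test="normalize-space(.) != ''">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>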

And another example without it: https://xsltfiddle.liberty-development.net/nbUY4kx/3
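
For comparison, the same idea is often written as an identity template plus empty templates for the nodes to drop. The following is only a sketch of that pattern, not the stylesheet behind either fiddle link. It differs from the version above in that it keeps attributes with empty values and empty comments, and it includes `indent="yes"` on the assumption that pretty-printed output is wanted.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <!-- Identity template: copy every node that no other template claims. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Drop any element with no non-whitespace text anywhere beneath it,
         whether or not it (or its descendants) carry attributes. -->
    <xsl:template match="*[not(normalize-space())]"/>

    <!-- Drop whitespace-only text nodes so the serializer is free to re-indent. -->
    <xsl:template match="text()[not(normalize-space())]"/>
</xsl:stylesheet>
```

The two empty templates have a predicate, so they take priority over the identity template without an explicit `priority` attribute. Note that whitespace-only text between inline elements in mixed content is also removed, which may or may not be acceptable for your documents.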

6 Comments

Thank you, Pavel! I see that in the example without normalization, the <issuetype> element persists. Is that because of the whitespace characters left after its child elements were deleted? Is there a way to keep indentation while making sure that all empty elements, including those containing only whitespace, are deleted?
Add <xsl:output indent="yes"/> to the stylesheet in order to pretty-print the output. xsltfiddle.liberty-development.net/nbUY4kx/5
Ilia, yes, this happens because of the whitespace. To keep indentation, you need to add <xsl:output indent="yes"/>, as Mads Hansen suggested.
@chertkov-pavel, this has partially worked, thank you for confirming. However, after this clean-up, Apache FOP (FO-to-PDF transformation) reports multiple errors when processing table elements. I tried fixing them manually, but it takes too long. Is there a way to exclude all table elements (opening and closing tags for <table>, <tbody>, <td>, and <tr>) from this cleanup XSLT? I will later remove them manually with regexes, one by one, based on which ones show up empty in the output PDF. Thanks again!
@Ilia, try it like this: xsltfiddle.liberty-development.net/nbUY4kx/6. This transformation will replace every table tag except the root tags. The indentation is gone :( and I do not know how to restore it. :p
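
One possible way to leave table markup out of the cleanup is sketched below. It is not the stylesheet behind the fiddle link above. It assumes the elements are literally named `table`, `tbody`, `tr`, and `td` and are in no namespace. The extra `not(descendant::table)` guard is also an assumption: it keeps otherwise empty wrappers that contain a table, so the table is not removed along with its parent.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <!-- Identity template: copy every node that no other template claims. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Remove empty elements, except table markup itself and
         any element that wraps a table. -->
    <xsl:template match="*[not(normalize-space())]
                          [not(self::table or self::tbody or self::tr or self::td)]
                          [not(descendant::table)]"/>

    <!-- Drop whitespace-only text nodes so the serializer is free to re-indent. -->
    <xsl:template match="text()[not(normalize-space())]"/>
</xsl:stylesheet>
```

Empty elements nested inside a cell are still removed, so an empty cell ends up as `<td/>`. Whether `indent="yes"` brings the indentation back depends on the XSLT processor's serializer.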