4

I'm stuck on the following problem. I have an XML document like this:


<table border="1" cols="200 100pt 200">
  <tr>
    <td>isbn</td>
    <td>title</td>
    <td>price</td>
  </tr>
  <tr>
    <td />
    <td />
    <td>
      <span type="champsimple" id="9b297fb5-d12b-46b1-8899-487a2df0104e" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">
        [prénom]
      </span>
      <span type="champsimple" id="e103a6a5-d1be-4c34-8a54-d234179fb4ea" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">[nom]</span>
      <span></span>
    </td>
  </tr>
  <tr></tr>
  <tr>
    <td></td>
    <td>Phill It in</td>
  </tr>
  <tr>
    <table id="cas1">
      <tr>
        <td ></td>
        <td >foo</td>
      </tr>
      <tr>
        <td >bar</td>
        <td >boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas2">
      <tr>
        <td ></td>
        <td >foo</td>
      </tr>
      <tr>
        <td ></td>
        <td >boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas3">
      <tr>
        <td >bar</td>
        <td ></td>
      </tr>
      <tr>
        <td >foo</td>
        <td >boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas4">
      <tr>
        <td />
        <td />
      </tr>
      <tr>
        <td>foo</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <table id="cas4">
    <tr>
      <td />
      <td />
    </tr>
    <tr>
      <td>foo</td>
      <td>boo</td>
    </tr>
  </table>
  <tr>
    <td />
    <td />
  </tr>
</table>


The question is: how can I recursively delete all empty td, tr and table elements?

Currently I use this XSLT:


<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*" />

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="td[not(node())]" />
  <xsl:template match="tr[not(node())]" />
  <xsl:template match="table[not(node())]" />

</xsl:stylesheet>


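But it doesn't cascade. Once the empty td elements are deleted, their parent tr becomes empty, yet it is not removed, because the templates are matched against the source tree, where that tr still has children. See the table elements with id "cas4" in the output: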


<table border="1" cols="200 100pt 200">
  <tr>
    <td>isbn</td>
    <td>title</td>
    <td>price</td>
  </tr>
  <tr>
    <td>
      <span type="champsimple" id="9b297fb5-d12b-46b1-8899-487a2df0104e" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">
        [prénom]
      </span>
      <span type="champsimple" id="e103a6a5-d1be-4c34-8a54-d234179fb4ea" categorieid="a1c70692-0427-425b-983c-1a08b6585364" champcoderef="01f12b93-b4c5-401b-9da1-c9385d77e43f">[nom]</span>
      <span />
    </td>
  </tr>
  <tr>
    <td>Phill It in</td>
  </tr>
  <tr>
    <table id="cas1">
      <tr>
        <td>foo</td>
      </tr>
      <tr>
        <td>bar</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas2">
      <tr>
        <td>foo</td>
      </tr>
      <tr>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas3">
      <tr>
        <td>bar</td>
      </tr>
      <tr>
        <td>foo</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <tr>
    <table id="cas4">
      <tr />
      <tr>
        <td>foo</td>
        <td>boo</td>
      </tr>
    </table>
  </tr>
  <table id="cas4">
    <tr />
    <tr>
      <td>foo</td>
      <td>boo</td>
    </tr>
  </table>
  <tr />
</table>


How would you solve this problem?

1
  • Good question! Anyway, deleting all empty <td> nodes might be over the top - they are necessary for keeping a sensible table structure. Commented Apr 15, 2010 at 15:28

3 Answers

4

It sounds like your definition of empty is "contains no text or only whitespace". Is this the case? If so, the following transformation should do the trick:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
  <xsl:output omit-xml-declaration="yes" indent="yes"/> 
  <xsl:strip-space elements="*" /> 

  <xsl:template match="node()|@*"> 
    <xsl:copy> 
      <xsl:apply-templates select="node()|@*"/> 
    </xsl:copy> 
  </xsl:template> 

  <xsl:template match="td[not(normalize-space(.))]" /> 
  <xsl:template match="tr[not(normalize-space(.))]" /> 
  <xsl:template match="table[not(normalize-space(.))]" /> 
</xsl:stylesheet> 

1 Comment

+1 that's clean and pragmatic. Note that normalize-space(.) is equivalent to normalize-space()
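
A note on why this handles the cascading case: normalize-space() works on the element's string-value, which is the concatenation of all its descendant text nodes. The patterns are evaluated against the source tree, so a tr whose td children contain no text is already matched as empty, regardless of what happens to those td elements. As a small illustration (input taken from the question's second cas4 table; the exact indentation of the result depends on the processor):

```xml
<!-- input -->
<table id="cas4">
  <tr>
    <td />
    <td />
  </tr>
  <tr>
    <td>foo</td>
    <td>boo</td>
  </tr>
</table>

<!-- result with the stylesheet above: the first tr is dropped as a whole -->
<table id="cas4">
  <tr>
    <td>foo</td>
    <td>boo</td>
  </tr>
</table>
```

Keep in mind that under this definition a td holding only elements without text (an image, say) also counts as empty and is removed.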
1

Here is your solution:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
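    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
    <!-- drop whitespace-only text nodes so they do not count as remaining content -->
    <xsl:strip-space elements="*"/>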

    <xsl:template match="node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="@* | text()">
        <xsl:copy />
    </xsl:template>

    <xsl:template match="table | tr | td">

        <!-- result of the transformation of descendants -->
        <xsl:variable name="content">
            <xsl:apply-templates select="node()" />
        </xsl:variable>

        <!-- if there are any children left then copy myself -->
        <xsl:if test="count($content/node()) > 0">
            <xsl:copy>
                <xsl:apply-templates select="@*" />
                <xsl:copy-of select="$content" />
            </xsl:copy>
        </xsl:if>

    </xsl:template>

</xsl:stylesheet>

The idea is simple: each table, tr or td first transforms its descendants and then checks whether anything is left. If so, it copies itself together with the result of that transformation.

If you want to preserve the table structure and remove only empty rows, that is, <tr> elements that contain only empty <td> elements, then just create a similar template for <tr> with a different condition and leave the <td> elements alone.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
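    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
    <!-- drop whitespace-only text nodes so they do not count as remaining content -->
    <xsl:strip-space elements="*"/>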

    <xsl:template match="node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="@* | text()">
        <xsl:copy />
    </xsl:template>

    <xsl:template match="table">

        <!-- result of the transformation of descendants -->
        <xsl:variable name="content">
            <xsl:apply-templates select="node()" />
        </xsl:variable>

        <!-- if there are any children left then copy myself -->
        <xsl:if test="count($content/node()) > 0">
            <xsl:copy>
                <xsl:apply-templates select="@*" />
                <xsl:copy-of select="$content" />
            </xsl:copy>
        </xsl:if>

    </xsl:template>

    <xsl:template match="tr">

        <!-- result of the transformation of descendants -->
        <xsl:variable name="content">
            <xsl:apply-templates select="node()" />
        </xsl:variable>

        <!-- number of non-empty td elements -->
        <xsl:variable name="cellCount">
            <xsl:value-of select="count($content/td[node()])" />
        </xsl:variable>

        <!-- number of other elements -->
        <xsl:variable name="elementCount">
            <xsl:value-of select="count($content/node()[name() != 'td'])" />
        </xsl:variable>

        <xsl:if test="$cellCount > 0 or $elementCount > 0">
            <xsl:copy>                  
                <xsl:apply-templates select="@*" />
                <xsl:copy-of select="$content" />
            </xsl:copy>
        </xsl:if>

    </xsl:template>

</xsl:stylesheet>

Actually, the last xsl:if should look like this:

<xsl:choose>
    <!-- if there are cells then copy the content -->
    <xsl:when test="$cellCount > 0">
        <xsl:copy>
            <xsl:apply-templates select="@*" />
            <xsl:copy-of select="$content" />
        </xsl:copy>
    </xsl:when>

    <!-- if there are only other elements copy them -->
    <xsl:when test="$elementCount > 0">
        <xsl:copy>
            <xsl:apply-templates select="@*" />
            <xsl:copy-of select="$content/node()[name() != 'td']" />
        </xsl:copy>
    </xsl:when>
</xsl:choose>

This covers the case where a <tr> contains empty <td> elements alongside other elements. Then you want to delete the <td>s and keep only the rest.

2 Comments

Thanks, good enough. The only thing to add is to use msxsl:node-set($content) when you check the number of nodes.
I tested it in XMLSpy and there count($content/node()) works just fine.
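
To expand on the two comments above: in XSLT 1.0 the variable $content is a result tree fragment, and the specification does not allow a path step such as $content/node() on it. Some processors accept it anyway, others report an error. A more portable sketch, assuming your processor supports the EXSLT exsl:node-set() extension function (with MSXML, the msxsl:node-set() function in the urn:schemas-microsoft-com:xslt namespace plays the same role), could look like this. The xsl:strip-space line is an addition so that whitespace-only text nodes are not counted as content:

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:output method="xml" encoding="utf-8" indent="yes"/>
  <!-- assumption: whitespace-only text should not keep an element alive -->
  <xsl:strip-space elements="*"/>

  <!-- identity template: copy everything that has no more specific rule -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="table | tr | td">
    <!-- transform the children first -->
    <xsl:variable name="content">
      <xsl:apply-templates select="node()"/>
    </xsl:variable>

    <!-- convert the result tree fragment to a node-set before testing it -->
    <xsl:if test="exsl:node-set($content)/node()">
      <xsl:copy>
        <xsl:apply-templates select="@*"/>
        <xsl:copy-of select="$content"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Because the check runs on the already transformed children, removal cascades upwards to any depth: a table disappears once all of its rows have disappeared.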
1

You could also filter out any <table> that only contains <tr> elements with empty <td> elements, and any <tr> with only empty <td> elements (in addition to your other filters), using something like this (not tested):

<xsl:template match="tr[not(td/node())]" /> 
<xsl:template match="table[not(tr/td/node())]" /> 
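
One caveat with these patterns as written: tr[not(td/node())] also matches a tr that has no td children at all, such as the rows in the question that directly wrap the nested cas1 to cas4 tables, so those tables would be dropped together with the row. A guarded variant, as a sketch only (the extra predicates are my assumption about the intended behaviour, not part of the answer above), could be:

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity template -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- cells with no child nodes at all -->
  <xsl:template match="td[not(node())]"/>

  <!-- rows whose cells are all empty and that contain nothing but td children -->
  <xsl:template match="tr[not(td/node()) and not(node()[not(self::td)])]"/>

  <!-- tables that contain nothing but such rows -->
  <xsl:template match="table[not(tr/td/node())
                             and not(node()[not(self::tr)])
                             and not(tr/node()[not(self::td)])]"/>
</xsl:stylesheet>
```

Unlike the variable-based approach, this only looks a fixed number of levels down in the source tree: a tr that wraps a table which is itself removed will still be left behind as an empty tr.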
