
Say I have this XML and I need to remove empty elements (elements that don't contain data at all) such as:

...
<data>
    <!-- keep oneDay -->
    <oneDay>
        <startDate>1450288800000</startDate>
        <endDate>1449086400000</endDate>
    </oneDay>
    <!-- remove range entirely -->
    <range>
        <startDate/>
        <endDate/>
    </range>
    <!-- remove deadline entirely -->
    <deadline>
        <date/>
    </deadline>
</data>
...

The output should then be:

...
<oneDay>
    <startDate>1450288800000</startDate>
    <endDate>1449086400000</endDate>
</oneDay>
...

I'm looking for a dynamic solution that works for any case like this, regardless of the element names.

SOLUTION (UPDATED)

It turns out that //*[not(normalize-space())] selects every element without non-empty text content, including parents whose descendants are all empty, so no recursion is needed.

foreach($xpath->query('//*[not(normalize-space())]') as $node ) {
    $node->parentNode->removeChild($node);
} 

Check out @har07's solution for more details.
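
For completeness, here is a minimal end-to-end sketch of the one-pass approach. The function name removeEmptyElements() and the sample document are assumptions made for illustration; the XPath expression and the DOM calls are the ones used above.

```php
<?php
/**
 * Remove every element whose text content (its own and its descendants')
 * is empty or whitespace-only.
 * removeEmptyElements() is a hypothetical helper name, not a built-in.
 */
function removeEmptyElements(string $xml): string
{
    $doc = new DOMDocument;
    $doc->preserveWhiteSpace = false; // drop indentation-only text nodes
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);

    // normalize-space() without an argument works on the element's string
    // value, which is the concatenation of all descendant text nodes.
    foreach ($xpath->query('//*[not(normalize-space())]') as $node) {
        $node->parentNode->removeChild($node);
    }

    $doc->formatOutput = true;
    return $doc->saveXML();
}

// Sample input modelled on the XML from the question.
$xml = <<<XML
<data>
    <oneDay>
        <startDate>1450288800000</startDate>
        <endDate>1449086400000</endDate>
    </oneDay>
    <range>
        <startDate/>
        <endDate/>
    </range>
    <deadline>
        <date/>
    </deadline>
</data>
XML;

echo removeEmptyElements($xml);
```

One caveat: an element that carries only attributes also has an empty string value, so this expression removes it as well. If attributes count as data in your documents, a stricter predicate such as //*[not(normalize-space()) and not(.//@*)] may be closer to what you want.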

SOLUTION

The XPath approach provided by @manuelbc works, but only on leaf elements: the empty children are removed, while their parent nodes stay behind, now empty themselves.

However, the following loop repeats the query until the XML document contains no more empty nodes.

$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML('<XML STRING GOES HERE>');

$xpath = new DOMXPath($doc);

while (($notNodes = $xpath->query('//*[not(node())]')) && ($notNodes->length)) {
  foreach($notNodes as $node) {
    $node->parentNode->removeChild($node);
  }
}

$doc->formatOutput = true;
echo $doc->saveXML();
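
To see why the loop matters, the sketch below runs the same //*[not(node())] query once and then repeatedly on a compact sample document. The helper name removeChildlessElements() and its $repeat flag are assumptions introduced for this comparison only.

```php
<?php
/**
 * Remove elements that have no child nodes at all.
 * With $repeat = true the query is re-run until nothing matches, so parents
 * emptied by an earlier pass are removed as well.
 * Hypothetical helper; returns the number of passes that removed something.
 */
function removeChildlessElements(DOMDocument $doc, bool $repeat): int
{
    $xpath = new DOMXPath($doc);
    $passes = 0;

    do {
        $nodes = $xpath->query('//*[not(node())]');
        if ($nodes->length === 0) {
            break; // nothing left to remove
        }
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
        $passes++;
    } while ($repeat);

    return $passes;
}

// Compact sample (no indentation, so no whitespace text nodes are involved).
$xml = '<data><oneDay><startDate>1450288800000</startDate></oneDay>'
     . '<range><startDate/><endDate/></range>'
     . '<deadline><date/></deadline></data>';

// Single pass: only the leaves go away.
$once = new DOMDocument;
$once->loadXML($xml);
removeChildlessElements($once, false);
echo $once->saveXML();

// Repeated passes: the emptied parents go away too.
$loop = new DOMDocument;
$loop->loadXML($xml);
removeChildlessElements($loop, true);
echo $loop->saveXML();
```

The first document should still contain <range/> and <deadline/>, while the second should contain only <oneDay> inside <data>.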

2 Answers


The XPath in the other answer only matches elements that are empty in the sense of having no child node of any kind (no element node, no text node, nothing). To get all elements that are empty by your definition, that is, elements without non-empty text content, use the following XPath instead:

//*[not(normalize-space())]

eval.in demo

Output:

<?xml version="1.0"?>
<data>
  <!-- keep oneDay -->
  <oneDay>
    <startDate>1450288800000</startDate>
    <endDate>1449086400000</endDate>
  </oneDay>
  <!-- remove range entirely -->
  <!-- remove deadline entirely -->
</data>
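
As a rough illustration of the difference between the two expressions, the sketch below lists which elements each one matches in a compact sample document. The helper matchedNames() and the sample XML are assumptions made for this comparison.

```php
<?php
// Hypothetical helper: collect the names of all elements matched by $expr.
function matchedNames(DOMXPath $xpath, string $expr): array
{
    $names = [];
    foreach ($xpath->query($expr) as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

$doc = new DOMDocument;
$doc->loadXML(
    '<data><oneDay><startDate>1450288800000</startDate></oneDay>'
    . '<range><startDate/><endDate/></range>'
    . '<deadline><date/></deadline></data>'
);
$xpath = new DOMXPath($doc);

// Elements with no child node of any kind: only the leaves.
$childless = matchedNames($xpath, '//*[not(node())]');

// Elements with no non-whitespace text anywhere below them:
// the leaves and the parents that contain nothing but such leaves.
$noText = matchedNames($xpath, '//*[not(normalize-space())]');

echo count($childless), ': ', implode(', ', $childless), "\n";
echo count($noText), ': ', implode(', ', $noText), "\n";
```

The first expression should match only the three leaf elements, whereas the second should also match range and deadline, which is why a single pass is enough with normalize-space().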

1 Comment

Really nice and simple take on this one and it works perfectly in my case.

You can do it with XPath:

<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML('<data>
    <!-- keep oneDay -->
    <oneDay>
        <startDate>1450288800000</startDate>
        <endDate>1449086400000</endDate>
    </oneDay>
    <!-- remove range entirely -->
    <range>
        <startDate/>
        <endDate/>
    </range>
    <!-- remove deadline entirely -->
    <deadline>
        <date/>
    </deadline>
</data>');

$xpath = new DOMXPath($doc);

foreach( $xpath->query('//*[not(node())]') as $node ) {
    $node->parentNode->removeChild($node);
}

$doc->formatOutput = true;
echo $doc->saveXML();

See original solution here: Remove empty tags from a XML with PHP

3 Comments

Thanks! This won't actually get rid of all the empty elements. Instead, it will only remove the empty children, meaning <range/> and <deadline/> will stay but their children will be gone.
However, I used your suggestion to write a recursive function to do this and it works! I will share it in the post shortly. Feel free to make any changes.
XPath and working with an XML hierarchy is a pain. Another good approach would be to convert your XML into JSON, get rid of whatever you want, and convert it back to XML. That's what I do.
