PHP DomXPath not selecting empty text nodes

Question

I'm trying to select nodes that don't contain any text. This bit of PHP code skips the empty node in the sample XML below. However, when I try the same expression in an online tester (like http://freeformatter.com/xpath-tester.html), it works without any problem.

Is this a PHP thing?

My PHP code:

    $path = "//RecipeSteps/RecipeStep[not(text())]";
    $stepsQuery = $this->xpath->query($path);
    $numResults = $stepsQuery->length;

My sample XML:

<?xml version="1.0" encoding="utf-8"?>
<Recipes>
    <RecipeSteps>
      <RecipeStep number="1">Dummy content</RecipeStep>
      <RecipeStep number="2">Dummy content</RecipeStep>
      <RecipeStep number="3">Dummy content</RecipeStep>
      <RecipeStep number="4">Dummy content</RecipeStep>
      <RecipeStep number="5">Dummy content</RecipeStep>
      <RecipeStep number="6"></RecipeStep>
      <RecipeStep number="7">Variations</RecipeStep>
      <RecipeStep number="8">Some variation content..</RecipeStep>
    </RecipeSteps>
</Recipes>

I do not know why that is not working, but this should work: $emptyRecipeSteps=call_user_func(function() use (&$DOMDocument){$ret=array();foreach($DOMDocument->getElementsByTagName("RecipeStep") as $recipeStep){if(empty($recipeStep->textContent)){$ret[]=$recipeStep;}}return $ret;});
– hanshenrik, Commented Mar 24, 2015 at 19:31
PHP and the online tester you cite give exactly the same result: 3v4l.org/bKuOd. So what is the actual problem?
– hakre, Commented Mar 25, 2015 at 17:50

MadsBjaerge · Accepted Answer · 2015-03-24 19:33:39Z

1

If you are looking for an XPath solution, use //RecipeSteps/RecipeStep[string-length() = 0], e.g.

$path = "//RecipeSteps/RecipeStep[string-length() = 0]";
$stepsQuery = $this->xpath->query($path);
$numResults = $stepsQuery->length;
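
Note that DOMXPath implements XPath 1.0, where a parenthesized step such as /(RecipeStep[...]) is not valid syntax; that form is only accepted by XPath 2.0 engines. A minimal sketch of the difference, assuming a standalone DOMDocument (the shortened $xml string and the variable names are illustrative, not taken from the question's class):

```php
<?php
// Illustrative input: one empty step next to a non-empty one.
$xml = '<Recipes><RecipeSteps>'
     . '<RecipeStep number="5">Dummy content</RecipeStep>'
     . '<RecipeStep number="6"></RecipeStep>'
     . '</RecipeSteps></Recipes>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// XPath 1.0 form: the predicate tests the element's string value.
$steps = $xpath->query('//RecipeSteps/RecipeStep[string-length() = 0]');
$numbers = [];
foreach ($steps as $step) {
    $numbers[] = $step->getAttribute('number');
}

// XPath 2.0 style parenthesized step: query() fails to compile it and
// returns false (the @ only suppresses the "Invalid expression" warning).
$parenthesized = @$xpath->query('//RecipeSteps/(RecipeStep[string-length() = 0])');

echo implode(',', $numbers), "\n";
var_dump($parenthesized);
```

Checking the return value of query() against false before reading ->length is a cheap way to catch this kind of syntax problem.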

4 Comments

MadsBjaerge Over a year ago

It's not a function in that sense; its purpose is to select all text nodes of the context node (see w3.org/TR/xpath/#location-paths). The //not(contains[text()]) will return true for all nodes that are not text nodes, but will not return the nodes. A good resource is videlibri.sourceforge.net/cgi-bin/xidelcgi, where you can test XPath queries against XML/HTML.

Mathias Müller Over a year ago

@doub1ejack Not really, both return the same in this case. See my answer. And do not test on videlibri.sourceforge.net/cgi-bin/xidelcgi - it is an XPath 2.0 engine, whereas DomXPath only supports XPath 1.0.

MadsBjaerge Over a year ago

@MathiasMüller seems to be right. The previous test worked. And he's also right about the XPath 2.0 point; I answered from an XPath perspective. I am, however, not sure why you say that my answer is incorrect. I believe the question was that there should be no string, and the test I present tests just that?

Mathias Müller Over a year ago

@MadsBjaerge Please read carefully; I did not say your answer is incorrect (however, your comment is incorrect: //not(contains[text()]) is not a valid expression). Your answer is correct. I am saying that it does not solve the problem, because the path expression was not the problem to begin with (unless the OP's actual input document is different from the one shown).

felipsmartins · Answer · 2015-03-24 20:02:11Z

Selecting with the full path works:

$xmlString = '<?xml version="1.0" encoding="utf-8"?>
<Recipes>
    <RecipeSteps>
      <RecipeStep number="1">Dummy content</RecipeStep>
      <RecipeStep number="2">Dummy content</RecipeStep>
      <RecipeStep number="3">Dummy content</RecipeStep>
      <RecipeStep number="4">Dummy content</RecipeStep>
      <RecipeStep number="5">Dummy content</RecipeStep>
      <RecipeStep number="6"></RecipeStep>
      <RecipeStep number="7">Variations</RecipeStep>
      <RecipeStep number="8">Some variation content..</RecipeStep>
    </RecipeSteps>
</Recipes>';

$dom = new DOMDocument();
$dom->loadXML($xmlString);
$xpath = new DOMXPath($dom);
// This shorter path works as well: //RecipeSteps/RecipeStep[not(text())]
$query = $xpath->query('//Recipes/RecipeSteps/RecipeStep[not(text())]');
// Prints "RecipeStep number: 6"
print 'RecipeStep number: ' . $query->item(0)->getAttribute('number');

Selecting "//RecipeSteps/RecipeStep[not(text())]" works just as well, so the problem is most likely elsewhere in your code or in the actual input document.

Mathias Müller · Answer · 2015-03-24 21:03:34Z

0

The path expressions //RecipeStep[not(text())] and //RecipeStep[string-length() = 0] do not mean the same thing, but for the input document you have shown they return exactly the same result. In both cases, a single RecipeStep node is selected:

<RecipeStep number="6"/>

//RecipeStep[not(text())] means, in plain English:

Select element nodes called RecipeStep anywhere in the document, but only if they do not have any immediate child text nodes.

On the other hand, //RecipeStep[string-length() = 0] means

Select element nodes called RecipeStep anywhere in the document, but only if the length of their string value (the concatenation of all descendant text nodes) is equal to 0.

The difference would only be apparent if recipe step number 6 actually looked like

<RecipeStep number="6"><child>text</child></RecipeStep>

Then, //RecipeStep[not(text())] would still select this node, whereas //RecipeStep[string-length() = 0] would not return anything.
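
The difference can be checked with a short sketch. The two-step document below is a cut-down variation of the sample in which step 6 wraps its text in a <child> element, and the helper numbersFor() is a hypothetical name used only for illustration:

```php
<?php
$xml = '<Recipes><RecipeSteps>'
     . '<RecipeStep number="5">Dummy content</RecipeStep>'
     . '<RecipeStep number="6"><child>text</child></RecipeStep>'
     . '</RecipeSteps></Recipes>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Hypothetical helper: collect the "number" attribute of every match.
function numbersFor(DOMXPath $xpath, string $expression): array
{
    $numbers = [];
    foreach ($xpath->query($expression) as $node) {
        $numbers[] = $node->getAttribute('number');
    }
    return $numbers;
}

// Step 6 has no text node as a direct child, so not(text()) matches it.
$noTextChild = numbersFor($xpath, '//RecipeStep[not(text())]');

// Its string value is "text" (length 4), so string-length() = 0 does not.
$emptyString = numbersFor($xpath, '//RecipeStep[string-length() = 0]');

echo 'not(text()): ', implode(',', $noTextChild), "\n";         // not(text()): 6
echo 'string-length() = 0: ', implode(',', $emptyString), "\n"; // no numbers listed
```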

(And just to make it clear: the leading //RecipeSteps that I have omitted does not change anything.)

So your original XPath expression is correct, and the accepted answer does exactly the same as your original one. XPath is not at fault here.
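
If the expression still selects nothing on the real data, one thing worth ruling out is whitespace. The sketch below assumes (unlike the sample shown) that step 6 contains only a line break and spaces; that whitespace is a text node, so neither expression matches, while a normalize-space() test treats the element as empty:

```php
<?php
// Assumed variation of the sample: step 6 contains only whitespace.
$xml = '<Recipes><RecipeSteps>'
     . '<RecipeStep number="5">Dummy content</RecipeStep>'
     . "<RecipeStep number=\"6\">\n  </RecipeStep>"
     . '</RecipeSteps></Recipes>';

$dom = new DOMDocument();
$dom->loadXML($xml); // whitespace-only text nodes are preserved by default
$xpath = new DOMXPath($dom);

// Count the matches for each candidate expression.
$counts = [];
foreach ([
    '//RecipeStep[not(text())]',
    '//RecipeStep[string-length() = 0]',
    '//RecipeStep[not(normalize-space())]',
] as $expression) {
    $counts[$expression] = $xpath->query($expression)->length;
}

print_r($counts);
```

Only the normalize-space() variant reports a match here. Whether whitespace-only steps should count as empty depends on the requirements.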

edited Mar 24, 2015 at 21:03

answered Mar 24, 2015 at 20:45

4 Comments

MadsBjaerge Over a year ago

I believe that in the case of <RecipeStep number="6"> <child>text</child> </RecipeStep> it would be incorrect according to his post to select that?

Mathias Müller Over a year ago

@MadsBjaerge I'm not sure that's what you mean, but there was indeed an error in my post: any space between <RecipeStep> and <child> would result in a whitespace-only text node that is a child of RecipeStep.

MadsBjaerge Over a year ago

And do you agree that, in the <child> situation you refer to, my expression would hold true while yours doesn't?

Mathias Müller Over a year ago

@MadsBjaerge There are no "true" path expressions - it always depends on what you would like to get as the result. Or, put another way: we do not know if selecting <RecipeStep number="6"><child>text</child></RecipeStep> as the result is the intended behaviour or not - the OP simply was not that clear with explaining their requirements.
