
I'm attempting to select a node using an XPath query and I don't understand why XML::LibXML doesn't find the node when it has an xmlns attribute. Here's a script to demonstrate the issue:

#!/usr/bin/perl

use XML::LibXML; # 1.70 on libxml2 from libxml2-dev 2.6.16-7sarge1 (don't ask)
use XML::XPath;  # 1.13
use strict;
use warnings;

use v5.8.4; # don't ask

my ($xpath, $libxml, $use_namespace) = @ARGV;

my $xml = sprintf(<<'END_XML', ($use_namespace ? 'xmlns="http://www.w3.org/2000/xmlns/"' : q{}));
<?xml version="1.0" encoding="iso-8859-1"?>
<RootElement>
  <MyContainer %s>
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>
</RootElement>
END_XML

my $xml_parser
    = $libxml ? XML::LibXML->load_xml(string => $xml, keep_blanks => 1)
    :           XML::XPath->new(xml => $xml);

my $nodecount = 0;
foreach my $node ($xml_parser->findnodes($xpath)) {
    $nodecount ++;
    print "--NODE $nodecount--\n"; #would use say on newer perl
    print $node->toString($libxml && 1), "\n";
}

unless ($nodecount) {
    print "NO NODES FOUND\n";
}

This script allows you to choose between the XML::LibXML parser and the XML::XPath parser. Depending on the arguments passed, it also either defines an xmlns attribute on the MyContainer element or leaves it off.

The xpath expression I'm using is "RootElement/MyContainer". When I run the query using the XML::LibXML parser without the namespace it finds the node with no problem:

benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml
--NODE 1--
<MyContainer>
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>

However, when I run it with the namespace in place it finds no nodes:

benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' libxml use_namespace
NO NODES FOUND

Contrast this with the output when using the XML::XPath parser:

benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 # no namespace
--NODE 1--
<MyContainer>
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>
benb@enkidu:~$ ROC/ECG/libxml_xpath.pl 'RootElement/MyContainer' 0 1 # with namespace
--NODE 1--
<MyContainer xmlns="http://www.w3.org/2000/xmlns/">
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>

Which of these parser implementations is doing it "right"? Why does XML::LibXML treat it differently when I use a namespace? What can I do to retrieve the node when the namespace is in place?

2
  • Good question, +1. See my answer for explanation and for two possible solutions. Commented Nov 3, 2010 at 3:07
  • @ikegami, SO has to be useful to both advanced and novice users. They shouldn't be discouraged from asking questions. Commented Mar 27, 2014 at 14:32

3 Answers

14

This is a FAQ. XPath considers any unprefixed name in an expression to belong to "no namespace".

Then, the expression:

RootElement/MyContainer

selects all MyContainer elements that belong to "no namespace" and are children of all RootElement elements that belong to "no namespace" and are children of the context (current) node. However, the document contains no MyContainer element that belongs to "no namespace": the xmlns attribute places MyContainer and all of its descendants in the default namespace it declares. Only RootElement, which lies outside the scope of that declaration, remains in "no namespace".

This explains the result you are getting. XML::LibXML is right.

The common solution is that the API of the hosting language allows a specific prefix to be bound to the namespace by "registering" a namespace. Then one can use an expression like:

RootElement/x:MyContainer

where x is the prefix with which the namespace has been registered. Only the steps that name elements in that namespace take the prefix. If the declaration sat on the root element instead, every step would need it (x:RootElement/x:MyContainer).

On the very rare occasions where the hosting language doesn't offer registering namespaces, use the following expression:

*[name()='RootElement']/*[name()='MyContainer']
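
To make the namespace membership concrete, here is a minimal sketch that lists the namespace of every element and then compares the plain expression with the name()-based fallback. The namespace URI http://example.com/ns and the shortened document are assumptions made up for illustration, and the local-name()/namespace-uri() variant is an optional stricter form of the fallback, not something the hosting API requires.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical namespace URI, chosen only for this illustration.
my $xml = <<'END_XML';
<RootElement>
  <MyContainer xmlns="http://example.com/ns">
    <MyField><Name>ID</Name><Value>12345</Value></MyField>
  </MyContainer>
</RootElement>
END_XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Show which namespace each element belongs to.
for my $el ($doc->findnodes('//*')) {
    my $uri = $el->namespaceURI;
    printf "%-12s %s\n", $el->localname, defined $uri ? $uri : '(no namespace)';
}

# Unprefixed names only match elements in "no namespace".
my @plain = $doc->findnodes('RootElement/MyContainer');

# Fallback that compares the name as written in the document.
my @by_name = $doc->findnodes(
    q{*[name()='RootElement']/*[name()='MyContainer']});

# Stricter fallback: match on local name and namespace URI explicitly.
my @by_uri = $doc->findnodes(
    q{*[local-name()='RootElement']}
  . q{/*[local-name()='MyContainer' and namespace-uri()='http://example.com/ns']});

printf "plain: %d, name(): %d, local-name()+namespace-uri(): %d\n",
    scalar @plain, scalar @by_name, scalar @by_uri;
```

Note that name() returns the name as written, prefix included, so the name() form stops matching if the document author switches to a prefixed form such as p:MyContainer. The local-name() form does not have that problem.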

5 Comments

With XML::LibXML, you register namespaces using XML::LibXML::XPathContext. This is documented in findnodes.
@ikegami, One shouldn't have to know how all possible XPath hosts implement registering of namespace prefixes. The correct answer to this general and recurring question (if we want the answer to serve not only users of a particular XPath implementation) should explain what is happening and allow the users to look in their particular documentation for the implementation-defined details.
That may be, but the OP asked about how to do it in XML::LibXML, so why are you taking offense at me telling him the little bit you missed from your answer?
@ikegami, No offense taken. Just explaining to you the rationale for providing an important, generally useful answer vs. a limited and more precise answer directed only to the original asker. In case your original comment was targeting the asker, you could comment on his question -- not this particular answer, where he may not see it.
I never suggested you should replace your answer with anything limited. It's directed to the OP, but it's a missing bit of your answer, which is why it's posted as a reply to your answer. (Already mentioned this...)
7

@Dmitre is right. You need to take a look at XML::LibXML::XPathContext, which allows you to declare the namespace so that you can use namespace-aware XPath statements. I gave an example of using this some time ago on Stack Overflow; have a look at "Why should I use XPathContext with Perl's XML::LibXML".
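
For the document in the question, a rough sketch of that approach could look like the following. The namespace URI http://example.com/ns and the prefix x are placeholders picked for illustration; substitute whatever URI your document really declares. The prefix only has to agree between registerNs and the XPath expression, not with anything in the XML.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical namespace URI standing in for the one the real document uses.
my $ns = 'http://example.com/ns';

my $xml = <<"END_XML";
<?xml version="1.0" encoding="iso-8859-1"?>
<RootElement>
  <MyContainer xmlns="$ns">
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>
</RootElement>
END_XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Bind the prefix "x" to the namespace for XPath evaluation only.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => $ns);

# RootElement is outside the declaration's scope, so it takes no prefix.
my @containers = $xpc->findnodes('RootElement/x:MyContainer');
print "containers found: ", scalar(@containers), "\n";

# Relative expressions are evaluated against the node passed as context.
my %field;
for my $f ($xpc->findnodes('RootElement/x:MyContainer/x:MyField')) {
    my $name  = $xpc->findvalue('x:Name',  $f);
    my $value = $xpc->findvalue('x:Value', $f);
    $field{$name} = $value;
    print "$name = $value\n";
}
```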

1 Comment

Thanks for the pointer to the XPathContext question. I suspected it could help me and attempted to use it, without success, since I didn't know what I was doing. I'll see if the examples there will help.
1

Using XML::LibXML 1.69.

Maybe this is an XML::LibXML 1.69 thing, but the strange part is that I can use the normal XPath and findnodes(), and the code below prints the nodes.

use strict;
use XML::LibXML;

my $xml = <<END_XML;
<?xml version="1.0" encoding="iso-8859-1"?>
<RootElement>
   <MyContainer xmlns="http://www.w3.org/2000/xmlns/">
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>
</RootElement>
END_XML

my $parser = XML::LibXML->new();

$parser->recover_silently(1);

my $doc = $parser->parse_string($xml);

my $root = $doc->documentElement();

foreach my $node ($root->findnodes('MyContainer/MyField')) {
     print $node->toString();
}

But if I change the namespace to something other than "http://www.w3.org/2000/xmlns/", then using XML::LibXML::XPathContext is required to get the same nodes to print.

use strict;
use XML::LibXML;

my $xml = <<END_XML;
<?xml version="1.0" encoding="iso-8859-1"?>
<RootElement>
  <MyContainer xmlns="http://something.org/2000/something/">
    <MyField>
        <Name>ID</Name>
        <Value>12345</Value>
    </MyField>
    <MyField>
        <Name>Name</Name>
        <Value>Ben</Value>
    </MyField>
  </MyContainer>
</RootElement>
END_XML

my $parser = XML::LibXML->new();

$parser->recover_silently(1);

my $doc = $parser->parse_string($xml);

my $root = $doc->documentElement();

my $xpc = XML::LibXML::XPathContext->new($root);

$xpc->registerNs("x", "http://something.org/2000/something/");

foreach my $node ($xpc->findnodes('x:MyContainer/x:MyField')) {
    print $node->toString();
}

1 Comment

Remove the line $parser->recover_silently(1); in the first example and you'll get the error message namespace error : reuse of the xmlns namespace name is forbidden. If you use the recover option, the namespace declaration will simply be ignored. If you use recover_silently not even an error message will be printed. That's why it's usually a bad idea.
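
A small sketch of how to surface that error instead of hiding it, assuming a default parser with no recover option set. Whether the parse is rejected depends on the libxml2 version underneath (very old builds may accept the declaration), so the code handles both outcomes rather than assuming one. The variable $outcome is just an illustrative helper.

```perl
use strict;
use warnings;
use XML::LibXML;

# Same reserved namespace name as in the question.
my $xml = '<RootElement>'
        . '<MyContainer xmlns="http://www.w3.org/2000/xmlns/"/>'
        . '</RootElement>';

my $parser = XML::LibXML->new();   # no recover / recover_silently

# parse_string dies on errors, so trap the exception to inspect it.
my $doc = eval { $parser->parse_string($xml) };

my $outcome;
if (defined $doc) {
    $outcome = 'accepted';
    print "parsed without complaint (older libxml2?)\n";
}
else {
    $outcome = 'rejected';
    print "parser rejected the document:\n$@\n";
}
```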
