4

Does anybody know a Perl library that can parse XML documents and lets me select nodes via CSS selectors, with namespace support?

Background: I was trying to parse a document that has a default namespace with the Perl libxml package, but my queries never returned anything until I removed the default namespace from the root node.

This is what I found on the topic: https://mail.gnome.org/archives/xml/2003-April/msg00143.html

So a simple example would be a file like this:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns="http://example.com/ns">
  <message>Hi</message>
</root>

The XPath //message wouldn't give me any results with Perl libxml. I know that the library is doing its job perfectly fine, but I still need to parse that stuff, so I figured a CSS selector based library might be more successful.
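
For reference, the usual reason //message matches nothing is that XPath 1.0 has no notion of a default namespace, so an unprefixed name only matches elements in no namespace. A minimal sketch of the namespace-aware query, assuming XML::LibXML is the libxml binding in use; the prefix ns and the helper message_texts are arbitrary names chosen for illustration:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical helper: return the text of every <message> element
# that lives in the example namespace.
sub message_texts {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);

    # Bind a prefix of our own choosing to the document's default namespace;
    # the prefix only has to match the one used in the XPath expression.
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(ns => 'http://example.com/ns');

    return map { $_->textContent } $xpc->findnodes('//ns:message');
}

my $xml = <<'MARKUP';
<?xml version="1.0" encoding="utf-8"?>
<root xmlns="http://example.com/ns">
  <message>Hi</message>
</root>
MARKUP

print "$_\n" for message_texts($xml);
```

This keeps the document untouched and stays in XPath, so it does not replace a CSS selector library; it only shows why the unprefixed query came back empty.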

2
  • Can you provide an example of what you want to parse and what you want to get? Commented Jun 16, 2012 at 14:00
  • Pro tip: the graphical interface to libxml Xacobeo makes it easy to mess with XPath. Screenshot: i.sstatic.net/fOTus.png Commented Jun 16, 2012 at 15:09

3 Answers

1

This should work with anything you can throw at libxml.

use strictures;
use HTML::TreeBuilder::LibXML qw();
BEGIN { HTML::TreeBuilder::LibXML->replace_original; }
use Web::Query qw();

print Web::Query->new_from_html(<<'MARKUP')->find('root > message')->text;
<?xml version="1.0" encoding="utf-8"?>
<root xmlns="http://example.com/ns">
<message>Hi</message>
</root>
MARKUP

1;

2 Comments

Thanks for the hint to Web::Query. I hoped to find something more CPANish. But this will do nicely.
Indeed, so even the boss will be happy ;)
1

Try this one:

#!/usr/bin/perl

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new(filename => 'test.xhtml');

print XML::XPath::XMLParser::as_string($_), "\n" for ($xp->find('root/message')->get_nodelist);

3 Comments

Thanks very much. Your code works on the example XML I provided as well as on my production data. I had almost the same code, except that I used XML::Parser, which didn't find the message node.
The question was about CSS selectors.
@user1215106: I was indeed looking for a CSS selector library because I generally prefer it over XPath. This is why you get an upvote but no accept.
0

Unless you tell it to, XML::Twig will happily ignore namespaces. You can get to message either by setting a handler on the element name, or by using an XPath query like my @messages = $twig->findnodes('//message');

To use a handler you would write:

XML::Twig->new( twig_handlers => { message => \&process_message })
         ->parsefile( "my.xml");

sub process_message
  { my( $twig, $message)= @_;
    print $message->text;
  }

2 Comments

Can you rewrite this to use HTML-Selector-XPath so that we arrive at the node through a CSS selector?
no ;--) XML::Twig lets you use tag.class though, which I often find to be a useful shortcut
