I have an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>

Now I need to extract the value of the href attribute of the f element. I tried processing the file line by line, but there is certainly a better way to do it. Any ideas?

Thanks

  • Besides the typo you have in the XML, this appears to be valid and well-formed. You can just use any XML parser. There are a bunch of those on CPAN. Commented Oct 13, 2014 at 15:10

3 Answers

After fixing the typo in your XML, I was able to extract the value with the following code:

#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

my $dom = 'XML::LibXML'->load_xml( file => 'example.xml' );
my $xc = 'XML::LibXML::XPathContext'->new;
$xc->registerNs('x', 'http://ns.adobe.com/xfdf/');

# findnodes returns one attribute node per match, so the loop covers every f element
for my $attr ($xc->findnodes('//x:f/@href', $dom)) {
    print $attr->value, "\n";
}

I usually find XML::LibXML too verbose, so I'd use XML::XSH2:

open example.xml ;
register-namespace x http://ns.adobe.com/xfdf/ ;
for //x:f echo @href ;

I like XML::Twig. Without disputing the previous poster's solution, I'd do it like this:

use strict;
use warnings;

use XML::Twig;

sub extract_f {
    my ( $twig, $f ) = @_;
    print $f->att('href'), "\n";
}

my $twig = XML::Twig->new( twig_handlers => { 'f' => \&extract_f }, );

$twig->parse( \*DATA );

__DATA__
<?xml version="1.0" encoding="UTF-8"?><xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve" >
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>

The main reason I like XML::Twig is that it lets you purge already-processed XML from memory as you go, which can be invaluable if you have a lot of XML to work with.
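To illustrate the purging mentioned above, here is a minimal sketch. It assumes the same XFDF layout as in the question, trimmed down to the f element, and collects the values in an array (@hrefs is just a name picked for this example). The handler calls purge once it has read the attribute, so the part of the document parsed so far is released:

```perl
use strict;
use warnings;

use XML::Twig;

# Trimmed copy of the XFDF document from the question (assumed layout).
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
</fields>
</xfdf>
XML

my @hrefs;

my $twig = XML::Twig->new(
    twig_handlers => {
        f => sub {
            my ( $t, $elt ) = @_;
            push @hrefs, $elt->att('href');
            $t->purge;    # discard what has been parsed up to this point
        },
    },
);

$twig->parse($xml);

print "$_\n" for @hrefs;
```

For a document this small the purge makes no practical difference; it only starts to matter when the input is large enough that keeping the whole tree in memory becomes a problem.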


I would recommend either XML::LibXML or XML::Twig.

I would consider your goal rather trivial if not for having to deal with namespaces. However, the following demonstrates how to use XML::LibXML to pull your desired value while ignoring the namespaces:

use strict;
use warnings;

use XML::LibXML;

my $dom = XML::LibXML->load_xml( IO => \*DATA );

my ($f) = $dom->findnodes('//*[local-name()="f"]');

print $f->getAttribute('href'), "\n";

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>

Outputs:

C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf
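One caveat with the snippet above: if the document contains no f element, $f is undefined and the getAttribute call dies. A slightly more defensive variant might look like the following sketch, which assumes the XML arrives as a string rather than via DATA and uses a shortened copy of the document from the question:

```perl
use strict;
use warnings;

use XML::LibXML;

# Shortened copy of the XFDF document from the question (assumed layout).
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
</xfdf>
XML

my $dom = XML::LibXML->load_xml( string => $xml );

# Match on the local name so the default namespace can be ignored.
my ($f) = $dom->findnodes('//*[local-name()="f"]');

# Only ask for the attribute if an f element was actually found.
my $href = defined $f ? $f->getAttribute('href') : undef;

if ( defined $href ) {
    print "$href\n";
}
else {
    warn "No href attribute found on an f element\n";
}
```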
