How to extract an attribute value from an XML file

Question

I have an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>

Now I need to extract the value of the href attribute of the f element. I tried it with single-line processing, but there is certainly a better way to do it. Any ideas?

Thanks

Besides the typo you have in the XML, this appears to be valid and well-formed. You can just use any XML parser. There are a bunch of those on CPAN.
– simbabque, Commented Oct 13, 2014 at 15:10

choroba · Accepted Answer · 2014-10-13 15:13:08Z


After fixing the typo in your XML, I was able to extract the value with the following code:

#!/usr/bin/perl
use warnings;
use strict;

use XML::LibXML;

my $dom = 'XML::LibXML'->load_xml( file => 'example.xml' );
my $xc = 'XML::LibXML::XPathContext'->new;
$xc->registerNs('x', 'http://ns.adobe.com/xfdf/');

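# findnodes returns one attribute node per match, so the loop also
# works when the document contains more than one f element.
for my $href ($xc->findnodes('//x:f/@href', $dom)) {
    print $href->value, "\n";
}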

I usually find XML::LibXML too verbose, so I'd use XML::XSH2:

open example.xml ;
register-namespace x http://ns.adobe.com/xfdf/ ;
for //x:f echo @href ;

Miller · Accepted Answer · 2014-10-13 20:56:14Z

I like XML::Twig. Not to dispute the previous poster's solution, but I'd do it like this:

use strict;
use warnings;

use XML::Twig;

sub extract_f {
    my ( $twig, $f ) = @_;
    print $f->atts->{'href'}, "\n";
}

my $twig = XML::Twig->new( twig_handlers => { 'f' => \&extract_f }, );

$twig->parse( \*DATA );

__DATA__
<?xml version="1.0" encoding="UTF-8"?><xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve" >
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>

The main reason I like XML::Twig is that it lets you purge parsed XML as you go, so it can be invaluable when you have a lot of XML to work with.
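To illustrate the purging mentioned above, here is a minimal sketch that calls purge inside the handler once the attribute has been read. The inline sample stands in for a large file (an assumption made to keep the example self-contained), and collecting the values in @hrefs is an illustrative choice, not something the original code does.

```perl
use strict;
use warnings;

use XML::Twig;

# Collect every href seen on an f element.
my @hrefs;

my $twig = XML::Twig->new(
    twig_handlers => {
        f => sub {
            my ( $t, $f ) = @_;
            push @hrefs, $f->att('href');

            # Free the elements that have been fully parsed so far,
            # so memory use does not grow with the size of the input.
            $t->purge;
        },
    },
);

# Inline sample standing in for a large document; for a real file,
# $twig->parsefile('example.xml') could be used instead.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
</fields>
</xfdf>
XML

$twig->parse($xml);

print "$_\n" for @hrefs;
```

Purging only pays off when the handler has already taken everything it needs from the elements seen so far, because they are no longer reachable afterwards.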

Miller · Accepted Answer · 2014-10-13 18:06:10Z

I would recommend either XML::LibXML or XML::Twig.

I would consider your goal rather trivial if not for having to deal with namespaces. However, the following demonstrates how to use XML::LibXML to pull your desired value while ignoring the namespaces:

use strict;
use warnings;

use XML::LibXML;

my $dom = XML::LibXML->load_xml( IO => \*DATA );

my ($f) = $dom->findnodes('//*[local-name()="f"]');

print $f->getAttribute('href'), "\n";

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<ids modified="BF43C70442ECB74FA49833BBA44D4679" original="B4870CC046121A41B7D8F0838C87256D"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
<field name="txt_bestelltKW">
<value></value>
</field>
</fields>
</xfdf>

Outputs:

C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf
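The snippet above assumes that exactly one f element exists; if none matches, $f is undefined and the getAttribute call dies. A slightly more defensive sketch follows. The helper name f_hrefs is hypothetical (it is not part of XML::LibXML), and the shortened sample document is only there to make the example self-contained.

```perl
use strict;
use warnings;

use XML::LibXML;

# Hypothetical helper: return the href of every f element that has one,
# regardless of the namespace the element lives in.
sub f_hrefs {
    my ($dom) = @_;
    return map { $_->getAttribute('href') }
        $dom->findnodes('//*[local-name()="f"][@href]');
}

my $dom = XML::LibXML->load_xml( string => <<'XML' );
<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<f href="C:\cFGCACHE-058cef2b85c09427e606b143bd75248e252d004e\alternative.pdf"/>
<fields>
<field name="FormInstanceID">
<value>SRSQSC88E48-1-1.320</value>
</field>
</fields>
</xfdf>
XML

my @hrefs = f_hrefs($dom);

if (@hrefs) {
    print "$_\n" for @hrefs;
}
else {
    warn "No f element with an href attribute found\n";
}
```

The [@href] predicate keeps f elements without the attribute out of the result, so the caller never receives undef values.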
