0

How do I parse an XML file with the contents below?

<?xml version="1.0"?>
<saw:ibot xmlns:saw="com.siebel.analytics.web/report/v1" version="1" priority="normal" jobID="36">
  <saw:schedule timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)" disabled="false">
    <saw:start repeatMinuteInterval="60" endTime="23:59:00" startImmediately="true"/>
    <saw:recurrence runOnce="false">
      <saw:weekly weekInterval="1" mon="true" tue="true" wed="true" thu="true" fri="true"/>
    </saw:recurrence>
  </saw:schedule>
  <saw:dataVisibility type="recipient" runAs="cgm"/>
  <saw:choose>
    <saw:when condition="true">
      <saw:deliveryContent>
        <saw:headline>
          <saw:caption>
            <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text>
          </saw:caption>
        </saw:headline>
        <saw:conditionalReport/>
      </saw:deliveryContent>
      <saw:postActions/>
    </saw:when>
    <saw:otherwise/>
  </saw:choose>
  <saw:deliveryDestinations>
    <saw:destination category="dashboard"/>
    <saw:destination category="activeDeliveryProfile"/>
  </saw:deliveryDestinations>
  <saw:recipients subscribers="true" customize="false" specificRecipients="false">
    <saw:subscribers>
      <saw:user name="[email protected]"/>
      <saw:user name="[email protected]"/>
      <saw:user name="[email protected]"/>
    </saw:subscribers>
  </saw:recipients>
  <saw:conditionQuery>
    <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content"/>
  </saw:conditionQuery>
</saw:ibot>

and retrieve the output below?

[email protected]
[email protected]
[email protected]

I also have 5 .xml files, each with a different set of name values to extract. Is there a way to parse them all on the command line and merge the output into one file?

I have tried sed and awk, but they have not gotten me the desired output.

  • 1. Don't parse XML with sed or awk. 2. We can't provide you examples of code to run without seeing the XML that contains the data you want to retrieve. 3. Don't parse XML with sed or awk. 4. Please update your question to provide a minimal example XML file. 5. Don't parse XML with sed or awk. Commented Jul 17, 2015 at 22:38
  • I've formatted your question and the XML is now visible. Unfortunately your example is not a valid XML document. Commented Jul 17, 2015 at 22:40
  • You need to format the content. In this case that means using the {} marker to indent the content by four spaces. I'll do it for you once again... Commented Jul 17, 2015 at 23:02
  • That's still not a valid XML document: /tmp/xml:33.18: Opening and ending tag mismatch: subscribers line 29 and recipients and other errors Commented Jul 17, 2015 at 23:04
  • @G-Man I don't think it is a duplicate as this one is all about well formed XML document parsing, whereas your suggested duplicate needs different solutions due to the potential lack of well-formed-ness of html. I don't think it's off topic either fwiw. Commented Jul 18, 2015 at 6:57

2 Answers

4

This command will parse your XML document and use XPath to extract the name attribute values for the element at location /saw:ibot/saw:recipients/saw:subscribers/saw:user

xmlstarlet sel -t -v '/saw:ibot/saw:recipients/saw:subscribers/saw:user/@name' </tmp/xml

Output

[email protected]
[email protected]
[email protected]
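
For readers without xmlstarlet, the same lookup can be sketched with Python's standard library. This is only an illustration, not part of the original answer: the `SAMPLE` document is a trimmed stand-in for the question's XML, the addresses in it are placeholders, and `subscriber_names` is a hypothetical helper name. The namespace URI is the one declared by `xmlns:saw` in the question.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:saw declaration in the question's XML.
NS = {"saw": "com.siebel.analytics.web/report/v1"}

# Trimmed stand-in for the real document; the addresses are placeholders.
SAMPLE = """<?xml version="1.0"?>
<saw:ibot xmlns:saw="com.siebel.analytics.web/report/v1" version="1">
  <saw:recipients subscribers="true">
    <saw:subscribers>
      <saw:user name="alice@example.com"/>
      <saw:user name="bob@example.com"/>
    </saw:subscribers>
  </saw:recipients>
</saw:ibot>"""


def subscriber_names(xml_text):
    """Return the name attribute of every saw:user under saw:subscribers."""
    root = ET.fromstring(xml_text)  # root is the saw:ibot element
    # Path is relative to the root, mirroring the XPath used above.
    path = "saw:recipients/saw:subscribers/saw:user"
    return [user.get("name") for user in root.findall(path, NS)]


if __name__ == "__main__":
    print("\n".join(subscriber_names(SAMPLE)))
```

The prefix in the path is resolved through the `NS` mapping, so the lookup keeps working even if a document binds the same URI to a different prefix.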

3 Comments

On a side note: people also seem to like xidel (site down for the moment, along with the rest of SourceForge).
OK, if you say so. For me it doesn't, and it is hard to understand how it can work.
@mzjn ah. The XML has changed shape. Again. When I answered the question my answer worked, but now it doesn't. If you follow the history through you'll see that as it was, it took several attempts to get the OP to provide a sample that was even valid XML, and none of those were well formatted for easy viewing. I'll update my answer once more.
1

Use an XML parser. Personally, I like Perl with XML::Twig.

#!/usr/bin/env perl

use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new( );
$twig->parsefile ( 'your_file.xml' );

foreach my $saw_user ( $twig->get_xpath('//saw:user') ) {
    print $saw_user ->att('name'), "\n";
}

This prints:

[email protected]
[email protected]
[email protected]

If you want a 'one liner' then instead:

perl -MXML::Twig -0777 -e 'print map { $_->att("name")."\n" } ( XML::Twig->parse( <> )->get_xpath("//saw:user") )' your_xml_file
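
The question also asks about running the extraction over several files and merging the results into one output file. A minimal sketch of that idea in Python's standard library follows, assuming every file uses the same `saw` namespace URI as the example and that the wanted value is always the `name` attribute of a `saw:user` element. The helper names `collect_names` and `write_merged` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: all input files use the namespace URI from the question's example.
SAW_URI = "com.siebel.analytics.web/report/v1"


def collect_names(paths):
    """Return the name attribute of every saw:user element, in file order."""
    names = []
    for path in paths:
        root = ET.parse(path).getroot()
        # iter() walks the whole tree, much like the //saw:user XPath above.
        for user in root.iter("{%s}user" % SAW_URI):
            name = user.get("name")
            if name is not None:  # skip saw:user elements without a name
                names.append(name)
    return names


def write_merged(paths, out_path):
    """Write one name per line to out_path and return the list of names."""
    names = collect_names(paths)
    with open(out_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(name + "\n")
    return names
```

Calling `write_merged(["a.xml", "b.xml"], "merged.txt")` (file names made up for illustration) would write the names from both files to `merged.txt`, one per line, in the order the files were given.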

Please, for the sake of future maintenance programmers and sysadmins, DO NOT use regular expressions to parse XML. Why, you may ask? Because, taking your XML as an example, it can look like any of these and still be semantically identical:

Your example, or like this:

<?xml version="1.0" encoding="utf-8"?>
<saw:ibot
    jobID="36"
    priority="normal"
    version="1"
    xmlns:saw="com.siebel.analytics.web/report/v1">
  <saw:schedule
      disabled="false"
      timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)">
    <saw:start
        endTime="23:59:00"
        repeatMinuteInterval="60"
        startImmediately="true"
    />
    <saw:recurrence runOnce="false">
      <saw:weekly
          fri="true"
          mon="true"
          thu="true"
          tue="true"
          wed="true"
          weekInterval="1"
      />
    </saw:recurrence>
  </saw:schedule>
  <saw:dataVisibility
      runAs="cgm"
      type="recipient"
  />
  <saw:choose>
    <saw:when condition="true">
      <saw:deliveryContent>
        <saw:headline>
          <saw:caption>
            <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text>
          </saw:caption>
        </saw:headline>
        <saw:conditionalReport/>
      </saw:deliveryContent>
      <saw:postActions/>
    </saw:when>
    <saw:otherwise/>
  </saw:choose>
  <saw:deliveryDestinations>
    <saw:destination category="dashboard" />
    <saw:destination category="activeDeliveryProfile" />
  </saw:deliveryDestinations>
  <saw:recipients
      customize="false"
      specificRecipients="false"
      subscribers="true">
    <saw:subscribers>
      <saw:user name="[email protected]" />
      <saw:user name="[email protected]" />
      <saw:user name="[email protected]" />
    </saw:subscribers>
  </saw:recipients>
  <saw:conditionQuery>
    <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content" />
  </saw:conditionQuery>
</saw:ibot>

Or like this (note tag wrapping of elements)

<?xml version="1.0" encoding="utf-8"?>
<saw:ibot jobID="36" priority="normal" version="1" xmlns:saw="com.siebel.analytics.web/report/v1">
  <saw:schedule disabled="false" timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)">
    <saw:start endTime="23:59:00" repeatMinuteInterval="60" startImmediately="true"/>
    <saw:recurrence runOnce="false">
      <saw:weekly fri="true" mon="true" thu="true" tue="true" wed="true" weekInterval="1"/>
    </saw:recurrence>
  </saw:schedule>
  <saw:dataVisibility runAs="cgm" type="recipient"/>
  <saw:choose>
    <saw:when condition="true">
      <saw:deliveryContent>
        <saw:headline>
          <saw:caption>
            <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text>
          </saw:caption>
        </saw:headline>
        <saw:conditionalReport/>
      </saw:deliveryContent>
      <saw:postActions/>
    </saw:when>
    <saw:otherwise/>
  </saw:choose>
  <saw:deliveryDestinations>
    <saw:destination category="dashboard"/>
    <saw:destination category="activeDeliveryProfile"/>
  </saw:deliveryDestinations>
  <saw:recipients customize="false" specificRecipients="false" subscribers="true">
    <saw:subscribers>
      <saw:user name="[email protected]"/>
      <saw:user name="[email protected]"/>
      <saw:user name="[email protected]"/>
    </saw:subscribers>
  </saw:recipients>
  <saw:conditionQuery>
    <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content"/>
  </saw:conditionQuery>
</saw:ibot>

Or like this:

<?xml version="1.0" encoding="utf-8"?>
<saw:ibot
jobID="36"
priority="normal"
version="1"
xmlns:saw="com.siebel.analytics.web/report/v1"
><saw:schedule
disabled="false"
timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)"
><saw:start
endTime="23:59:00"
repeatMinuteInterval="60"
startImmediately="true"
/><saw:recurrence
runOnce="false"
><saw:weekly
fri="true"
mon="true"
thu="true"
tue="true"
wed="true"
weekInterval="1"
/></saw:recurrence></saw:schedule><saw:dataVisibility
runAs="cgm"
type="recipient"
/><saw:choose
><saw:when
condition="true"
><saw:deliveryContent
><saw:headline
><saw:caption
><saw:text
>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text></saw:caption></saw:headline><saw:conditionalReport
/></saw:deliveryContent><saw:postActions
/></saw:when><saw:otherwise
/></saw:choose><saw:deliveryDestinations
><saw:destination
category="dashboard"
/><saw:destination
category="activeDeliveryProfile"
/></saw:deliveryDestinations><saw:recipients
customize="false"
specificRecipients="false"
subscribers="true"
><saw:subscribers
><saw:user
name="[email protected]"
/><saw:user
name="[email protected]"
/><saw:user
name="[email protected]"
/></saw:subscribers></saw:recipients><saw:conditionQuery
><saw:reportRefNode
path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content"
/></saw:conditionQuery></saw:ibot>

Hopefully by looking at these samples, you'll see that by reformatting your XML in a PERFECTLY VALID fashion, your regex might one day break mysteriously.
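
To make the point concrete, here is a small Python sketch (not from the original answer) that feeds two layouts of the same fragment to a real parser and to a naive line-based pattern of the kind a sed or awk script might use. The fragment, the placeholder address, and the names `names_via_parser` and `names_via_regex` are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

NS_URI = "com.siebel.analytics.web/report/v1"

# The same fragment in two layouts; the address is a placeholder.
COMPACT = (
    '<saw:subscribers xmlns:saw="%s">'
    '<saw:user name="alice@example.com"/>'
    '</saw:subscribers>' % NS_URI
)
WRAPPED = (
    '<saw:subscribers\nxmlns:saw="%s"\n>'
    '<saw:user\nname="alice@example.com"\n/>'
    '</saw:subscribers>' % NS_URI
)

# Naive pattern that assumes the tag and its attribute share one line.
LINE_PATTERN = re.compile(r'<saw:user name="([^"]*)"')


def names_via_parser(xml_text):
    """Extract names with a real XML parser; layout does not matter."""
    root = ET.fromstring(xml_text)
    return [user.get("name") for user in root.iter("{%s}user" % NS_URI)]


def names_via_regex(xml_text):
    """Extract names line by line, the way a sed/awk script typically would."""
    return [
        match.group(1)
        for line in xml_text.splitlines()
        for match in LINE_PATTERN.finditer(line)
    ]


if __name__ == "__main__":
    for label, text in (("compact", COMPACT), ("wrapped", WRAPPED)):
        print(label, names_via_parser(text), names_via_regex(text))
```

The parser returns the same list for both layouts, while the line-based pattern finds the name in the compact layout and nothing in the wrapped one.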
