I have this XML file:

<gp>
<mms>1110012</mms>
<tg>988</tg>
<mm>LongTime</mm>
<lv>
    <lkid>StartEle=ONE, Desti = Motion</lkid>
    <kk>12</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Source = Velocity</lkid>
    <kk>2</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Source = Park</lkid>
    <kk>2</kk>
</lv>
</gp>

<gp>
<mms>2221100</mms>
<tg>989</tg>
<mm>LongVelocity</mm>
<lv>
    <lkid>StartEle=ONE, Source = Velocity</lkid>
    <kk>772</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Desti = Motion</lkid>
    <kk>900</kk>
</lv>
<lv>
    <lkid>StartEle=ONE, Source = Park</lkid>
    <kk>2</kk>
</lv>
</gp>

I first need to search for "LongTime". If it is found, I then need to look for the "Desti = Motion" value (which is inside <lkid>StartEle=ONE, Desti = Motion</lkid>) among the multiple nested sub-tags. If that is also found, I finally need to get the value inside the tag below it, which is 12 (<kk>12</kk>).

Please help, using anything - AWK, SED, Grep, anything will do.

Thanks in advance.

  • Try looking at this answer: stackoverflow.com/questions/4680143/… Commented Jan 9, 2014 at 12:06
  • When I parse an XML stream, I prefer tools that are designed for the job. Many scripting languages and commands support a DOM approach, XPath queries, and so on: Perl (which ships with most Linux distributions), Python, PHP (there is a PHP interpreter that lets you write shell scripts in PHP), xmllint, etc. Commented Jan 9, 2014 at 12:40

3 Answers

Using awk

awk -F"[<>]" '/LongTime/ {f=1} f && /Desti = Motion/ {getline;print $3;f=0}' file
12

This searches for LongTime and, if it is found, sets the flag f=1.
If the flag f is set and Desti = Motion is found, it reads the next line, prints the value, and resets the flag f.


To make sure it does not print a Desti = Motion value from a later section when the LongTime section itself does not contain Desti = Motion, reset the flag f whenever a new section is not LongTime by adding /^<mm>/ && !/LongTime/ {f=0}:

awk -F"[<>]" '/LongTime/ {f=1} /^<mm>/ && !/LongTime/ {f=0} f && /Desti = Motion/ {getline;print $3;f=0}' file
12
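
The effect of the reset rule is easiest to see on input where the LongTime section has no Desti = Motion entry but a later section does. The following is a minimal sketch under that assumption; the sample data and the function names without_reset and with_reset are made up for illustration.

```shell
#!/bin/sh
# Made-up input: the LongTime section lacks "Desti = Motion",
# but the following LongVelocity section contains it.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<gp>
<mm>LongTime</mm>
<lv>
    <lkid>StartEle=ONE, Source = Park</lkid>
    <kk>2</kk>
</lv>
</gp>
<gp>
<mm>LongVelocity</mm>
<lv>
    <lkid>StartEle=ONE, Desti = Motion</lkid>
    <kk>900</kk>
</lv>
</gp>
EOF

# Hypothetical helper: the first command, which never clears f between sections.
without_reset() {
    awk -F"[<>]" '/LongTime/ {f=1} f && /Desti = Motion/ {getline;print $3;f=0}' "$1"
}

# Hypothetical helper: the same command plus the rule that clears f
# when a <mm> line names a section other than LongTime.
with_reset() {
    awk -F"[<>]" '/LongTime/ {f=1} /^<mm>/ && !/LongTime/ {f=0} f && /Desti = Motion/ {getline;print $3;f=0}' "$1"
}

echo "without reset: [$(without_reset "$sample")]"
echo "with reset:    [$(with_reset "$sample")]"
```

On this input the first variant should report 900 from the wrong section, while the second should print nothing. Note that the reset rule relies on <mm> starting at the beginning of the line, as it does in the sample file.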

To avoid using getline, in case there are extra blank lines, use this:

awk -F"[<>]" '/LongTime/ {f=1} /^<mm>/ && !/LongTime/ {f=0} f && /Desti = Motion/ {q=1} f && q && /<kk>/ {print $3;f=q=0}' file
12

This just adds an extra test.

Here is a more readable version:

awk -F"[<>]" '
    /LongTime/              {f=1}
    /^<mm>/ && !/LongTime/  {f=0}
    f && /Desti = Motion/   {q=1} 
    f && q && /<kk>/        {print $3;f=q=0}
    ' file
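
As a quick check that the flag-based version tolerates blank lines between <lkid> and <kk>, here is a small self-contained sketch. The sample data, the temporary file, and the function name extract_kk are assumptions made for illustration, not part of the original question.

```shell
#!/bin/sh
# Hypothetical helper: prints the <kk> value that follows "Desti = Motion"
# inside the section whose <mm> line contains "LongTime".
extract_kk() {
    awk -F"[<>]" '
        /LongTime/              {f=1}
        /^<mm>/ && !/LongTime/  {f=0}
        f && /Desti = Motion/   {q=1}
        f && q && /<kk>/        {print $3; f=q=0}
    ' "$1"
}

# Made-up sample with a blank line between <lkid> and <kk>.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<gp>
<mm>LongTime</mm>
<lv>
    <lkid>StartEle=ONE, Desti = Motion</lkid>

    <kk>12</kk>
</lv>
</gp>
<gp>
<mm>LongVelocity</mm>
<lv>
    <lkid>StartEle=ONE, Desti = Motion</lkid>
    <kk>900</kk>
</lv>
</gp>
EOF

extract_kk "$sample"
```

Because the match on Desti = Motion only sets the flag q and the value is printed on the next <kk> line, whenever that arrives, the number of intervening blank lines should not matter.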

1 Comment

Thanks, Jotne. But what if there is a blank line between the two tags, for example <lkid>StartEle=ONE, Desti = Motion</lkid> ---- new line here --- <kk>12</kk>? I could add a second "getline;" when I know a blank line is there, but this has to be dynamic, because other such tags might have intervening spaces or new lines that are not known in advance. Please share your thoughts.

sed -n '\|<mm>LongTime</mm>|,\|</gp>| {
   \|Desti = Motion</lkid>|,\|</kk>| {
      /<kk>/ s|</\{0,1\}[^>]*>||gp
      }
   }' YourFile

This works on your sample XML. If the format changes, specify which kind of change you expect (the extra-newline case is handled here). Use --posix with GNU sed.
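
A small sketch of how the nested address ranges behave on input with a blank line between the tags. The sample data and the function name sed_extract_kk are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper wrapping the sed command above.
# Outer range: from the <mm>LongTime</mm> line to the closing </gp>.
# Inner range: from the "Desti = Motion" <lkid> line to the next </kk>.
sed_extract_kk() {
    sed -n '\|<mm>LongTime</mm>|,\|</gp>| {
       \|Desti = Motion</lkid>|,\|</kk>| {
          /<kk>/ s|</\{0,1\}[^>]*>||gp
          }
       }' "$1"
}

# Made-up sample with a blank line between <lkid> and <kk>.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<gp>
<mm>LongTime</mm>
<lv>
    <lkid>StartEle=ONE, Desti = Motion</lkid>

    <kk>12</kk>
</lv>
</gp>
EOF

sed_extract_kk "$sample"
```

The substitution only deletes the tags, so the printed value keeps the indentation of the <kk> line. If that matters, the output can be trimmed afterwards, for example with tr -d ' '.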

With GNU Awk version 4, you could try something like:

gawk -f a.awk file.xml

where a.awk is:

BEGIN {
    RS="^$"
    FPAT="(<mm>LongTime</mm>)|(<lkid>[^<]*</lkid>)|(<kk>[^<]*</kk>)"
}
{
    do {
        if ($(++i)=="<mm>LongTime</mm>") {
            do {
                if ($(++i)~/<lkid>.*Desti = Motion.*<\/lkid>/) {
                    match ($(i+1),/<kk>([^<]*)<\/kk>/,a)
                    print a[1]
                    exit
                }
            } while (i<=NF)
        }
    } while (i<=NF)
}
