
Can someone please help me read XML data in a shell script?

Here is my requirement:

I need to print each CHILD value, along with the CHILD's attribute value and its parent, whenever the CHILD value is greater than 100.

Here is my data:

<mydata>
    <parent detail="school1">
        <CHILD attribute="0">0</CHILD>
        <CHILD attribute="1">1932</CHILD>
        <CHILD attribute="2">0</CHILD>
        <CHILD attribute="3">500</CHILD>
        <CHILD attribute="4">0</CHILD>
        <CHILD attribute="5">0</CHILD>
        <CHILD attribute="6">7819</CHILD>
        <CHILD attribute="7">0</CHILD>
        <CHILD attribute="8">299</CHILD>
        <CHILD attribute="9">0</CHILD>
    </parent>
    <parent detail="school2">
        <CHILD attribute="0">1</CHILD>
        <CHILD attribute="1">7000</CHILD>
        <CHILD attribute="2">0</CHILD>
        <CHILD attribute="3">0</CHILD>
        <CHILD attribute="4">600</CHILD>
        <CHILD attribute="5">0</CHILD>
        <CHILD attribute="6">11674</CHILD>
        <CHILD attribute="7">0</CHILD>
        <CHILD attribute="8">489</CHILD>
        <CHILD attribute="9">0</CHILD>
    </parent>
</mydata>

My external file, childvalue_limits.txt, contains values like this:

attribute0=100
attribute1=60
attribute3=80
attribute4=90
attribute5=100
attribute6=90
attribute7=50
attribute8=80
attribute9=70

I need to pass this file as an argument to the script and use these values dynamically in the condition.

Current code:

sed 's|><|>\n<|g' $WORKING_PATH/xml_detail.log | awk -F'"|<|>' '/parent detail/{p=$3} /CHILD attribute/{att=$3;val=$5;if(val>100)print  "child value on " p, "attribute "att,"is at value: "val ,"\n"}' 

Current output:

child value on school2 attribute 1 is at value 1000
child value on school2 attribute 4 is at value 600
.....
.....

The required output should look like this:

child value on school2 attribute 1 is at value 1000 and threshold is 60
child value on school2 attribute 4 is at value 600 and threshold is 90
.....
.....

Please note: the threshold is a dynamic value passed to the if condition from a separate file called childvalue_limits.txt.

  • Your question is ambiguous. Do you mean you need the child value, attribute and parent for all children whose value exceeds 100? Or do you mean you need the child value and attribute, and the parent as well only where the child value exceeds 100? What have you tried so far? Commented Jul 18, 2014 at 14:45
  • Show an example output along with what you have done so far. Commented Jul 18, 2014 at 17:13
  • If you want to parse XML, use an XML parser (which of course can be run within a shell script). Using awk or any other regular expression based program will use a regular grammar, whereas XML is context-free and can therefore by definition not be correctly parsed by regex. Commented Jul 21, 2014 at 12:50

2 Answers

You cannot (correctly) parse XML using regular expressions. XML is a context-free language, which is more expressive than anything a grammar based on regular expressions can describe. See the Chomsky hierarchy for details. That is also why you run into trouble with newlines when using regular expressions.
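To make the newline problem concrete, here is a small sketch that runs the question's awk filter on the same XML written two ways. The function name naive_scan and the parent value s1 are made up for the illustration. With one element per line the value is found; when the same elements share a line, fields 3 and 5 no longer hold the attribute and the value, so nothing is reported.

```shell
#!/bin/bash
# Illustration only: naive_scan is a hypothetical wrapper around the awk
# filter from the question; "s1" is made-up sample data.
naive_scan() {
    awk -F'"|<|>' '
        /parent detail/   { p = $3 }
        /CHILD attribute/ { att = $3; val = $5; if (val > 100) print p, att, val }
    '
}

# One element per line: the CHILD line is split as the filter expects.
printf '%s\n' '<parent detail="s1">' '<CHILD attribute="1">1932</CHILD>' '</parent>' | naive_scan

# Same XML on a single line: $3 is now the parent detail and $5 is empty,
# so the comparison fails and no line is printed.
printf '%s\n' '<parent detail="s1"><CHILD attribute="1">1932</CHILD></parent>' | naive_scan
```

Both inputs are the same document as far as XML is concerned, which is why a line-oriented tool cannot be relied on here.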

Hence, it is better (and easier and more stable) to use a proper XML parser. As I am most familiar with BaseX (full disclosure: I am also associated with the project), I will use it.

When using the zip version, you can simply run the file bin/basex. The following XPath 3.0 expression should give you the correct output, simply concatenating the different values:

for $c in /mydata/parent/CHILD[. > 100] return $c/parent::parent/@detail || " " || $c/@attribute || " " || $c/data() || "&#10;"

Assuming your XML file is named mydata.xml, you can execute this XPath by issuing the following command (i.e. this can be done in your shell script):

basex -i mydata.xml -q 'for $c in /mydata/parent/CHILD[. > 100] return $c/parent::parent/@detail || " " || $c/@attribute || " " || $c/data() || "&#10;"'

3 Comments

@MarkSetchell You are very welcome. Welcome to the magical and mysterious journey that is XPath/XQuery processing ;-) Yes, we do have a simple Homebrew install; for all Debian (or Debian-derived) users it should also be in the central repository (although slightly outdated, if I remember correctly).
+1 Excellent, that works nicely, thank you. For any OS X users out there, I installed BaseX very simply with brew install basex.
Fixed a typo... and re-commented.

EDITED AGAIN

OK, I have changed the code to read a file of input limits. It looks complicated but it is not: you can remove all the lines that contain the word "DEBUG" if you want to. A # starts a comment.

#!/bin/bash

awk -F'"|<|>' '
   FNR==NR           {
                       split($0,f,"=");  # Split line on "=" sign into array f[]
                       gsub(/[[:alpha:]]/,"",f[1]); # Remove letters, leaving only the attribute number
                       limits[f[1]]=f[2]; # Save for comparison later
                       print "DEBUG: limits[",f[1],"]=",f[2];
                       next
                     }
   /parent detail/   {
                       p=$3
                       print "DEBUG: parent detail=",p;
                     }
   /CHILD attribute/ {
                       att=$3;val=$5;
                       print "DEBUG: att=",att,",val=",val; 
                       # Only compare attributes that have a limit in limits.txt
                       if((att in limits) && val>limits[att])print p,att,val,limits[att]
                     }
   ' limits.txt xml

You will see at the end of the script that it reads in BOTH your files - limits.txt and xml. In the script, the block in curly braces that starts FNR==NR means that the following code only applies to reading and parsing limits.txt.
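As a rough sketch of the same FNR==NR idea without the DEBUG lines, the variant below takes the limits file and the XML file as arguments and prints in the format the question asks for. The function name report_over_limit is hypothetical, and it assumes every limits line has the form attributeN=VALUE and that the XML keeps one element per line.

```shell
#!/bin/bash
# Sketch only: report_over_limit is a hypothetical helper name.
# Usage: report_over_limit LIMITS_FILE XML_FILE
report_over_limit() {
    awk -F'"|<|>' '
        # FNR==NR is true only while reading the first file (the limits).
        FNR == NR {
            split($0, f, "=")              # f[1]="attribute1", f[2]="60"
            sub(/^attribute/, "", f[1])    # keep only the attribute number
            limits[f[1]] = f[2]
            next
        }
        /parent detail/   { p = $3 }
        /CHILD attribute/ {
            att = $3; val = $5
            # Skip attributes with no configured limit; +0 forces a numeric comparison.
            if ((att in limits) && val + 0 > limits[att] + 0) {
                printf "child value on %s attribute %s is at value %s and threshold is %s\n", p, att, val, limits[att]
            }
        }
    ' "$1" "$2"
}
```

It could be called as report_over_limit childvalue_limits.txt mydata.xml, where both file names are whatever you pass to your own script.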

If you want to see the output without DEBUG messages, just run

./script | grep -v DEBUG

EDITED

Your code works fine for me with your revised data. Here is my output:

node2 1 1932
node2 6 7819
node1 1 1924
node1 6 11674

I assume you want to avoid XML parsers and just use standard tools like awk and sed to achieve this, so I'll go with awk:

awk -F'"|<|>' '/parent detail/{p=$3} /CHILD attribute/{att=$3;val=$5;if(val>100)print p,att,val}' xml

Output:

school1 1 1932
school1 3 500
school1 6 7819
school1 8 299
school2 1 7000
school2 4 600
school2 6 11674
school2 8 489

So, it sets the separator to any of ", < or >. Then, when it sees lines with the words "parent detail" it saves the value in p. When it sees lines with the words CHILD attribute it extracts the attribute and value. If the value is over 100, it prints the parent, attribute and value.

It assumes your XML is in a file called xml.
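If it is unclear why the attribute ends up in $3 and the value in $5, this small sketch prints every field awk produces for a single CHILD line with the same separator. The helper name show_fields is made up for the illustration.

```shell
#!/bin/bash
# Illustration only: show_fields is a hypothetical helper that lists each
# field awk sees when splitting on ", < or >.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'"|<|>' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields '<CHILD attribute="1">1932</CHILD>'
```

The empty fields in the listing come from adjacent separators (for example the " followed directly by >), which is why the useful values sit at positions 3 and 5 rather than 2 and 3.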

23 Comments

Thanks Mark, but the above code does not work when I change the values in the XML data: awk -F'"|<|>' '/ALLQUEUEDEPTHS server/{p=$3} /QUEUE_DEPTH queue/{att=$3;val=$5;if(val>100)print p,att,val}' ./myfile.xml
Can you click edit underneath your question and paste in an XML file that my code doesn't work for please?
It doesn't allow me to paste the XML file. Is there any way to send the XML data to you?
Put it in the same way as you put the original data in.
I have updated my answer - are you using GNU awk, or can you try using it - installed as gawk maybe?