How to extract multiple tag values from multiple XML files in Linux

Question

We need to extract multiple tag values from multiple files.

We have around 1000 files with data similar to:

<Employee>
  <Id>432361</Id>
  <EmpName>Stuart</EmpName>
  <SidNumber>0251115</SidNumber>
  <CreatedUtc>2016-11-14T22:27:53.477+08:00</CreatedUtc>
  <EpisodeId>682082</EpisodeId>
  <CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
</Employee>

We need to extract EmpName, SidNumber and EpisodeId from all the files into a single file. We are able to get one value at a time, for example using the command:

nawk -F'[<>]' '/<EpisodeId>/{print $3}' *.dat

But we need to get multiple tags from each file. The output format should be something like:

EmpName Stuart SidNumber 0251115 EpisodeId 682082
EmpName Stuart SidNumber 0251115 EpisodeId 682082

or at least space-delimited values:

Stuart 0251115 682082
Stuart 0251115 682082

Any help would be appreciated.

Thanks in advance, Vivek

Do not go for sed or awk; they are not the right tool for the job. Go for an XML-aware tool like xmllint. Here is one way you should not do things, but it will work fine for small XML files: declare $(awk -v FS='[<>]' 'length($3){print $2"="$3}' inputfile), then echo $EmpName.
– P...., commented Apr 4, 2017 at 4:55
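
For what it's worth, the XML-aware route suggested in the comment above might look roughly like the sketch below. It assumes an xmllint build that supports the --xpath option and that every file has a single Employee root element, as in the question. The function name extract_with_xmllint is made up for illustration.

```shell
# Hypothetical helper (name is an assumption): print EmpName, SidNumber and
# EpisodeId, space-delimited, one line per file passed as an argument.
extract_with_xmllint() {
    for f in "$@"; do
        # concat() is a standard XPath 1.0 function; for a string result,
        # xmllint prints the value followed by a newline.
        xmllint --xpath \
            'concat(/Employee/EmpName, " ", /Employee/SidNumber, " ", /Employee/EpisodeId)' \
            "$f"
    done
}
```

Something like extract_with_xmllint *.dat > result.txt (result.txt being just a placeholder name) should then give one line per file. Unlike the awk approaches, this does not depend on each element sitting on its own line.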

VIPIN KUMAR · Accepted Answer · 2017-04-04 08:15:22Z

1

Try this (I created two sample files, f1.txt and f2.txt):

$ head f?.txt
==> f1.txt <==
 <Employee>
      <Id>432361</Id>
      <EmpName>Stuart</EmpName>
      <SidNumber>0251115</SidNumber>
      <CreatedUtc>2016-11-14T22:27:53.477+08:00</CreatedUtc>
      <EpisodeId>682082</EpisodeId>
      <CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
   </Employee>

==> f2.txt <==
 <Employee>
      <Id>432361</Id>
      <EmpName>vipin</EmpName>
      <SidNumber>0251117</SidNumber>
      <CreatedUtc>2016-12-14T22:27:53.477+08:00</CreatedUtc>
      <EpisodeId>682082</EpisodeId>
      <CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
   </Employee>

Processing...

$ for i in f?.txt; do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%s%s", $3, OFS} END {print ""}' "$i"; done
Stuart 0251115 682082
vipin 0251117 682082

For properly aligned output, pipe the result through column -t:

$ for i in f?.txt; do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%s%s", $3, OFS} END {print ""}' "$i"; done | column -t
Stuart  0251115  682082
vipin   0251117  682082

If you don't have the column command available, you can try the command below:

$ for i in f?.txt; do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%-10s", $3 OFS} END {print ""}' "$i"; done
Stuart    0251115   682082    
vipin     0251117   682082

awk's printf function lets us format the column values; here %-10s left-justifies each value in a field 10 characters wide.
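
As a possible variation, the shell loop can be dropped by letting a single awk process read all the files and start a new output line whenever FNR resets to 1. This is only a sketch: it assumes one Employee record per file with each element on its own line, and the function name extract_columns is invented for the example.

```shell
# Hypothetical helper (name is an assumption): one output line per input file,
# each value left-justified in a 10-character column.
extract_columns() {
    awk -F'[<>]' '
        FNR == 1 && NR > 1 { print "" }   # a new file begins: end the previous line
        $2 == "EmpName" || $2 == "SidNumber" || $2 == "EpisodeId" {
            printf "%-10s ", $3           # pad to 10 chars, then one separating space
        }
        END { if (NR > 0) print "" }      # end the line of the last file
    ' "$@"
}
```

It would be called as extract_columns f?.txt. Comparing $2 with the exact tag name, instead of matching a regex against the whole line, avoids accidental hits on tags whose names merely contain one of the three strings.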

edited Apr 4, 2017 at 8:15

answered Apr 4, 2017 at 5:09

VIPIN KUMAR


3 Comments

Vivek Vishal Over a year ago

Thanks a lot Vipin, you solved my issue. One more question, if possible: when I run the column command in my shell I get "bash: column: command not found". Is there an alternative way to format the output?

VIPIN KUMAR Over a year ago

@VivekVishal - I have updated my answer as per your need; please check.

Vivek Vishal Over a year ago

Thanks Vipin, really appreciate your help

pyed · Answer · 2017-04-04 04:51:37Z

0

nawk -F'[<>]' '/<EmpName>|<SidNumber>|<EpisodeId>/{print $3}' *.dat
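
If the labeled format from the question (EmpName Stuart SidNumber 0251115 EpisodeId 682082) is wanted, one way to extend this idea is sketched below. It assumes one Employee record per file with each element on its own line, and extract_labeled is a made-up name, not something from the original answer.

```shell
# Hypothetical helper (name is an assumption): print "Tag value" pairs,
# one line per input file.
extract_labeled() {
    awk -F'[<>]' '
        FNR == 1 && NR > 1 { print ""; sep = "" }   # new file: finish the previous line
        $2 == "EmpName" || $2 == "SidNumber" || $2 == "EpisodeId" {
            printf "%s%s %s", sep, $2, $3           # tag name followed by its value
            sep = " "                               # later pairs get a leading space
        }
        END { if (NR > 0) print "" }                # finish the last line
    ' "$@"
}
```

Running extract_labeled *.dat should print one line per file; with nawk instead of awk the program text stays the same, since it only uses FNR, NR and printf.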

answered Apr 4, 2017 at 4:51

pyed


2 Comments

Vivek Vishal Over a year ago

Thanks pyed, just wondering if there is a way to get formatted output like EmpName Stuart SidNumber 0251115 EpisodeId 682082

J. Chomel Over a year ago

Your answer certainly is worth a little explanation. Kindly refer to stackoverflow.com/help/how-to-answer. Comments would help create searchable content.
