
We need to extract multiple tag values from multiple files.

We have around 1000 files with data similar to:

<Employee>
  <Id>432361</Id>
  <EmpName>Stuart</EmpName>
  <SidNumber>0251115</SidNumber>
  <CreatedUtc>2016-11-14T22:27:53.477+08:00</CreatedUtc>
  <EpisodeId>682082</EpisodeId>
  <CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
</Employee>

We need to extract EmpName, SidNumber and EpisodeId from all the files into a single file. We are able to get one value at a time, for example with this command:

nawk -F'[<>]' '/<EpisodeId>/{print $3}' *.dat

But we need to get multiple tags from each file. The output format should be something like:

EmpName Stuart SidNumber 0251115 EpisodeId 682082
EmpName Stuart SidNumber 0251115 EpisodeId 682082 

or at least space-delimited values:

Stuart 0251115 682082
Stuart 0251115 682082

Any help would be appreciated.

Thanks in advance, Vivek

    Do not use sed or awk for this; they are not the right tools for the job. Use an XML-aware tool such as xmllint. Here is one way you should not do things, but which will work fine for small XML files: declare $(awk -v FS='[<>]' 'length($3){print $2"="$3}' inputfile), then echo $EmpName. Commented Apr 4, 2017 at 4:55

2 Answers


Try this (I created two sample files, f1.txt and f2.txt):

$ head f?.txt
==> f1.txt <==
 <Employee>
      <Id>432361</Id>
      <EmpName>Stuart</EmpName>
      <SidNumber>0251115</SidNumber>
      <CreatedUtc>2016-11-14T22:27:53.477+08:00</CreatedUtc>
      <EpisodeId>682082</EpisodeId>
      <CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
   </Employee>

==> f2.txt <==
 <Employee>
      <Id>432361</Id>
      <EmpName>vipin</EmpName>
      <SidNumber>0251117</SidNumber>
      <CreatedUtc>2016-12-14T22:27:53.477+08:00</CreatedUtc>
      <EpisodeId>682082</EpisodeId>
      <CorrelationId>323A6C86-76AA-E611-80DA-005056B46023</CorrelationId>
   </Employee>

Processing...

$ for i in f?.txt; do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%s%s", $3, OFS} END {print ""}' "$i"; done
 Stuart 0251115 682082 
 vipin 0251117 682082 

For properly aligned output, pipe the result through column:

$ for i in f?.txt; do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%s%s", $3, OFS} END {print ""}' "$i"; done | column -t
Stuart  0251115  682082
vipin   0251117  682082

If you don't have the column command available, you can try the command below:

$ for i in f?.txt; do awk -F'[<>]' '/EmpName|SidNumber|EpisodeId/{printf "%-10s", $3 OFS} END {print ""}' "$i"; done
Stuart    0251115   682082    
vipin     0251117   682082 

awk's printf function lets us format the column values; here "%-10s" left-aligns each value in a 10-character field.
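The question also asked for output that carries the tag names. A minimal sketch of that variant follows, assuming each file holds one Employee record with one tag per line; the scratch directory and the variable name labeled are hypothetical and only there to make the example self-contained. With -F'[<>]', $2 is the tag name and $3 is its value.

```shell
# Build two small sample files in a scratch directory (hypothetical data
# modelled on the question; unrelated tags trimmed for brevity).
tmpdir=$(mktemp -d)
cat > "$tmpdir/f1.txt" <<'EOF'
<Employee>
  <Id>432361</Id>
  <EmpName>Stuart</EmpName>
  <SidNumber>0251115</SidNumber>
  <EpisodeId>682082</EpisodeId>
</Employee>
EOF
cat > "$tmpdir/f2.txt" <<'EOF'
<Employee>
  <Id>432361</Id>
  <EmpName>vipin</EmpName>
  <SidNumber>0251117</SidNumber>
  <EpisodeId>682082</EpisodeId>
</Employee>
EOF

# For every wanted tag, append "name value" to the line being built,
# then print the line once the whole file has been read.
labeled=$(
  for i in "$tmpdir"/f?.txt; do
    awk -F'[<>]' '
      /<(EmpName|SidNumber|EpisodeId)>/ { line = line sep $2 " " $3; sep = " " }
      END { print line }
    ' "$i"
  done
)
printf '%s\n' "$labeled"

rm -rf "$tmpdir"
```

Building the line in a variable instead of calling printf per match avoids the trailing space and keeps the data out of the format string.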


3 Comments

Thanks a lot Vipin, you solved my issue. If possible, just one more question: in my shell the column command fails with "bash: column: command not found". Is there any alternative way to format the output?
@VivekVishal - I have updated my answer as per your need, please check.
Thanks Vipin, really appreciate your help
nawk -F'[<>]' '/<EmpName>|<SidNumber>|<EpisodeId>/{print $3}' *.dat
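The command above prints each value on its own line. A possible way to get one line per record from a single awk invocation, without a shell loop, is to collect the values and flush them at the closing tag. This is only a sketch: it assumes every record ends with a </Employee> line, the sample .dat files and the variable name rows are hypothetical, and awk stands in for nawk.

```shell
# Create two sample .dat files in a scratch directory (hypothetical data
# modelled on the question).
tmpdir=$(mktemp -d)
printf '%s\n' '<Employee>' '  <Id>432361</Id>' '  <EmpName>Stuart</EmpName>' \
  '  <SidNumber>0251115</SidNumber>' '  <EpisodeId>682082</EpisodeId>' \
  '</Employee>' > "$tmpdir/a.dat"
printf '%s\n' '<Employee>' '  <Id>432361</Id>' '  <EmpName>vipin</EmpName>' \
  '  <SidNumber>0251117</SidNumber>' '  <EpisodeId>682082</EpisodeId>' \
  '</Employee>' > "$tmpdir/b.dat"

# Collect the wanted values in "row"; print and reset at each closing tag,
# so every Employee record becomes one space-delimited line.
rows=$(
  awk -F'[<>]' '
    /<(EmpName|SidNumber|EpisodeId)>/ { row = row sep $3; sep = " " }
    /<\/Employee>/                    { print row; row = ""; sep = "" }
  ' "$tmpdir"/*.dat
)
printf '%s\n' "$rows"

rm -rf "$tmpdir"
```

Because the flush is tied to the closing tag and not to the end of the file, the same script should also cope with files that contain several Employee records.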

2 Comments

Thanks pyed, just wondering if there is a way to get formatted output like: EmpName Stuart SidNumber 0251115 EpisodeId 682082
"Your answer certainly is worth a little explanation. Kindly refer to stackoverflow.com/help/how-to-answer . Comments would help create searchable content. "
