
I want to replace the value of specific XML nodes using sed or awk. I can't use specialized XML parsing packages such as xmlstarlet or xmllint; I have to use sed or awk, just the "basic" shell.

I have many big XML files. In each file I want to target and replace the values of two tags, for example:

<desc:partNumber>2</desc:partNumber>
<desc:dateIssued>1870</desc:dateIssued>

The problem is that there are hundreds of tags with these names. However, the two tags I need have a parent tag that is unique within the whole XML file:

<desc:desc ID="DESC_VOLUME_0001">

Another problem is that the position (line number) of the <desc:partNumber> and <desc:dateIssued> tags inside the parent <desc:desc ID="DESC_VOLUME_0001"> is different in every file.

I think the solution would be:

  1. Target the parent <desc:desc ID="DESC_VOLUME_0001"> and extract it, with its children, into a variable
  2. Iterate over the children, get the line numbers of <desc:partNumber> and <desc:dateIssued>, and save them to variables
  3. Pass each line number to a sed command and replace the current value of that tag with the new value (the new value will be read from a .csv file)
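The three steps above can be sketched roughly as follows. This is only an illustration, not a robust XML solution: it assumes each tag sits on its own line, as in the sample further down, and the temporary sample file and the value of NEW_DATE_ISSUED are made up for the demonstration. It prints the result to standard output instead of editing in place.

```shell
# Illustrative sample file (assumption: one tag per line, as in the question).
xml=$(mktemp)
cat > "$xml" <<'EOF'
<desc:desc ID="DESC_OTHER_0001">
    <desc:originInfo>
        <desc:dateIssued>1999</desc:dateIssued>
    </desc:originInfo>
</desc:desc>
<desc:desc ID="DESC_VOLUME_0001">
    <desc:titleInfo>
        <desc:partNumber>2</desc:partNumber>
    </desc:titleInfo>
    <desc:originInfo>
        <desc:dateIssued>1870</desc:dateIssued>
    </desc:originInfo>
</desc:desc>
EOF

NEW_DATE_ISSUED=1850   # hypothetical new value; would come from the .csv file

# Steps 1 and 2: once the parent's opening tag has been seen, report the
# line number of the first <desc:dateIssued> after it and stop reading.
line=$(awk '/<desc:desc ID="DESC_VOLUME_0001">/ { inside = 1 }
            inside && /<desc:dateIssued>/      { print NR; exit }' "$xml")

# Guard: with an empty line number, sed would rewrite every line.
[ -n "$line" ] || exit 1

# Step 3: replace the text between > and < on that one line only.
result=$(sed "${line}s|>[^<]*<|>${NEW_DATE_ISSUED}<|" "$xml")
printf '%s\n' "$result"
rm -f "$xml"
```

The same awk call with /<desc:partNumber>/ in place of /<desc:dateIssued>/ would give the line number of the other tag.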

I tried to create the sed command below. As you can see, I used 'n' to step over lines, but the number of lines to skip needs to be variable.

sed -i '/desc:desc ID="DESC_VOLUME_0001"/{n;n;n;n;n;n;n;n;n;s/'"${OLD_DATE_ISSUED}"'/'"${NEW_DATE_ISSUED}"'/}'

Parent node with children:

<desc:desc ID="DESC_VOLUME_0001"> 
    <desc:physicalDescription> 
        <desc:note>text</desc:note> 
    </desc:physicalDescription>  
    <desc:titleInfo> 
        <desc:partNumber>2</desc:partNumber> 
    </desc:titleInfo>  
    <desc:originInfo> 
        <desc:dateIssued>1870</desc:dateIssued> 
    </desc:originInfo>  
    <desc:identifier type="uuid">81e32d30-6388-11e6-8336-005056827e52</desc:identifier> 
</desc:desc> 

Can anybody help me achieve this?

  • Please add sample input (no descriptions, no images, no links) and your desired output for that sample input to your question (no comment). Commented Oct 16, 2020 at 12:08
  • "can't use specialized packages for parsing xml like xmlstarlet, xmllint etc." You should tell whoever's making that decision that they're kneecapping you. Using an XML aware tool is the only way to do this robustly and effectively. Commented Oct 16, 2020 at 13:03
  • Why not use separate sed substitution for opening and closing tag? And do take care of the angle brackets lest you will accidentally replace desc:partNumberClient also. Commented Oct 16, 2020 at 13:45
  • "I have to use sed or awk, just "basic" shell." No you don't. It's the wrong tool for the job. Your code will be incorrect and inefficient. Commented Oct 16, 2020 at 16:48
  • Related: bobince's cautionary answer to RegEx match open tags except XHTML self-contained tags Commented Oct 16, 2020 at 18:28

1 Answer


With the example data in the file xmldata:

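awk -v dID="DESC_VOLUME_0001" -v part="5" -v dissue=1850 -F'[<>]' '
   # Remember the ID of the most recent <desc:desc ID="..."> element
   $2 ~ /desc ID/ {
       split($2, arr, "\"")
       descID = arr[2]
   }
   # Inside the wanted element, replace the text between > and <
   $2 ~ /desc:partNumber/ && descID == dID {
       $0 = gensub(/>[^<]*</, ">" part "<", 1)
   }
   $2 ~ /desc:dateIssued/ && descID == dID {
       $0 = gensub(/>[^<]*</, ">" dissue "<", 1)
   }
   # Print every line, changed or not
   1' xmldata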

One liner:

 awk -v dID="DESC_VOLUME_0001" -v part="5" -v dissue=1850 -F'[<>]' '$2 ~ /desc ID/ { split($2,arr,"\""); descID=arr[2] } $2 ~ /desc:partNumber/ && descID==dID { $0=gensub(/>[^<]*</, ">" part "<", 1) } $2 ~ /desc:dateIssued/ && descID==dID { $0=gensub(/>[^<]*</, ">" dissue "<", 1) } 1' xmldata

Here we set the field delimiters to < or > with -F. We also pass in three variables: dID, the desc ID to search for; part, the new partNumber value; and dissue, the new dateIssued value.

We then look for the desc ID text in the line and split the second field on double quotes. The second element of the array arr is the ID, which we store in the variable descID.
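To see what the field splitting and the split call produce, here is a small sketch run against a single line from the question's sample. The shell variable names line, tag and id are only for illustration.

```shell
# One sample line, taken from the question's XML.
line='<desc:desc ID="DESC_VOLUME_0001">'

# With < and > as separators, $1 is the (empty) text before "<"
# and $2 is everything between the angle brackets.
tag=$(printf '%s\n' "$line" | awk -F'[<>]' '{ print $2 }')

# Splitting $2 on double quotes puts the ID value in arr[2].
id=$(printf '%s\n' "$line" | awk -F'[<>]' '{ split($2, arr, "\""); print arr[2] }')

printf 'tag content: %s\n' "$tag"
printf 'ID:          %s\n' "$id"
```

For a value line such as <desc:partNumber>2</desc:partNumber>, the same separators put the tag name in $2 and the value in $3.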

We further search for partNumber and dateIssued, checking whether descID equals dID. If they match, we use the gensub function (a GNU awk extension) to replace the tag's value, i.e. the third delimited field, with the passed variable and assign the result to $0. Finally, the pattern 1 prints every line, changed or not.
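Since the question mentions many files and new values coming from a .csv file, here is a rough sketch of how the command could be wrapped in a loop. Several things are assumptions, not part of the question: the CSV layout (file,partNumber,dateIssued, with no header and no quoted fields), the helper names update_desc and apply_csv, and the .tmp suffix. It also swaps gensub for the POSIX sub function, so it should not need GNU awk.

```shell
# Hypothetical helper: print FILE with the two values replaced inside the
# <desc:desc> element whose ID attribute equals ID.
# Uses POSIX sub() in place of gensub. Caveat: "&" and backslashes in the
# new values are treated specially by awk.
update_desc() {   # usage: update_desc FILE ID PART DATE
    awk -v dID="$2" -v part="$3" -v dissue="$4" -F'[<>]' '
        $2 ~ /desc ID/ { split($2, arr, "\""); descID = arr[2] }
        descID == dID && $2 ~ /desc:partNumber/ { sub(/>[^<]*</, ">" part "<") }
        descID == dID && $2 ~ /desc:dateIssued/ { sub(/>[^<]*</, ">" dissue "<") }
        1' "$1"
}

# Assumed CSV layout (no header, no quoted fields): file,partNumber,dateIssued
apply_csv() {     # usage: apply_csv CSV_FILE
    while IFS=, read -r file part dissue; do
        [ -f "$file" ] || continue          # skip rows naming a missing file
        # Write to a temporary file first; replace the original only on success.
        update_desc "$file" "DESC_VOLUME_0001" "$part" "$dissue" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    done < "$1"
}
```

It would be called as apply_csv values.csv, where values.csv holds one row per XML file. Writing to a temporary file and renaming it avoids depending on an in-place option, which not every sed or awk provides.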


1 Comment

Works great. Thanks
