I need to delete some nodes from an XML file when a subnode contains a particular string or word. Sample XML file:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3982474/sitemap_nb.xsl"?>
<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xhtml="http://www.w3.org/1999/xhtml"
      xsi:schemaLocation="
            http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">


  <url>
       <loc>https://www.test.com/home</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.8000</priority>
  </url>
  <url>
       <loc>https://www.test.com/features?xxxxx=serviceability-point-access</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.5120</priority>
  </url>
  <url>
       <loc>https://www.test.com/eu/index</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.8000</priority>
  </url>
<url>
       <loc>https://www.test.com/features?xxxxx=serviceability-point-access</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.5120</priority>
  </url>
  <url>
       <loc>https://www.test.com/models/s510/features?xxxxx=serviceability</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.5120</priority>
  </url>
  <url>
       <loc>https://www.test.com/index</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.8000</priority>
  </url>
</urlset>


Find the string "xxxxx" and delete the whole <url> node that contains it.

The result should be:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3982474/sitemap_nb.xsl"?>
<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xhtml="http://www.w3.org/1999/xhtml"
      xsi:schemaLocation="
            http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">


  <url>
       <loc>https://www.test.com/home</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.8000</priority>
  </url>
  
  <url>
       <loc>https://www.test.com/eu/index</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.8000</priority>
  </url>
  <url>
       <loc>https://www.test.com/index</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.8000</priority>
  </url>
</urlset>

I used the following sed command, but it omits the attributes of the urlset element. I am not an expert in shell scripting; please check and suggest what I am missing.

sed -ne '/?xml/{ p; b }; /urlset/{ p; b }; /<url/{ h; b }; H; /<\/url>/{ x; /?xxxxx/b; /?xxxxx/b; p }'

The above sed command produces the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3982474/sitemap_nb.xsl"?>
<urlset
     
  <url>
       <loc>https://www.test.com/home</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.8000</priority>
  </url>
<url>
       <loc>https://www.test.com/eu/index</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.8000</priority>
  </url>
  <url>
       <loc>https://www.test.com/index</loc>
       <lastmod>2020-08-03T14:41:44+00:00</lastmod>
       <changefreq>daily</changefreq>
       <priority>0.8000</priority>
  </url>
</urlset>
  

The attributes of the urlset node are missing. Can someone help me? Thank you.
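One way to keep the root start tag with sed, sketched under the assumption that the file is laid out exactly like the sample (the <urlset start tag split over several lines and ending on the first following line that contains ">"). In the original script the /urlset/ address matches only the first line of that tag, so the attribute lines fall through to H, get appended to the hold space, and are then thrown away when h runs on the first <url> line. Printing the whole range /<urlset/,/>/ keeps them. The function name filter_sitemap is a hypothetical helper, not something from the thread.

```shell
#!/bin/sh
# Hypothetical helper: drop every <url>...</url> block that mentions "xxxxx".
# Assumes the line layout of the sample sitemap; this is text filtering,
# not XML parsing, so other layouts may need different addresses.
filter_sitemap() {
  sed -n '
# Print the XML declaration and the stylesheet processing instruction.
/^<?xml/{
  p
  b
}
# Print the whole multi-line <urlset ...> start tag, from its first line
# up to the next line containing ">".
/<urlset/,/>/{
  p
  b
}
# Print the closing root tag.
/<\/urlset>/{
  p
  b
}
# A new <url> block starts: replace the hold space with this line.
/<url>/{
  h
  b
}
# Any other line is appended to the current block in the hold space.
H
# At the end of a block, swap the collected block into the pattern space
# and print it unless it contains "xxxxx".
/<\/url>/{
  x
  /xxxxx/!p
}
' "$@"
}

# Usage: filter_sitemap file.xml > filtered.xml
```

With the sample file this keeps the declaration, the stylesheet instruction, the complete <urlset ...> start tag, the three <url> blocks without "xxxxx", and </urlset>. If the start tag were on a single line, the range would not close on that same line (sed only checks a regex end address on later lines), which is one more reason an XML-aware tool is the safer choice.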

  • You don't parse XML markup with sed, just as you don't drive in a screw with a hammer. Use the right tools for the job: an XML parser and processor like xmlstarlet, xsltproc, saxon... Commented Sep 3, 2020 at 21:42
  • We are using CentOS 6.10, which does not allow installing snapd, and snapd is needed to install xmlstarlet. So we decided to use sed. Commented Sep 4, 2020 at 6:36
  • @Rajaguru check if it is already installed, under the name xml. Commented Sep 4, 2020 at 20:25
  • @thanasisp I only have xmllint, xmlcatalog and xmlwf on my server, not xmlstarlet or xml. Could you please check and suggest something? Commented Sep 5, 2020 at 6:51
  • They do not edit xml. Commented Sep 5, 2020 at 12:11

2 Answers

Use xmlstarlet (on some systems it is installed under the name xml):

xmlstarlet ed -d '//urlset/url[loc[contains(text(), "xxxxx")]]' file.xml

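This will delete all <url> nodes that have a <loc> subnode containing the text xxxxx. Note that these unprefixed XPath steps only match elements in no namespace; because the sample's root element declares a default namespace, the elements have to be addressed with a namespace prefix (as in the other answer).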

2 Comments

sed -ne '/?xml/{ p; b }; /?xml-stylesheet/{ p; b }; /urlset/{ p; b }; /<url/{ h; b }; H; /<\/url>/{ x; /?xxxxx/b; /?xxxxx/b; p }' -- This sed command removes the nodes, but at the same time it also removes the attributes of the <urlset> node. Any help is appreciated.
Installing xmlstarlet requires snapd, but CentOS 6.10 does not allow installing snapd, so we decided to use sed.

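The XML's root element declares the following namespace attributes, and the default namespace (xmlns="http://www.sitemaps.org/schemas/sitemap/0.9") is what causes the issue: an XPath step such as /urlset or /url without a prefix only matches elements that are in no namespace.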

<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xhtml="http://www.w3.org/1999/xhtml"
      xsi:schemaLocation="
            http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

The xmlstarlet documentation suggests, for xmlstarlet 1.5.0+, using the prefix _: before the node name (for instance, use /_:node instead of /node): http://xmlstar.sourceforge.net/doc/UG/ch05.html

xmlstarlet ed -d "//_:urlset/_:url[*[contains(text(),'xxxxx')]]" file.xml

This solution works perfectly for the above problem. Thanks
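If xmlstarlet is available, the default namespace can also be bound to an explicit prefix with the ed command's -N name=uri option instead of relying on the _: shortcut. This is a sketch: the prefix name sm and the function name remove_xxxxx_urls are hypothetical choices, and the snippet looks for the binary under either name mentioned in this thread, xmlstarlet or xml.

```shell
#!/bin/sh
# The binary may be installed as xmlstarlet or, on some systems, as xml.
XMLSTARLET=$(command -v xmlstarlet || command -v xml)

# Hypothetical helper: delete every sitemap <url> whose <loc> contains "xxxxx".
# "sm" is an arbitrary prefix bound to the sitemap default namespace with -N,
# so the prefixed XPath steps match the namespaced elements.
remove_xxxxx_urls() {
  "$XMLSTARLET" ed \
    -N sm=http://www.sitemaps.org/schemas/sitemap/0.9 \
    -d '//sm:url[contains(sm:loc, "xxxxx")]' \
    "$@"
}

# Usage: remove_xxxxx_urls file.xml > filtered.xml
```

xmlstarlet ed writes the result to standard output; its -L (--inplace) option edits the file in place instead.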
