I need to delete some nodes from an XML file if a subnode contains a particular string or word. Sample XML file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3982474/sitemap_nb.xsl"?>
<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xsi:schemaLocation="
http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>https://www.test.com/home</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.8000</priority>
</url>
<url>
<loc>https://www.test.com/features?xxxxx=serviceability-point-access</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5120</priority>
</url>
<url>
<loc>https://www.test.com/eu/index</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.8000</priority>
</url>
<url>
<loc>https://www.test.com/features?xxxxx=serviceability-point-access</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5120</priority>
</url>
<url>
<loc>https://www.test.com/models/s510/features?xxxxx=serviceability</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.5120</priority>
</url>
<url>
<loc>https://www.test.com/index</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.8000</priority>
</url>
</urlset>
I want to find the string "xxxxx" and delete every <url> node whose <loc> contains it.
The result should be:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3982474/sitemap_nb.xsl"?>
<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xsi:schemaLocation="
http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>https://www.test.com/home</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.8000</priority>
</url>
<url>
<loc>https://www.test.com/eu/index</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.8000</priority>
</url>
<url>
<loc>https://www.test.com/index</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.8000</priority>
</url>
</urlset>
I used the following sed command, but it omits the urlset element's attributes. I am not an expert in shell scripting; please check and suggest what I am missing.
sed -ne '/?xml/{ p; b }; /urlset/{ p; b }; /<url/{ h; b }; H; /<\/url>/{ x; /?xxxxx/b; /?xxxxx/b; p }'
The above sed command produces the XML below:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/3982474/sitemap_nb.xsl"?>
<urlset
<url>
<loc>https://www.test.com/home</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.8000</priority>
</url>
<url>
<loc>https://www.test.com/eu/index</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.8000</priority>
</url>
<url>
<loc>https://www.test.com/index</loc>
<lastmod>2020-08-03T14:41:44+00:00</lastmod>
<changefreq>daily</changefreq>
<priority>0.8000</priority>
</url>
</urlset>
The urlset element's attributes are missing. Can someone help me? Thank you.
Don't use sed for this, as you don't drive in a screw with a hammer. Use the right tools for the job: an XML parser and processor like xmlstarlet, xsltproc, saxon...
I can only use xmllint, xmlcatalog, and xmlwf on my server, not 'xmlstarlet' or 'xml'. Could you please check and suggest?
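If sed really is the only option, here is a minimal sketch of one possible fix. In the original command, the attribute lines of urlset contain neither "urlset" nor "<url", so they fall through to `H` and are appended to the hold space without being printed. The `h` at the next `<url>` then overwrites them. The sketch below instead prints every line that lies outside a `<url>`...`</url>` block and buffers only the blocks themselves. It assumes each tag sits on its own line, as in the sample. The file names `sitemap.xml` and `sitemap.filtered.xml` and the shortened input are placeholders for illustration. This is still line-based text processing, not XML parsing, so it will break if the layout changes.

```shell
# Hypothetical, shortened copy of the sample sitemap (file name is a placeholder).
cat > sitemap.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.test.com/home</loc>
</url>
<url>
<loc>https://www.test.com/features?xxxxx=serviceability-point-access</loc>
</url>
</urlset>
EOF

# 1. Lines outside a <url>...</url> block are printed unchanged
#    (this keeps the prolog, the urlset attributes, and </urlset>).
# 2. <url> starts a fresh buffer in the hold space.
# 3. Every other line inside the block is appended to the buffer.
# 4. At </url> the buffer is swapped in and printed unless it contains "xxxxx".
sed -n '
/<url>/,/<\/url>/!{
  p
  b
}
/<url>/{
  h
  b
}
H
/<\/url>/{
  x
  /xxxxx/!p
}
' sitemap.xml > sitemap.filtered.xml

cat sitemap.filtered.xml
```

Note that `/<url>/` does not match the `<urlset` line, because the pattern requires `>` immediately after `url`. Where an XSLT processor or another XML-aware tool is available, it would be a more robust choice than this sketch.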