Shell script using sed to remove HTML tag containing specific text

Question

I am looking for a way to delete (with sed if possible) an HTML tag containing a specific word. For instance, delete every div element containing the word foo. The divs can of course span multiple lines. For example:

<body>
    <div>
        This div will be <i>deleted</i>.
        Why ?
        Because it contains foo.
    </div>

    <div>
        This div doesn't contains the forbidden word.
        <b>So it won't be deleted.</b>
    </div>
</body>

I found ways to delete html tags, but nothing about tags containing a specific text. Thanks !

Unfortunately no, I am not a sed expert. At all. But I have spent some time looking around for a solution and didn't find anything that could help me so far.
– Julien Rouvier, Commented Dec 2, 2013 at 15:53
Use a language that has a proper HTML parser available. sed will only handle, for example, nested divs with great difficulty.
– chepner, Commented Dec 2, 2013 at 16:10

iptable · Accepted Answer · 2013-12-02 17:53:42Z


It is not possible with sed alone. sed is a line-oriented processor. If you want a script using sed/bash/grep, you would need to write a parser that reads each div's contents and prints only the divs that don't contain the text in question. Seriously, look for an HTML parser instead.
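
To give an idea of what the parser route might look like, here is a minimal sketch using Python's standard-library `html.parser`. It is not from the original answer: the names `DivFilter` and `remove_divs` are made up for illustration, and it assumes reasonably well-formed input, treats the word as a whole-word match inside a single text run, and drops the outermost div whenever the word appears anywhere inside it (nested divs included).

```python
import re
from html.parser import HTMLParser


class DivFilter(HTMLParser):
    """Re-emit HTML, dropping each outermost <div> whose text contains `word`."""

    def __init__(self, word):
        # convert_charrefs=False keeps entities such as &amp; unchanged.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(rf"\b{re.escape(word)}\b")
        self.out = []        # markup that is definitely kept
        self.pending = []    # markup of the outermost <div> still open
        self.depth = 0       # nesting level of open <div> tags
        self.found = False   # word seen inside the current outermost <div>?

    def _emit(self, raw):
        # Inside a div we buffer; outside we write straight to the output.
        (self.pending if self.depth else self.out).append(raw)

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self.depth == 0:
                self.pending, self.found = [], False
            self.depth += 1
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through verbatim.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit(f"</{tag}>")
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # The outermost div just closed: keep it only if clean.
                if not self.found:
                    self.out.extend(self.pending)
                self.pending = []

    def handle_data(self, data):
        if self.depth and self.pattern.search(data):
            self.found = True
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def remove_divs(html, word):
    """Return `html` without the divs that contain `word`."""
    parser = DivFilter(word)
    parser.feed(html)
    parser.close()
    # An unclosed div at end of input is kept unless it contained the word.
    if parser.depth and not parser.found:
        parser.out.extend(parser.pending)
    return "".join(parser.out)
```

Calling `remove_divs(open("page.html").read(), "foo")` (with `page.html` as a placeholder file name) should return the page with the matching divs gone and the surrounding whitespace left as it was. A word split across tags, such as `f<b>oo</b>`, would not be detected by this sketch.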


Fredrik Pihl Over a year ago

This is not impossible with sed, just a bit cumbersome and there are better ways to do it, see, e.g. stackoverflow.com/questions/5341158/…. But impossible it is not!
