
I am looking for a way to delete (with sed, if possible) an HTML tag that contains a specific word, e.g. delete every div tag containing the word foo. The divs can, of course, span multiple lines. For example:

<body>
    <div>
        This div will be <i>deleted</i>.
        Why?
        Because it contains foo.
    </div>

    <div>
        This div doesn't contain the forbidden word.
        <b>So it won't be deleted.</b>
    </div>
</body>

I found ways to delete HTML tags, but nothing about deleting tags that contain specific text. Thanks!

  • Have you tried anything? Commented Dec 2, 2013 at 15:50
  • Unfortunately no, I am not a sed expert. At all. But I have spent some time looking around for a solution and haven't found anything helpful so far. Commented Dec 2, 2013 at 15:53
  • Use a language that has a proper HTML parser available. With sed, even something like nested divs is very hard to handle. Commented Dec 2, 2013 at 16:10

1 Answer


It is not possible with sed alone. By default, sed processes its input one line at a time. If you want a script using sed/bash/grep, you would essentially have to write your own parser that reads each div's contents and prints only the divs that don't contain the word. Seriously, use an HTML parser instead.
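A minimal sketch of the parser-based approach, using the `html.parser` module from Python's standard library (Python is only an illustration here; the question doesn't name a language). `DivRemover` and `remove_divs_containing` are hypothetical names. It assumes that "contains the word" means a whole-word match in the text content (not in tag names or attribute values), and that a div is removed if the word appears anywhere inside it, including in nested divs, so the outermost matching div goes. Everything that is kept is re-emitted from the raw tag text, so attributes and entities stay largely as written.

```python
import re
from html.parser import HTMLParser


class DivRemover(HTMLParser):
    """Re-emit HTML, dropping every <div> whose text contains `word` (hypothetical helper)."""

    def __init__(self, word):
        # Keep entities such as &amp; as separate events instead of decoding them.
        super().__init__(convert_charrefs=False)
        # Whole-word match, e.g. "foo" matches "foo." but not "food".
        self.pattern = re.compile(r"\b" + re.escape(word) + r"\b")
        self.out = []    # output outside any open <div>
        self.stack = []  # one [chunks, contains_word] frame per open <div>

    def _emit(self, text):
        # Write into the innermost open <div>, or to the output if none is open.
        if self.stack:
            self.stack[-1][0].append(text)
        else:
            self.out.append(text)

    def _close_div(self):
        chunks, contains_word = self.stack.pop()
        if not contains_word:
            self._emit("".join(chunks))  # keep the whole <div>...</div>
        elif self.stack:
            # The word is also inside the enclosing <div>, so drop that one too.
            self.stack[-1][1] = True

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.stack.append([[], False])
        self._emit(self.get_starttag_text())  # raw text, attributes included

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self._emit(f"</{tag}>")
        if tag == "div" and self.stack:
            self._close_div()

    def handle_data(self, data):
        # Each text run is checked on its own, so a word split by markup is missed.
        if self.stack and self.pattern.search(data):
            self.stack[-1][1] = True
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def remove_divs_containing(html, word):
    parser = DivRemover(word)
    parser.feed(html)
    parser.close()
    while parser.stack:  # <div>s that were never closed in the input
        parser._close_div()
    return "".join(parser.out)
```

With the HTML from the question in a string `html`, `remove_divs_containing(html, "foo")` should return the `<body>` with only the second div left. Because `handle_data` looks at each text run separately, a word broken up by markup (for example `f<b>oo</b>`) is not detected.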


1 Comment

It is not impossible with sed, just a bit cumbersome, and there are better ways to do it; see, e.g., stackoverflow.com/questions/5341158/…. But impossible it is not!
