I have a very basic HTML file called example.html (see below):
<html>
<body>
<div class="one">
<div class="research">
<div class="two">
<p>Lorem ipsum...</p>
</div>
<div class="three">
<p>Lorem ipsum...</p>
</div>
<div class="four">
<p>Lorem ipsum...</p>
</div>
</div>
</div>
</body>
</html>
and I'd like to extract only the part shown below, but not by simply removing the first and last 3 lines.
<div class="research">
<p>Lorem ipsum...</p>
<div class="two"></div>
<div class="three"></div>
<div class="four"></div>
</div>
I have tried with awk:
cat example.html | awk '/^<div\ class="research">$/,/^<\/div>$/ { print }'
but something seems to be wrong.
I also tried with the body tag (see below):
cat example.html | awk '/^<body>$/,/^<\/body>$/ { print }'
(result)
<body>
<div class="one">
<div class="research">
<div class="two">
<p>Lorem ipsum...</p>
</div>
<div class="three">
<p>Lorem ipsum...</p>
</div>
<div class="four">
<p>Lorem ipsum...</p>
</div>
</div>
</div>
</body>
And it's working correctly.
What am I doing wrong?
Thanks in advance.
/^<div class="research">$/ doesn't work because <div isn't at the beginning of the line, and ^ matches the beginning of the line.
But several </div> tags are in the game. So the question is how to select the text up to the proper closing div tag?
A first,last range pattern can't do that: you have to write awk code to increment a counter when you see another <div>, and decrement it when you see a </div>. When the counter goes to 0, you've matched the first one.
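The counter idea above could be sketched roughly like this. It is only a minimal sketch, not a general HTML parser: it assumes every tag sits on its own line, as in example.html, and the function name extract_research is made up for illustration. The start pattern allows leading spaces or tabs, so it also matches when the div is indented.

```shell
# Hypothetical helper (name is an assumption): print the <div class="research">
# element, from its opening tag through its matching </div>.
# Assumes one tag per line, as in the sample file.
extract_research() {
    awk '
        # Not inside the wanted div yet: look for its opening tag.
        !depth && /^[ \t]*<div class="research">[ \t]*$/ { depth = 1; print; next }
        depth {
            print
            if (/<div[ >]/) depth++   # a nested div opens
            if (/<\/div>/)  depth--   # a div closes
            if (depth == 0) exit      # the matching closing tag was just printed
        }
    ' "$1"
}
```

With the sample file, `extract_research example.html` should print everything from `<div class="research">` through its own closing `</div>`. Passing the file name to awk directly also makes the `cat` unnecessary.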