I've seen the Stack Overflow question "How to use regex for multiple line pattern in shell script", but it doesn't do exactly what I want. I'm looking for a terminal-based way to run an in-place sed (or perl) regex substitution that will automatically change some files for me. (I could probably do it with XML libraries etc., but I would prefer to use the terminal.)
The file I have:
Some text
<div class="firstClass secondClass" something="else">
Some random stuff
</div>
Random Text
<div class="thirdClass fifthClass" something="else">
Some random stuff
< is something
< but not /> This
</div>
<div class="fourthClass">
Some random stuff
</div>
Final Text
I tried to make the example arbitrary enough to show a few different use cases. I'm trying to convert it into something like the following:
Some text
<!-- firstClass start -->
Some random stuff
<!-- firstClass end -->
Random Text
<!-- thirdClass start -->
Some random stuff
< is something
< but not /> This
<!-- thirdClass end -->
<!-- fourthClass start -->
Some random stuff
<!-- fourthClass end -->
Final Text
I am trying the following code:
sed -n '/<div class="\([^ "]*\)[^>]*>/,/<\/div>/{s/<div class="\([^ "]*\)[^>]*>/<!-- \1 start -->/;/<\/div>/d;p}' file
but since the person in the previous Stack Overflow question didn't want the final line, the answers there deleted it, which is not what I want. As can be seen, I want the text captured from the first line repeated before and after the inner contents.
The regex above properly fixes the first line (it changes the div to a comment), but I can't replicate that below the text. I tried modifying the regex, but I can't get it to work. It's also cutting out the very first line and the last lines, although I'd like to keep them. Any ideas how to do something like this?
(PS: yes, I know I need sed -i for an in-place edit, but I want to test it before I actually run it, for obvious reasons.)
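For reference, two things in the attempt above explain the symptoms: with -n, the p inside the range block is the only thing that prints, so every line outside a div block is dropped, and /<\/div>/d deletes the closing line outright. One possible way around both, sketched here under the assumption that blocks are never nested and that each tag sits on its own line, is to remember the first class name in sed's hold space and reuse it on the closing line. The function name div_to_comments is hypothetical and used only for illustration.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin.
# Assumes no nesting and one tag per line.
div_to_comments() {
  sed '
    # Opening tag: reduce the line to the first class name, save it in
    # the hold space, then turn the line into the start comment.
    /<div class="\([^ "]*\)[^>]*>/{
      s/.*<div class="\([^ "]*\)[^>]*>.*/\1/
      h
      s/.*/<!-- & start -->/
    }
    # Closing tag: fetch the saved name and build the end comment.
    /<\/div>/{
      g
      s/.*/<!-- & end -->/
    }
  ' "$@"
}
```

Because -n is not used, every line that matches neither address is printed unchanged, so the text before, between, and after the blocks is kept. Running div_to_comments file prints the result to the terminal for checking before switching to sed -i.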
Edit: A little addendum on what I'm trying to do. Although the above is HTML, this code is not exclusively for HTML (which is why I don't want HTML/XML processing). The idea is:
Some random text before my pattern
PATTERN "info ...
random stuffs
END PATTERN
Some random stuff after pattern
I'd like this to be converted to:
Some random text before my pattern
NEW PATTERN - info
random stuffs
END NEW PATTERN - info
Some random stuff after pattern
So not necessarily HTML. Just something that takes a pattern above some text and replicates it below. The only condition is that random stuffs will never contain the text END PATTERN, so that's what I want to base it on. There's no nesting involved, nor any edge cases. It's always the same pattern as shown above. The only "edge" case is that the first line PATTERN "info ... might have some extra text up to the line break, which I don't care about and which can always be deleted. I only care about the word info (i.e. everything up to the first space character or first " character).
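The generic case described above could be handled the same way: keep the captured word in sed's hold space and reuse it on the END PATTERN line. This is only a sketch under the stated conditions (no nesting, the opening line starts with PATTERN " and the closing line starts with END PATTERN); the function name pattern_to_new is hypothetical.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin.
# Assumes the markers start at the beginning of their lines.
pattern_to_new() {
  sed '
    # Opening line: keep only the word up to the first space or quote,
    # save it in the hold space, then write the new opening line.
    /^PATTERN "\([^ "]*\).*/{
      s/^PATTERN "\([^ "]*\).*/\1/
      h
      s/.*/NEW PATTERN - &/
    }
    # Closing line: fetch the saved word and write the new closing line.
    /^END PATTERN/{
      g
      s/.*/END NEW PATTERN - &/
    }
  ' "$@"
}
```

Anything after the captured word on the opening line is discarded, which matches the requirement that the extra text up to the line break can always be deleted.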
div elements? Any nesting? ...)
<div> instead of ##some or some other string. Which is why I want to stay away from HTML processing. I want to use this with things other than html as well. I'll try and add an addendum with a little more about the idea.
<div> instead of ##some or some other string." -- huh? But you want to capture between <div> and </div> tags? Call it whatever but that's structured text. How does it go with "##some"? What's the closing element? It'd better be some known format where you can use libraries or you'll have to write a parser and it won't be a one-liner I'm afraid. (Unless you actually have a trivial case of start--text--stop)