I have an XML file of the form:
...
<element1>
<element2>
<group1>
<tag1>value</tag1>
<tag2>value</tag2>
</group1>
<group1>
<tag1>value</tag1>
<tag2>value</tag2>
</group1>
<element2>
...
I used
sed -n '/<group1>/,/<\/group1>/p' filename
to extract all content of the group1 tags, including all children. This is exactly what I want:
<group1>
<tag1>value</tag1>
<tag2>value</tag2>
</group1>
<group1>
<tag1>value</tag1>
<tag2>value</tag2>
</group1>
However, if the input XML is of the form:
...
<element1>
<element2>
<group2>
<group2>value</group2>
<otherTag>value</otherTag>
</group2>
<element3>
<group2>
<group2>value</group2>
<otherTag>value</otherTag>
</group2>
...
and I try to extract the following content:
<group2>
<group2>value</group2>
<otherTag>value</otherTag>
</group2>
<group2>
<group2>value</group2>
<otherTag>value</otherTag>
</group2>
the same sed command (with group1 replaced by group2) just returns:
<group2>
<group2>value</group2>
It seems to match the stop pattern </group2> on the inner <group2>value</group2> line and does no further extraction. I'm quite confused here. Why doesn't it continue and extract the next <group2> block, as it does in the <group1> case? Is there any way to make this work with sed, and are there any other alternatives?
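One workaround that might help, assuming every outer opening and closing group2 tag sits alone on its own line as in the sample above, is to anchor both range patterns to the whole line. The inner <group2>value</group2> line then cannot end the range. The sample file and the variable names `sample` and `anchored` below are made up for illustration. This is only a sketch, not a general XML solution.

```shell
# Hypothetical sample file reproducing the second input from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<element1>
<element2>
<group2>
<group2>value</group2>
<otherTag>value</otherTag>
</group2>
<element3>
<group2>
<group2>value</group2>
<otherTag>value</otherTag>
</group2>
EOF

# Anchored range: each pattern must match an entire line (allowing only
# surrounding whitespace), so the inner <group2>value</group2> line
# matches neither the start nor the end of the range.
anchored=$(sed -n '/^[[:space:]]*<group2>[[:space:]]*$/,/^[[:space:]]*<\/group2>[[:space:]]*$/p' "$sample")

printf '%s\n' "$anchored"
rm -f "$sample"
```

This still treats the XML as plain lines, so it would break as soon as the layout changes (for example, tags sharing a line). A real XML parser is the more robust alternative.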