I am currently parsing a Wikipedia dump to extract some useful information. The dump is in XML, and I want to extract only the text content of each page. I'm wondering how to find all the text inside a tag that is itself nested inside another specific tag. I searched for similar questions, but only found ones dealing with a single tag. Here is an example of the input I am working with:
<revision>
<timestamp>2001-01-15T13:15:00Z</timestamp>
<contributor>
<username>Foobar</username>
<id>65536</id>
</contributor>
<comment>I have just one thing to say!</comment>
<text>A bunch of [[text]] here.</text>
<minor />
</revision>
<example_tag>
<timestamp>2001-01-15T13:15:00Z</timestamp>
<contributor>
<username>Foobar</username>
<id>65536</id>
</contributor>
<comment>I have just one thing to say!</comment>
<text>A bunch of [[text]] here.</text>
<minor />
</example_tag>
How can I extract the content of the text tag, but only when it is nested inside a revision element (and not, for instance, inside example_tag)?
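
To make the goal concrete, here is a minimal sketch of the behaviour I am after, assuming Python and the standard-library `xml.etree.ElementTree` module. The `<page>` wrapper element, the `SAMPLE` string, and the `revision_texts` helper are my own assumptions for illustration, and I changed the text inside `example_tag` so the filtering is visible. The path `.//revision/text` is meant to match only `text` elements that are direct children of a `revision` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <page> wrapper is an assumption so that the
# snippet is a single well-formed document with one root element.
SAMPLE = """<page>
<revision>
<timestamp>2001-01-15T13:15:00Z</timestamp>
<comment>I have just one thing to say!</comment>
<text>A bunch of [[text]] here.</text>
<minor />
</revision>
<example_tag>
<timestamp>2001-01-15T13:15:00Z</timestamp>
<text>Text outside a revision.</text>
</example_tag>
</page>"""


def revision_texts(xml_string):
    """Return the content of every <text> that is a direct child of a <revision>."""
    root = ET.fromstring(xml_string)
    # ".//revision" finds revision elements at any depth below the root;
    # "/text" then selects only their direct <text> children.
    return [el.text or "" for el in root.findall(".//revision/text")]


texts = revision_texts(SAMPLE)
print(texts)
```

Two caveats about this sketch: as far as I can tell, real dumps declare an XML namespace on the root element, so the tag names in the path would probably need to be namespace-qualified, and a full dump is likely too large for `ET.fromstring`, so something incremental such as `ET.iterparse` would presumably be needed instead.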