This is my problem: I need to extract the text between the "p" tags, without the XML markup, using a SAX parser.
<title>1. Introduction</title>
<p>The Lorem ipsum
<xref ref-type="bibr" rid="B1">
1
</xref>.
Lorem ipsum 23.
</p>
<p>The L domain recruits an ATP-requiring cellular factor for this
scission event, the only known energy-dependent step in assembly
<xref ref-type="bibr" rid="B2">
2
</xref>.
Domain is used here to denote the amino
acid sequence that constitutes the biological function.
</p>
Is this possible using endElement()? When I use it, I obtain only the part after the "/xref" tag.
Here is the code:
public void endElement(String s, String s1, String element) throws SAXException {
    if(element.equals(Finals.PARAGRAPH)){
        Paragraph paragraph = new Paragraph();
        paragraph.setContext(tmpValue);
        System.out.println("Contesto: " + tmpValue);
        listP.add(paragraph);
    }
}
@Override
public void characters(char[] ac, int i, int j) throws SAXException {
    tmpValue = new String(ac, i, j);
}
This is what I expect to get: a list listP containing the two paragraphs:
1) Lorem ipsum 1 Lorem ipsum 23.
2) The L domain recruits an ATP-requiring cellular factor for this
scission event, the only known energy-dependent step in assembly 2
Domain is used here to denote the amino
acid sequence that constitutes the biological function.
endElement is invoked on ... ending elements. You are interested in a section called CDATA. You should find the appropriate handler for this. And you should present your current attempt using your actual code.
I expect: The L domain recruits an ATP-requiring cellular factor for this scission event, the only known energy-dependent step in assembly 2. Domain is used here to denote the amino acid sequence that constitutes the biological function.
But I get only: Domain is used here to denote the amino acid sequence that constitutes the biological function.
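The code in the question overwrites tmpValue on every characters() call, so by the time endElement() fires for "p", only the last text chunk (the part after the closing "xref" tag) is left. SAX may call characters() several times within one element, and it calls it again for the text inside and after each nested tag. A possible fix is to append to a buffer instead of overwriting it. The following is only a minimal sketch under some assumptions: the class and method names (ParagraphExtractor, ParagraphHandler, extractParagraphs) are made up for illustration, the literal "p" stands in for Finals.PARAGRAPH, plain strings replace the Paragraph class, and whitespace is collapsed to single spaces, which the question does not explicitly ask for.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original code.
final class ParagraphExtractor {

    /** Collects the text of every <p> element, including text inside nested tags such as <xref>. */
    static final class ParagraphHandler extends DefaultHandler {
        final List<String> paragraphs = new ArrayList<>();
        private final StringBuilder buffer = new StringBuilder();
        private boolean insideParagraph = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // Assumption: "p" plays the role of Finals.PARAGRAPH from the question.
            if ("p".equals(qName)) {
                insideParagraph = true;
                buffer.setLength(0); // start a fresh paragraph
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Append instead of overwrite: this runs once per text chunk,
            // also for the text inside and after nested elements.
            if (insideParagraph) {
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("p".equals(qName)) {
                // Collapse line breaks and repeated blanks into single spaces.
                paragraphs.add(buffer.toString().trim().replaceAll("\\s+", " "));
                insideParagraph = false;
            }
        }
    }

    /** Parses the given XML string and returns the text of each <p> element. */
    public static List<String> extractParagraphs(String xml) {
        try {
            ParagraphHandler handler = new ParagraphHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.paragraphs;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The sample needs a single root element to be well-formed; <sec> is an assumption.
        String xml = "<sec><title>1. Introduction</title>"
                + "<p>The Lorem ipsum\n<xref ref-type=\"bibr\" rid=\"B1\">\n1\n</xref>.\n"
                + "Lorem ipsum 23.\n</p></sec>";
        extractParagraphs(xml).forEach(System.out::println);
    }
}
```

Because the handler only checks the element name in startElement() and endElement(), the "xref" tags themselves are ignored while their text still ends up in the buffer. Note that the line break before the period in the sample becomes a space, so the reference number and the period are separated by a blank.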