
I have the following type of XML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD eSummaryResult, 29 October 2004//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd">
<eSummaryResult>
<DocSum>
    <Id>224589801</Id>
    <Item Name="Caption" Type="String">NC_000010</Item>
    <Item Name="Title" Type="String">Homo sapiens chromosome 10, GRCh37.p10 Primary Assembly</Item>
    <Item Name="Extra" Type="String">gi|224589801|gnl|ASM:GCF_000001305|10|ref|NC_000010.10||gpp|GPC_000000034.1||gnl|NCBI_GENOMES|10[224589801]</Item>
    <Item Name="Gi" Type="Integer">224589801</Item>
    <Item Name="CreateDate" Type="String">2002/08/29</Item>
    <Item Name="UpdateDate" Type="String">2012/10/30</Item>
    <Item Name="Flags" Type="Integer">544</Item>
    <Item Name="TaxId" Type="Integer">9606</Item>
    <Item Name="Length" Type="Integer">135534747</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"/>
    <Item Name="Comment" Type="String"><![CDATA[  ]]></Item>
</DocSum>

</eSummaryResult>

How can I extract the details from the Item nodes based on the value of their Name attribute? Also, is it better to use the standard Java DOM API, or another XML parser library, for this purpose?


4 Answers


I suggest StAX (javax.xml.stream.*). Try this:

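    import java.io.FileInputStream;
    import java.io.InputStream;
    import javax.xml.stream.*;

    XMLInputFactory f = XMLInputFactory.newInstance();
    try (InputStream in = new FileInputStream("test.xml")) {
        XMLStreamReader rdr = f.createXMLStreamReader(in); // encoding taken from the XML declaration
        while (rdr.hasNext()) {
            // react only to <Item ...> start tags
            if (rdr.next() == XMLStreamConstants.START_ELEMENT
                    && rdr.getLocalName().equals("Item")) {
                System.out.println(rdr.getAttributeValue(null, "Name")); // e.g. Caption
                System.out.println(rdr.getElementText());                // e.g. NC_000010
            }
        }
        rdr.close();
    }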

StAX should always be the first thing to consider: it pulls events from the document one at a time instead of loading the whole tree into memory. See http://en.wikipedia.org/wiki/StAX for the details.
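A slightly fuller StAX sketch, assuming you want every Item collected into a map keyed by its Name attribute. The class name StaxItems and the method readItems are hypothetical, and the sample input is a trimmed copy of the question's XML. Turning off SUPPORT_DTD is an assumption that you do not need the DTD; it stops the parser from processing the DOCTYPE and trying to fetch the NCBI DTD over the network.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxItems {

    /** Streams through the document once, mapping each Item's Name attribute to its trimmed text. */
    public static Map<String, String> readItems(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: the DTD is not needed, so DTD processing is switched off
        // and the external DTD named in the DOCTYPE is not fetched.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        Map<String, String> items = new LinkedHashMap<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Item".equals(reader.getLocalName())) {
                    String name = reader.getAttributeValue(null, "Name");
                    // getElementText() consumes everything up to the matching </Item>, CDATA included
                    items.put(name, reader.getElementText().trim());
                }
            }
        } finally {
            reader.close();
        }
        return items;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<eSummaryResult><DocSum>"
                + "<Item Name=\"Caption\" Type=\"String\">NC_000010</Item>"
                + "<Item Name=\"TaxId\" Type=\"Integer\">9606</Item>"
                + "</DocSum></eSummaryResult>";
        Map<String, String> items = readItems(xml);
        System.out.println(items.get("Caption")); // NC_000010
        System.out.println(items.get("TaxId"));   // 9606
    }
}
```

Because this is a single forward pass, it behaves the same on a file with many DocSum records, but note that identical Name values from different records would overwrite each other in the map.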


Try the code below:

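import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Create a Document object (doc) from the XML file (file name assumed)
Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(new File("test.xml"));
NodeList list = doc.getElementsByTagName("Item");

for (int i = 0; i < list.getLength(); i++) {
    Element item = (Element) list.item(i);
    // getAttribute returns "" instead of null when Name is missing
    if (item.getAttribute("Name").equalsIgnoreCase("Caption")) {
        System.out.println(item.getTextContent());
    }
}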

The output should be NC_000010.
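If you stay with the standard DOM API, one thing to watch for with this particular file: the DOCTYPE refers to an external DTD, and the JDK's DocumentBuilder tries to load it by default, which needs network access. A minimal sketch, assuming the DTD is not needed, that switches that off through the Xerces load-external-dtd feature (honoured by the JDK's built-in parser) and wraps the lookup in a helper. DomItems, parse and itemText are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomItems {

    /** Parses XML into a DOM tree without loading the external DTD named in the DOCTYPE. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Xerces feature, honoured by the JDK's built-in parser
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Returns the text of the first Item whose Name attribute equals name, or null. */
    public static String itemText(Document doc, String name) {
        NodeList items = doc.getElementsByTagName("Item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            // getAttribute returns "" (not null) when the attribute is absent
            if (name.equals(item.getAttribute("Name"))) {
                return item.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE eSummaryResult PUBLIC \"-//NLM//DTD eSummaryResult, 29 October 2004//EN\""
                + " \"http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd\">"
                + "<eSummaryResult><DocSum>"
                + "<Item Name=\"Status\" Type=\"String\">live</Item>"
                + "</DocSum></eSummaryResult>");
        System.out.println(itemText(doc, "Status")); // live
    }
}
```

Parsing once and calling itemText for each name you need avoids re-reading the file for every lookup.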


If you only want to use standard Java, XPath is the way to go:

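import static org.junit.Assert.assertEquals;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.junit.Test; // JUnit 4 assumed
import org.xml.sax.InputSource;

public class ExampleXmlTest { // class name is illustrative
    private URL xml = getClass().getResource("/example.xml");

    @Test
    public void testExamples() throws Exception {
        assertEquals("NC_000010", extractUsingXPath("Caption"));
    }

    public String extractUsingXPath(final String name) throws XPathExpressionException, IOException {
        // XPathFactory class is not thread-safe so we do not store it
        XPath xpath = XPathFactory.newInstance().newXPath();
        try (InputStream in = xml.openStream()) { // close the stream once evaluated
            return xpath.evaluate(
                String.format("/eSummaryResult/DocSum/Item[@Name='%s']", name), // xpath expression
                new InputSource(in));                                           // the XML Document
        }
    }
}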
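One caveat with building the expression through String.format: a Name value containing an apostrophe would produce invalid XPath. A sketch, assuming Java 8+ for the lambda, that binds the name as an XPath variable ($name) through XPathVariableResolver instead. XPathItems and extract are hypothetical names, and the input is read from a String rather than a classpath resource.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathItems {

    /** Returns the text of the Item whose Name equals name ("" if none matches). */
    public static String extract(String xml, String name) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Resolve $name to the Java string; the resolver is set before compile()
        xpath.setXPathVariableResolver(
                qname -> "name".equals(qname.getLocalPart()) ? name : null);
        XPathExpression expr = xpath.compile("/eSummaryResult/DocSum/Item[@Name=$name]");
        return expr.evaluate(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<eSummaryResult><DocSum>"
                + "<Item Name=\"Caption\" Type=\"String\">NC_000010</Item>"
                + "<Item Name=\"it's\" Type=\"String\">quoted</Item>"
                + "</DocSum></eSummaryResult>";
        System.out.println(extract(xml, "Caption")); // NC_000010
        System.out.println(extract(xml, "it's"));    // quoted
    }
}
```

The variable is compared as an XPath string value, so no quoting or escaping of the name is needed.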


Maybe use XPath?

Document dom = ...;
XPath xpath = XPathFactory.newInstance().newXPath();
String result = xpath.evaluate("/eSummaryResult/DocSum/Item[@Name='Title']", dom);
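When several Items are needed, the same parsed dom can be queried repeatedly, or all Items can be pulled in one query with XPathConstants.NODESET. A sketch of the latter: XPathOnDom, parse and allItems are hypothetical names, and parse assumes the input has no DOCTYPE (otherwise the default DocumentBuilder may try to fetch the external DTD).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XPathOnDom {

    /** Parses an XML string into a DOM Document (no DTD handling). */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Runs one XPath query on an already parsed Document and maps Name to text. */
    public static Map<String, String> allItems(Document dom) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xpath.evaluate(
                "/eSummaryResult/DocSum/Item", dom, XPathConstants.NODESET);
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            result.put(item.getAttribute("Name"), item.getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document dom = parse("<eSummaryResult><DocSum>"
                + "<Item Name=\"Caption\" Type=\"String\">NC_000010</Item>"
                + "<Item Name=\"Length\" Type=\"Integer\">135534747</Item>"
                + "</DocSum></eSummaryResult>");
        Map<String, String> items = allItems(dom);
        System.out.println(items.get("Length")); // 135534747
    }
}
```

For a single lookup, the one-line evaluate above is simpler; this variant only pays off when you need most of the Items.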
