5

I understand from Googling that it makes more sense to extract data from XML using XPath than by using DOM looping.

At the moment, I have implemented a solution using DOM, but the code is verbose, and it feels untidy and unmaintainable, so I would like to switch to a cleaner XPath solution.

Let's say I have this structure:

<products>
    <product>
        <title>Some title 1</title>
        <image>Some image 1</image>
    </product>
    <product>
        <title>Some title 2</title>
        <image>Some image 2</image>
    </product>
    ...
</products>

I want to be able to run a for loop for each of the <product> elements, and inside this for loop, extract the title and image node values.

My code looks like this:

InputStream is = conn.getInputStream();          
DocumentBuilder builder =
  DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(is);
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("/products/product");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList products = (NodeList) result;
for (int i = 0; i < products.getLength(); i++) {
    Node n = products.item(i);
    if (n != null && n.getNodeType() == Node.ELEMENT_NODE) {
        Element product = (Element) n;
        // do some DOM navigation to get the title and image
    }
}

Inside my for loop I get each <product> as a Node, which is cast to an Element.

Can I simply reuse my XPath instance to compile another expression and evaluate it against the Node or the Element?

2 Answers

6

Yes, you can. Compile a second, relative expression and evaluate it against each product node:

XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
XPathExpression expr = xpath.compile("/products/product");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
expr = xpath.compile("title"); // The new xpath expression to find 'title' within 'product'.

NodeList products = (NodeList) result;
for (int i = 0; i < products.getLength(); i++) {
    Node n = products.item(i);
    if (n != null && n.getNodeType() == Node.ELEMENT_NODE) {
        Element product = (Element) n;
        // Evaluate the relative 'title' expression with this product as the context node.
        NodeList nodes = (NodeList) expr.evaluate(product, XPathConstants.NODESET);
        if (nodes.getLength() > 0) { // guard against a product that has no title
            System.out.println("TITLE: " + nodes.item(0).getTextContent());
        }
    }
}    

This example extracts the 'title' value. You can extract 'image' the same way with its own expression.
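For completeness, here is a minimal self-contained sketch of the same idea that reads both fields. The class name ProductXPathDemo and the helper extractProducts are made up for illustration, and the XML is an inline string rather than a connection stream. It also asks for XPathConstants.STRING instead of a node set, which returns the text of the first match, or an empty string when there is none. Note that the inner expressions have no leading slash: an expression starting with "/" would be resolved from the document root even when evaluated against a product node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ProductXPathDemo {

    // Hypothetical helper: returns one "title | image" string per <product>.
    public static List<String> extractProducts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Compile each expression once, outside the loop.
        XPathExpression productExpr = xpath.compile("/products/product");
        // No leading slash: resolved relative to the node passed to evaluate().
        XPathExpression titleExpr = xpath.compile("title");
        XPathExpression imageExpr = xpath.compile("image");

        NodeList products = (NodeList) productExpr.evaluate(doc, XPathConstants.NODESET);
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < products.getLength(); i++) {
            Node product = products.item(i);
            // STRING yields the text of the first match, or "" if nothing matches.
            String title = (String) titleExpr.evaluate(product, XPathConstants.STRING);
            String image = (String) imageExpr.evaluate(product, XPathConstants.STRING);
            rows.add(title + " | " + image);
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<products>"
                + "<product><title>Some title 1</title><image>Some image 1</image></product>"
                + "<product><title>Some title 2</title><image>Some image 2</image></product>"
                + "</products>";
        for (String row : extractProducts(xml)) {
            System.out.println(row);
        }
    }
}
```

Using STRING avoids the cast to NodeList and the item(0) lookup, at the cost of not being able to tell a missing element from an empty one.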


4

I'm not a big fan of this approach because you have to build a document (which might be expensive) before you can apply XPaths to it.

I've found VTD-XML a lot more efficient when it comes to applying XPaths to documents, because it indexes the raw document bytes instead of building a full DOM object tree (the document itself is still held in memory). Here is some sample code:

final VTDGen vg = new VTDGen();
vg.parseFile("file.xml", false);
final VTDNav vn = vg.getNav();
final AutoPilot ap = new AutoPilot(vn);

ap.selectXPath("/products/product");
while (ap.evalXPath() != -1) {
    System.out.println("PRODUCT:");

    // you could either apply another xpath or simply get the first child
    if (vn.toElement(VTDNav.FIRST_CHILD, "title")) {
        int val = vn.getText();
        if (val != -1) {
            System.out.println("Title: " + vn.toNormalizedString(val));
        }
        vn.toElement(VTDNav.PARENT);
    }
    if (vn.toElement(VTDNav.FIRST_CHILD, "image")) {
        int val = vn.getText();
        if (val != -1) {
            System.out.println("Image: " + vn.toNormalizedString(val));
        }
        vn.toElement(VTDNav.PARENT);
    }
}

Also see this post on Faster XPaths with VTD-XML.
