I am trying to parse a document using Dom4J. This document comes from various providers, and sometimes comes with namespaces and sometimes without.

For example:

<book>
   <author>john</author>
   <publisher>
     <name>John Q</name>
   </publisher>
</book>

or

<book xmlns="http://schemas.xml.com/XMLSchemaInstance">
   <author>john</author>
   <publisher>
     <name>John Q</name>
   </publisher>
</book>

or

<book xmlns:i="http://schemas.xml.com/XMLSchemaInstance">
   <i:author>john</i:author>
   <i:publisher>
     <i:name>John Q</i:name>
   </i:publisher>
</book>

I have a list of XPaths. I parse the document into a Document object and then query it with each XPath.

        // Needs java.util.List, java.util.ArrayList, org.dom4j.Document, org.dom4j.Node
        Document doc = parseDocument(documentFile);
        // List is an interface, so instantiate a concrete ArrayList
        List<String> xmlPaths = new ArrayList<String>();
        xmlPaths.add("book/author");
        xmlPaths.add("book/publisher/name");

        for (String searchPath : xmlPaths)
        {
            // selectSingleNode returns null when nothing matches
            Node currentNode = doc.selectSingleNode(searchPath);
            assert(currentNode != null);
        }

This code does not work on the last document, the one that is using namespace prefixes.

I tried these techniques, but none of them seem to work.

1) changing the last element in the xpath to be namespace neutral:

/book/:author
/book/[local-name()='author']
/[local-name()='book']/[local-name()='author']

All of these throw an exception saying that the XPath format is not correct.

2) Adding namespace URIs to the XPath after creating it with DocumentHelper.createXPath().

Any idea what I am doing wrong?

FYI I am using dom4j version 1.5

1 Answer

The steps in your XPath do not contain a tag name: a predicate such as [local-name()='author'] has to be attached to a node test. The general syntax in your case would be

/TAGNAMEPARENT[CONDITION_PARENT]/TAGNAMECHILD[CONDITION_CHILD]

The important aspect is that the tag names are mandatory while the conditions are optional. If you do not want to specify a tag name, you have to use * for "any tag". There may be performance implications for large XML files, since the processor has to iterate over a node set and test each name instead of using an index lookup. Maybe @MichaelKay can comment on this.

Try this instead:

/*[local-name()='book']/*[local-name()='author']
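
As a rough illustration, the sketch below runs that kind of expression against the three sample documents from the question. To stay runnable without extra dependencies it uses the JDK's built-in javax.xml.xpath API instead of dom4j; the expression string is the same one you would pass to selectSingleNode. The class NamespaceNeutralXPath and the helpers toLocalNamePath and selectText are hypothetical names invented for this example, and the helper only handles plain element steps (no predicates, axes, or prefixes).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NamespaceNeutralXPath {

    // Hypothetical helper: rewrites "book/author" as
    // "/*[local-name()='book']/*[local-name()='author']".
    // Assumes every step is a bare element name.
    static String toLocalNamePath(String simplePath) {
        StringBuilder sb = new StringBuilder();
        for (String step : simplePath.split("/")) {
            if (step.isEmpty()) {
                continue; // ignore a leading slash
            }
            sb.append("/*[local-name()='").append(step).append("']");
        }
        return sb.toString();
    }

    // Parses the XML string and returns the text of the first match, or null.
    static String selectText(String xml, String simplePath) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for local-name() to see local names
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node node = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(toLocalNamePath(simplePath), doc, XPathConstants.NODE);
        return node == null ? null : node.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String ns = "http://schemas.xml.com/XMLSchemaInstance";
        String[] docs = {
            // no namespace
            "<book><author>john</author>"
                + "<publisher><name>John Q</name></publisher></book>",
            // default namespace
            "<book xmlns=\"" + ns + "\"><author>john</author>"
                + "<publisher><name>John Q</name></publisher></book>",
            // prefixed children
            "<book xmlns:i=\"" + ns + "\"><i:author>john</i:author>"
                + "<i:publisher><i:name>John Q</i:name></i:publisher></book>"
        };
        for (String xml : docs) {
            System.out.println(selectText(xml, "book/author")
                    + " / " + selectText(xml, "book/publisher/name"));
        }
    }
}
```

Each step keeps * as the node test and moves the name check into the predicate, which is why the same path matches whether or not the elements carry a namespace.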

1 Comment

Thank you. That did it. Although I don't understand why I need to put a *. Isn't local-name() an alias for the tag?
