0

Below is the XML file:

<maindata>
        <publication-reference>
          <document-id document-id-type="docdb">
            <country>US</country>
            <doc-number>9820394ASD</doc-number>
            <date>20111101</date>
          </document-id>
          <document-id document-id-type="docmain">
            <doc-number>9820394</doc-number>
            <date>20111101</date>
          </document-id>
        </publication-reference>
</maindata>

I want to extract the value of the <doc-number> tag under the document-id whose type is "docmain". Below is my Java code; when executed, it extracts 9820394ASD instead of 9820394.

public static void main(String[] args) {
        String filePath ="D:/bs.xml";
        File xmlFile = new File(filePath);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder;
        try {
            dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(xmlFile);
            doc.getDocumentElement().normalize();
            System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
            NodeList nodeList = doc.getElementsByTagName("publication-reference");
            List<Biblio> docList = new ArrayList<Biblio>();
            for (int i = 0; i < nodeList.getLength(); i++) {
                docList.add(getdoc(nodeList.item(i)));
            }

        } catch (SAXException | ParserConfigurationException | IOException e1) {
            e1.printStackTrace();
        }
    }
    private static Biblio getdoc(Node node) {
           Biblio bib = new Biblio();
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            bib.setCountry(getTagValue("country",element));
            bib.setDocnumber(getTagValue("doc-number",element));
            bib.setDate(getTagValue("date",element));          
        }
        return bib;
    }

Please let me know how to check whether the type is docmain, so that the element is extracted only if the type is docmain and skipped otherwise.

Edit: I have added the getTagValue method:

private static String getTagValue(String tag, Element element) {
        NodeList nodeList = element.getElementsByTagName(tag).item(0).getChildNodes();
        Node node = (Node) nodeList.item(0);
        return node.getNodeValue();
    }
1
  • Aside from your problem: since you're trying to unmarshal XML into a class, if you're using Eclipse there is a tool called EclipseLink MOXy (eclipse.org/eclipselink) which is well suited to this kind of operation. It is much more straightforward, and I use it quite a lot. Commented Jul 18, 2016 at 10:22

3 Answers

1

The value can be retrieved with the following XPath expression, using the DOM and XPath APIs.

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(new File(...) );
    XPathFactory xPathfactory = XPathFactory.newInstance();
    XPath xpath = xPathfactory.newXPath();
    XPathExpression expr = xpath.compile("//document-id[@document-id-type=\"docmain\"]/doc-number/text()");
    String value = expr.evaluate(doc);
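
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same XPath lookup. The class name DocmainXPathDemo, the method docmainNumber and the inlined SAMPLE_XML are assumptions for illustration and are not part of the original code; checked exceptions are wrapped only to keep the demo short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DocmainXPathDemo {
    // The XML from the question, inlined so the example needs no file (assumption).
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\"><country>US</country>"
      + "<doc-number>9820394ASD</doc-number><date>20111101</date></document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date></document-id>"
      + "</publication-reference></maindata>";

    /** Returns the doc-number text of the first document-id with type "docmain". */
    static String docmainNumber(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The predicate in [] filters on the attribute value.
            return xpath.evaluate(
                "//document-id[@document-id-type='docmain']/doc-number/text()", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(docmainNumber(SAMPLE_XML)); // prints 9820394
    }
}
```

Using single quotes inside the XPath string avoids having to escape double quotes in the Java literal.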

1

Change your getdoc() method so that it creates a Biblio object only for 'docmain' types.

private static Biblio getdoc(Node node) {
  Biblio bib = null;
  if (node.getNodeType() == Node.ELEMENT_NODE) {
    Element element = (Element) node;
    String type = element.getAttribute("document-id-type");
    if(type != null && type.equals("docmain")) {
      bib = new Biblio();
      bib.setCountry(getTagValue("country",element));
      bib.setDocnumber(getTagValue("doc-number",element));
      bib.setDate(getTagValue("date",element));          
    }
  }
  return bib;
}

Then, in your main method, add to the list only if the result of getdoc() is not null:

for (int i = 0; i < nodeList.getLength(); i++) {
  Biblio biblio = getdoc(nodeList.item(i));
  if(biblio != null) {
    docList.add(biblio);
  }
}
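
One thing to watch in a DOM-only version: Element.getAttribute returns an empty string when the attribute is absent, and the nodes in the question's main loop are publication-reference elements, which carry no document-id-type attribute. A minimal sketch of a variant that iterates over the document-id elements instead follows. The class name DocmainDomDemo, the helper childText and the inlined SAMPLE_XML are assumptions for illustration, not the original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DocmainDomDemo {
    // The XML from the question, inlined so the example needs no file (assumption).
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\"><country>US</country>"
      + "<doc-number>9820394ASD</doc-number><date>20111101</date></document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date></document-id>"
      + "</publication-reference></maindata>";

    /** Text of the first descendant element named tag, or null if there is none. */
    static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    /** Collects the doc-number of every document-id whose type is "docmain". */
    static List<String> docmainNumbers(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Iterate over document-id elements, since they carry the attribute.
            NodeList ids = doc.getElementsByTagName("document-id");
            List<String> numbers = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                Element id = (Element) ids.item(i);
                if ("docmain".equals(id.getAttribute("document-id-type"))) {
                    numbers.add(childText(id, "doc-number"));
                }
            }
            return numbers;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

With the sample XML, docmainNumbers(SAMPLE_XML) yields a single entry, 9820394. The childText helper also returns null instead of throwing when a tag such as country is missing.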

Update: OK, this is horrible, sorry. You should really learn a little about XPath. I'll try to rewrite this using XPath expressions.

First, we need four XPath expressions. The first extracts a node list of all document-id elements with type docmain.

The XPath expression for this is: /maindata/publication-reference/document-id[@document-id-type='docmain'] (whole XML document in context).

Here the predicate in [] ensures that only document-id elements with type docmain are extracted.

The other three are evaluated with a document-id element as context, one for each field:

  • country: country
  • docnumber: doc-number
  • date: date

We compile them once in a static initializer:

private static XPathExpression xpathDocId;
private static XPathExpression xpathCountry;
private static XPathExpression xpathDocnumber;
private static XPathExpression xpathDate;

static {
  try {
    XPath xpath = XPathFactory.newInstance().newXPath();
    // Context is the whole document. Find all document-id elements with type docmain
    xpathDocId = xpath.compile("/maindata/publication-reference/document-id[@document-id-type='docmain']");

    // Context is a document-id element. 
    xpathCountry = xpath.compile("country");
    xpathDocnumber = xpath.compile("doc-number");
    xpathDate = xpath.compile("date");
  } catch (XPathExpressionException e) {
    e.printStackTrace();
  }
}

Then we rewrite the method getdoc. This method now gets a document-id element as input and creates a Biblio instance out of it using XPath expressions:

private static Biblio getdoc(Node element) throws XPathExpressionException {
  Biblio biblio = new Biblio();
  biblio.setCountry((String) xpathCountry.evaluate(element, XPathConstants.STRING));
  biblio.setDocnumber((String) xpathDocnumber.evaluate(element, XPathConstants.STRING));
  biblio.setDate((String) xpathDate.evaluate(element, XPathConstants.STRING));
  return biblio;
}

Then in the main() method you use the XPath expression to extract only the needed elements:

  NodeList nodeList = (NodeList) xpathDocId.evaluate(doc, XPathConstants.NODESET);
  List<Biblio> docList = new ArrayList<Biblio>();
  for (int i = 0; i < nodeList.getLength(); i++) {
    docList.add(getdoc(nodeList.item(i)));
  }
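
To see the pieces working together, here is a compact, self-contained sketch of the same idea. The nested Biblio class is a hypothetical stand-in (the real Biblio is not shown in the question), and the class name DocmainBiblioDemo, the method extract and the inlined SAMPLE_XML are likewise assumptions. It also shows that a missing element evaluates to an empty string: the docmain entry in the sample has no country element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DocmainBiblioDemo {
    /** Hypothetical stand-in for the Biblio class, which the question does not show. */
    static class Biblio {
        String country;
        String docnumber;
        String date;
    }

    // The XML from the question, inlined so the example needs no file (assumption).
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\"><country>US</country>"
      + "<doc-number>9820394ASD</doc-number><date>20111101</date></document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date></document-id>"
      + "</publication-reference></maindata>";

    /** Builds one Biblio per document-id element of type "docmain". */
    static List<Biblio> extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Context is the whole document: select only the docmain entries.
            NodeList ids = (NodeList) xpath.evaluate(
                "/maindata/publication-reference/document-id[@document-id-type='docmain']",
                doc, XPathConstants.NODESET);
            List<Biblio> result = new ArrayList<>();
            for (int i = 0; i < ids.getLength(); i++) {
                Node id = ids.item(i);
                Biblio biblio = new Biblio();
                // Context is the document-id element, so relative paths suffice.
                biblio.country = xpath.evaluate("country", id);   // "" when absent
                biblio.docnumber = xpath.evaluate("doc-number", id);
                biblio.date = xpath.evaluate("date", id);
                result.add(biblio);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract document ids", e);
        }
    }
}
```

For the sample XML this gives one Biblio with docnumber 9820394, date 20111101 and an empty country. Precompiling the expressions, as in the static initializer above, is the better choice if the method is called many times.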

3 Comments

Thanks, but in the getdoc() method the type is always an empty string, so my output collection is empty.
Can you provide the code for the method getTagValue()?
I have added the getTagValue() method to the question body.
0

Thanks for the help. This is the code I ended up using:

String Number = xPath.compile("//publication-reference//document-id[@document-id-type=\"docmain\"]/doc-number").evaluate(xmlDocument);
