XML read same tag with different segment

Question

Below is the XML file:

<maindata>
        <publication-reference>
          <document-id document-id-type="docdb">
            <country>US</country>
            <doc-number>9820394ASD</doc-number>
            <date>20111101</date>
          </document-id>
          <document-id document-id-type="docmain">
            <doc-number>9820394</doc-number>
            <date>20111101</date>
          </document-id>
        </publication-reference>
</maindata>

I want to extract the value of the <doc-number> tag under the document-id whose type is "docmain". Below is my Java code; when executed, it extracts 9820394ASD instead of 9820394.

public static void main(String[] args) {
        String filePath ="D:/bs.xml";
        File xmlFile = new File(filePath);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder;
        try {
            dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(xmlFile);
            doc.getDocumentElement().normalize();
            System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
            NodeList nodeList = doc.getElementsByTagName("publication-reference");
            List<Biblio> docList = new ArrayList<Biblio>();
            for (int i = 0; i < nodeList.getLength(); i++) {
                docList.add(getdoc(nodeList.item(i)));
            }

        } catch (SAXException | ParserConfigurationException | IOException e1) {
            e1.printStackTrace();
        }
    }
    private static Biblio getdoc(Node node) {
           Biblio bib = new Biblio();
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            bib.setCountry(getTagValue("country",element));
            bib.setDocnumber(getTagValue("doc-number",element));
            bib.setDate(getTagValue("date",element));          
        }
        return bib;
    }

Please let me know how to check the document-id type: the values should be extracted only if the type is docmain, and the element should be skipped otherwise.

I have added the getTagValue method:

private static String getTagValue(String tag, Element element) {
        NodeList nodeList = element.getElementsByTagName(tag).item(0).getChildNodes();
        Node node = (Node) nodeList.item(0);
        return node.getNodeValue();
    }

Besides your problem: since you're trying to unmarshal XML into a class, if you're using Eclipse there is a tool called EclipseLink MOXy (eclipse.org/eclipselink) which is perfect for this kind of operation. It is much more straightforward, and I use it quite a lot.
– Mad Matts, Commented Jul 18, 2016 at 10:22

Community · Accepted Answer · 2017-05-23 11:58:55Z

1

The value can be retrieved with the following XPath expression, using the DOM and XPath APIs.

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(new File(...) );
    XPathFactory xPathfactory = XPathFactory.newInstance();
    XPath xpath = xPathfactory.newXPath();
    XPathExpression expr = xpath.compile("//document-id[@document-id-type=\"docmain\"]/doc-number/text()");
    String value = expr.evaluate(doc);
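To try the expression without a file on disk, here is a minimal self-contained sketch. It assumes the sample XML from the question, inlined as a string; the class name `DocMainXPathDemo` and the helper `docMainNumber` are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DocMainXPathDemo {
    // Sample XML from the question, inlined so the sketch runs without a file.
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\">"
      + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
      + "</document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date>"
      + "</document-id>"
      + "</publication-reference></maindata>";

    // Hypothetical helper: returns the doc-number of the docmain document-id.
    static String docMainNumber(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The predicate in [] restricts the match to the docmain element.
            return xpath.evaluate(
                    "//document-id[@document-id-type='docmain']/doc-number/text()", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(docMainNumber(SAMPLE_XML)); // prints 9820394
    }
}
```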

edited May 23, 2017 at 11:58


answered Jul 19, 2016 at 8:50

Michal


vanje · Accepted Answer · 2016-07-19 11:14:03Z

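Change your getdoc() method so that it creates a Biblio object only for 'docmain' types. Note that the attribute is on the document-id element, so getdoc() must be called with document-id nodes rather than publication-reference nodes.

private static Biblio getdoc(Node node) {
  Biblio bib = null;
  if (node.getNodeType() == Node.ELEMENT_NODE) {
    Element element = (Element) node;
    // getAttribute() returns "" (never null) when the attribute is absent
    String type = element.getAttribute("document-id-type");
    if ("docmain".equals(type)) {
      bib = new Biblio();
      // Note: getTagValue() throws a NullPointerException if the tag is missing,
      // and the sample docmain element has no <country>
      bib.setCountry(getTagValue("country",element));
      bib.setDocnumber(getTagValue("doc-number",element));
      bib.setDate(getTagValue("date",element));
    }
  }
  return bib;
}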

Then in your main method, iterate over the document-id elements and add to the list only if the getdoc() result is not null:

NodeList nodeList = doc.getElementsByTagName("document-id");
for (int i = 0; i < nodeList.getLength(); i++) {
  Biblio biblio = getdoc(nodeList.item(i));
  if(biblio != null) {
    docList.add(biblio);
  }
}
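For reference, a self-contained sketch of this DOM-only approach might look as follows. It assumes the sample XML from the question; `DocMainDomDemo`, `childText` and `extract` are hypothetical names, and a plain `String[]` stands in for the Biblio class, which is not shown in the question. Unlike getTagValue(), the helper returns null for a missing tag instead of throwing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DocMainDomDemo {
    // Sample XML from the question, inlined so the sketch runs without a file.
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\">"
      + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
      + "</document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date>"
      + "</document-id>"
      + "</publication-reference></maindata>";

    // Text of the first descendant with the given tag, or null if there is none.
    static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    // Collects {country, doc-number, date} for each document-id of the wanted type.
    static List<String[]> extract(String xml, String wantedType) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList ids = doc.getElementsByTagName("document-id");
            List<String[]> result = new ArrayList<String[]>();
            for (int i = 0; i < ids.getLength(); i++) {
                Element id = (Element) ids.item(i);
                if (wantedType.equals(id.getAttribute("document-id-type"))) {
                    result.add(new String[] {
                        childText(id, "country"),
                        childText(id, "doc-number"),
                        childText(id, "date")
                    });
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        for (String[] row : extract(SAMPLE_XML, "docmain")) {
            System.out.println(row[0] + " " + row[1] + " " + row[2]);
        }
    }
}
```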

Update: OK, this is horrible, sorry. You should really learn a little about XPath. I'll try to rewrite this using XPath expressions.

First we need four XPath expressions. One to extract a node list with all document-id elements with type docmain.

The XPath expression for this is: /maindata/publication-reference/document-id[@document-id-type='docmain'] (whole XML document in context).

Here the predicate in [] ensures that only document-id elements with type docmain are selected.
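To see what the predicate does, the following small sketch counts the matched nodes with and without it. It assumes the sample XML from the question; `PredicateDemo` and `countMatches` are names invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PredicateDemo {
    // Sample XML from the question, inlined so the sketch runs without a file.
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\">"
      + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
      + "</document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date>"
      + "</document-id>"
      + "</publication-reference></maindata>";

    // Number of nodes the expression selects, with the whole document as context.
    static int countMatches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String all = "/maindata/publication-reference/document-id";
        System.out.println(countMatches(SAMPLE_XML, all)); // prints 2
        System.out.println(
            countMatches(SAMPLE_XML, all + "[@document-id-type='docmain']")); // prints 1
    }
}
```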

Then one expression for each field of a document-id element (with the document-id element as context):

country: country
docnumber: doc-number
date: date

We use a static initializer for that:

private static XPathExpression xpathDocId;
private static XPathExpression xpathCountry;
private static XPathExpression xpathDocnumber;
private static XPathExpression xpathDate;

static {
  try {
    XPath xpath = XPathFactory.newInstance().newXPath();
    // Context is the whole document. Find all document-id elements with type docmain
    xpathDocId = xpath.compile("/maindata/publication-reference/document-id[@document-id-type='docmain']");

    // Context is a document-id element. 
    xpathCountry = xpath.compile("country");
    xpathDocnumber = xpath.compile("doc-number");
    xpathDate = xpath.compile("date");
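  } catch (XPathExpressionException e) {
    // Fail fast: otherwise the fields stay null and later calls throw NullPointerException
    throw new ExceptionInInitializerError(e);
  }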
}

Then we rewrite the method getdoc. This method now gets a document-id element as input and creates a Biblio instance out of it using XPath expressions:

private static Biblio getdoc(Node element) throws XPathExpressionException {
  Biblio biblio = new Biblio();
  biblio.setCountry((String) xpathCountry.evaluate(element, XPathConstants.STRING));
  biblio.setDocnumber((String) xpathDocnumber.evaluate(element, XPathConstants.STRING));
  biblio.setDate((String) xpathDate.evaluate(element, XPathConstants.STRING));
  return biblio;
}

Then in the main() method you use the XPath expression to extract only the needed elements:

  NodeList nodeList = (NodeList) xpathDocId.evaluate(doc, XPathConstants.NODESET);
  List<Biblio> docList = new ArrayList<Biblio>();
  for (int i = 0; i < nodeList.getLength(); i++) {
    docList.add(getdoc(nodeList.item(i)));
  }
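Put together, the pieces above could be assembled into one runnable sketch like this. It assumes the sample XML from the question; the nested `Biblio` class is a hypothetical stand-in (the real one is not shown), and `BiblioXPathDemo` and `readDocMain` are illustrative names. For brevity it evaluates the expressions directly instead of precompiling them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BiblioXPathDemo {
    // Sample XML from the question, inlined so the sketch runs without a file.
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\">"
      + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
      + "</document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date>"
      + "</document-id>"
      + "</publication-reference></maindata>";

    // Hypothetical stand-in for the question's Biblio class.
    static final class Biblio {
        final String country;
        final String docNumber;
        final String date;

        Biblio(String country, String docNumber, String date) {
            this.country = country;
            this.docNumber = docNumber;
            this.date = date;
        }
    }

    static List<Biblio> readDocMain(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Context is the whole document: select only the docmain elements.
            NodeList ids = (NodeList) xpath.evaluate(
                    "/maindata/publication-reference/document-id[@document-id-type='docmain']",
                    doc, XPathConstants.NODESET);
            List<Biblio> result = new ArrayList<Biblio>();
            for (int i = 0; i < ids.getLength(); i++) {
                Node id = ids.item(i);
                // Context is one document-id element; a missing child yields "".
                result.add(new Biblio(
                        xpath.evaluate("country", id),
                        xpath.evaluate("doc-number", id),
                        xpath.evaluate("date", id)));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        for (Biblio b : readDocMain(SAMPLE_XML)) {
            System.out.println(b.docNumber + " " + b.date);
        }
    }
}
```

Because the docmain element in the sample has no country child, the country field comes back as an empty string rather than null.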

Thanks, but in the getdoc() method I always get an empty value (the String type), so my output collection is empty.

Prabu · Accepted Answer · 2016-07-20 10:11:35Z

0

Thanks for the help. The code is as follows:

String Number = xPath.compile("//publication-reference//document-id[@document-id-type=\"docmain\"]/doc-number").evaluate(xmlDocument);
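One thing worth knowing about this string-returning form of evaluate(): when no node matches, it returns an empty string rather than null. A small sketch, assuming the sample XML from the question (`XPathNoMatchDemo`, `eval` and the type value "docother" are made up for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathNoMatchDemo {
    // Sample XML from the question, inlined so the sketch runs without a file.
    static final String SAMPLE_XML =
        "<maindata><publication-reference>"
      + "<document-id document-id-type=\"docdb\">"
      + "<country>US</country><doc-number>9820394ASD</doc-number><date>20111101</date>"
      + "</document-id>"
      + "<document-id document-id-type=\"docmain\">"
      + "<doc-number>9820394</doc-number><date>20111101</date>"
      + "</document-id>"
      + "</publication-reference></maindata>";

    // Compiles the expression and evaluates it as a string against the document.
    static String eval(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPathExpression expr =
                    XPathFactory.newInstance().newXPath().compile(expression);
            return expr.evaluate(doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String prefix = "//publication-reference//document-id[@document-id-type=\"";
        String found = eval(SAMPLE_XML, prefix + "docmain\"]/doc-number");
        // "docother" is a made-up type that does not occur in the sample.
        String missing = eval(SAMPLE_XML, prefix + "docother\"]/doc-number");
        System.out.println(found);             // prints 9820394
        System.out.println(missing.isEmpty()); // prints true
    }
}
```

So a caller that needs to distinguish "no such element" from a real value should check for an empty result.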

answered Jul 20, 2016 at 10:11

Prabu

