
I've been trying to collect all of the node names along with their text content in pre-order, so I wrote a recursive method that walks every node in the XML file. The problem is that whenever I run it, I keep getting empty strings in the ArrayList. The empty strings appear next to Academy, Faculty and Department, since those elements have no text of their own.

I've tried removing empty strings and null from the ArrayList, but that didn't work. Does anyone know a way to solve this problem? Thanks!

Here is the XML File:

<?xml version="1.0"?>
<Academy>
    <Faculty>
        <Department name= "Science">
            <Director>Kay Jordan</Director>
            <Don>ABC</Don>
        </Department>
    </Faculty>
</Academy>

And here is the Java Code:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class Gen2 {

    static ArrayList<String> SLDP = new ArrayList<String>(0);

    public static void main(String[] args) throws SAXException, IOException,
                ParserConfigurationException, TransformerException {

        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
            .newInstance();
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        Document document = docBuilder.parse(new File("Test.xml"));
        doSomething(document.getDocumentElement());

        System.out.print("< ");
        SLDP.removeAll(Arrays.asList(null," "));
        for(int z =0; z<SLDP.size();z++){
            System.out.print(SLDP.get(z).toString()+ " ");
        }
        System.out.print(" >");
    }

    public static void doSomething(Node node) {
        // do something with the current node instead of System.out
        //System.out.println(node.getNodeName());
        SLDP.add(node.getNodeName());
        System.out.println(node.getFirstChild().getTextContent());
        SLDP.add(node.getFirstChild().getTextContent());

        NodeList nodeList = node.getChildNodes();
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node currentNode = nodeList.item(i);
            if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
                //calls this method for all the children which is Element
                doSomething(currentNode);
            }
        }
    }
}
  • I ran your code and it works as expected without changing a line. Commented Feb 25, 2015 at 22:55
  • @AbbéRésina Well, it does work, I'm not saying it doesn't. But if you check your output you'll see a line break between Academy and Faculty, etc. That is because there is a string in the ArrayList that gets printed when calling System.out.print. The output should look like this: < Academy Faculty Department Director Kay Jordan Don ABC > with one space between each item. I hope you see what I mean. Commented Feb 25, 2015 at 23:18
  • OK, I misunderstood the problem. I have posted two options. Commented Feb 25, 2015 at 23:54

2 Answers


Simple way: in doSomething(), trim the node name and content:

SLDP.add(node.getNodeName().trim());       
//System.out.print(node.getFirstChild().getTextContent());
SLDP.add(node.getFirstChild().getTextContent().trim());
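Note that trimming turns the whitespace-only entries into empty strings, which still end up in the list, and `removeAll(Arrays.asList(null, " "))` only removes entries equal to null or to exactly one space. A minimal sketch of one way to handle this, assuming it is acceptable to skip empty text at the point of insertion (the class name `PreOrderTrim` and the helpers `collect` and `visit` are made up for illustration, and the XML is inlined instead of being read from Test.xml):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class PreOrderTrim {

    // Same document as in the question, inlined so the sketch is self-contained.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<Academy>\n"
        + "    <Faculty>\n"
        + "        <Department name= \"Science\">\n"
        + "            <Director>Kay Jordan</Director>\n"
        + "            <Don>ABC</Don>\n"
        + "        </Department>\n"
        + "    </Faculty>\n"
        + "</Academy>\n";

    // Parses the XML string and returns element names and non-empty text in pre-order.
    static List<String> collect(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        visit(document.getDocumentElement(), out);
        return out;
    }

    static void visit(Node node, List<String> out) {
        out.add(node.getNodeName());

        // Guard against elements with no children, then keep the first
        // child's text only if something is left after trimming.
        Node first = node.getFirstChild();
        if (first != null && first.getNodeType() == Node.TEXT_NODE) {
            String text = first.getTextContent().trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
        }

        // Recurse into element children only, as in the question.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                visit(child, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // prints: < Academy Faculty Department Director Kay Jordan Don ABC >
        System.out.println("< " + String.join(" ", collect(XML)) + " >");
    }
}
```

Filtering at insertion time means no cleanup pass over the list is needed afterwards, and the null check avoids a NullPointerException on an element that has no children at all.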

Less simple: add this to the DocumentBuilderFactory:

docBuilderFactory.setIgnoringElementContentWhitespace(true);
docBuilderFactory.setValidating(true);
docBuilderFactory.setSchema(...);

But you will need the schema of the XML file for the parser to be able to validate the document and remove unneeded whitespace. See the documentation here.


2 Comments

The problem is that I don't know the schema.
I made a couple of edits to the code and used trim(), and it worked! Thanks a lot.

Each line break between elements in the XML becomes a new TEXT_NODE in the child list. So calling:

SLDP.add(node.getFirstChild().getTextContent());

will add that whitespace (the newline plus any indentation) to the SLDP ArrayList.
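To make the extra text nodes visible, here is a small sketch that lists the direct children of a root element by node name. The class name `WhitespaceNodes`, the helper `childNames` and the two-element sample document are invented for illustration; only standard `javax.xml.parsers` and `org.w3c.dom` calls are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceNodes {

    // Returns the node names of the root element's direct children.
    // Text nodes report the name "#text".
    static List<String> childNames(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList children = doc.getDocumentElement().getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Academy>\n    <Faculty>x</Faculty>\n</Academy>";
        // prints: [#text, Faculty, #text]
        System.out.println(childNames(xml));
    }
}
```

The first `#text` entry is the newline and indentation before `<Faculty>`, which is exactly what `getFirstChild()` returns for Academy in the question.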

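You can prevent this by configuring the DocumentBuilderFactory. Note that, according to the Javadoc, this setting relies on the content model and therefore requires the parser to be in validating mode, so the document needs a DTD: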

docBuilderFactory.setIgnoringElementContentWhitespace(true);
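As a rough sketch of what that configuration can look like when a DTD is available, the example below embeds a small internal DTD in the document and turns on validation together with the whitespace setting. The DTD, the shortened sample document, the class name `IgnoreWhitespaceSketch` and the helper `childNames` are all assumptions made for illustration; the file in the question has no DTD, so this does not apply to it as-is.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IgnoreWhitespaceSketch {

    // Hypothetical document with an internal DTD declaring that Academy
    // contains only a Faculty element (an element-only content model).
    static final String XML_WITH_DTD =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE Academy [\n"
        + "  <!ELEMENT Academy (Faculty)>\n"
        + "  <!ELEMENT Faculty (#PCDATA)>\n"
        + "]>\n"
        + "<Academy>\n"
        + "    <Faculty>Science</Faculty>\n"
        + "</Academy>\n";

    // Lists the root element's direct children by node name, optionally
    // asking the parser to validate and drop element-content whitespace.
    static List<String> childNames(String xml, boolean stripWhitespace) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(stripWhitespace);
        factory.setIgnoringElementContentWhitespace(stripWhitespace);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList children = doc.getDocumentElement().getChildNodes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childNames(XML_WITH_DTD, false)); // whitespace text nodes kept
        System.out.println(childNames(XML_WITH_DTD, true));  // whitespace text nodes dropped
    }
}
```

When no DTD or schema exists, as the asker notes in the comments, trimming and skipping empty text in the traversal is the more practical route.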
