
I have successfully parsed an XML file of unknown structure using DOM, but because the file is large I now want to use a StAX parser (event or stream). I have also done this successfully with SAX, but I am curious about StAX and would like to learn how to use it.

I did some research and wrote the following code.

This is the StAX streaming (cursor) version:

public static void main(String[] args) throws XMLStreamException, FileNotFoundException {
    XMLInputFactory xf = XMLInputFactory.newInstance();
    XMLStreamReader xsr = xf.createXMLStreamReader(new InputStreamReader(new FileInputStream("c:\\file.xml")));

    while (xsr.hasNext()) {
        int e = xsr.next();

        if (e == XMLStreamConstants.START_ELEMENT) {
            System.out.println("Element Start Name:" + xsr.getLocalName());
        }
        if (e == XMLStreamConstants.END_ELEMENT) {
            System.out.println("Element End Name:" + xsr.getLocalName());
        }
        if (e == XMLStreamConstants.CHARACTERS) {
            System.out.println("Element Text:" + xsr.getText());
        }
    }
}

And this is the StAX event-driven version:

public static void main(String[] args) throws XMLStreamException, FileNotFoundException {
    XMLInputFactory xif = XMLInputFactory.newInstance();
    XMLEventReader xer = xif.createXMLEventReader(new InputStreamReader(new FileInputStream("c:\\file.xml")));

    while (xer.hasNext()) {
        XMLEvent e = xer.nextEvent();

        if (e.isCharacters()) {
            System.out.println("Element Text : " + e.asCharacters().getData());
        }
        if (e.isStartElement()) {
            System.out.println("Start Element : " + e.asStartElement().getName());
        }
        if (e.isEndElement()) {
            System.out.println("End Element : " + e.asEndElement().getName());
        }
    }
}

In both versions the parent node also prints blank text. It should not, because in the XML only the child nodes contain text, so only the child node text should be printed. For example:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<student id="1">
  <fname>TestFirstName</fname>
  <lname>TestLastName</lname>
  <sectionname rollno="1">A</sectionname>
</student>

It should print TestFirstName, TestLastName and so on. In other words, the checks if (e == XMLStreamConstants.CHARACTERS) and if (e.isCharacters()) should not be true for parent nodes. How can I modify my code so that it parses an XML file of any depth or nesting level?

  • Can you show what it prints now and what you would like it to print instead? Commented Apr 27, 2015 at 11:27
  • Actually, I am not only interested in printing. I want to store key-value pairs for the child elements only, so the result should contain fname=TestFirstName, lname=TestLastName, section=A. Commented Apr 27, 2015 at 18:03

2 Answers


The event sequence is correct. You get Characters events that contain only whitespace because the XML is pretty-printed (spaces, tabs and line breaks between the elements). If your XML were written on a single line with no whitespace between the tags, you would not get these additional events.
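
To make the difference visible, here is a small sketch that counts whitespace-only CHARACTERS events for a pretty-printed and a flat version of the same document. The class name, method name and sample strings are made up for illustration, and it assumes a document without a DTD (with a DTD, ignorable whitespace can be reported as SPACE events instead).

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the question's code.
class WhitespaceEventDemo {

    // Counts CHARACTERS events whose text consists only of whitespace.
    static int countWhitespaceOnlyEvents(String xml) throws XMLStreamException {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (xsr.hasNext()) {
            if (xsr.next() == XMLStreamConstants.CHARACTERS && xsr.isWhiteSpace()) {
                count++;
            }
        }
        xsr.close();
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String pretty = "<student id=\"1\">\n"
                + "  <fname>TestFirstName</fname>\n"
                + "  <lname>TestLastName</lname>\n"
                + "</student>";
        String flat = "<student id=\"1\"><fname>TestFirstName</fname>"
                + "<lname>TestLastName</lname></student>";

        // The pretty-printed input reports whitespace-only events, the flat one reports none.
        System.out.println("pretty-printed: " + countWhitespaceOnlyEvents(pretty));
        System.out.println("flat: " + countWhitespaceOnlyEvents(flat));
    }
}
```

The exact count for the pretty-printed input can depend on how the parser chunks character data, so the sketch only relies on it being greater than zero.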

From the StAX documentation: "ignorable whitespace and significant whitespace are also reported as Character events." So you only need to skip the whitespace-only events. To do so, add the test !e.asCharacters().isWhiteSpace():

XMLEvent e = xer.nextEvent();
if (e.isCharacters() && !e.asCharacters().isWhiteSpace()) {
    System.out.println("Element Text : "+e.asCharacters().getData());
}

That should filter out the whitespace-only events and leave only the events you expect.
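
For reference, a self-contained sketch of that filter applied to the sample document from the question. The class and method names are hypothetical, and the XML is read from a string rather than from c:\file.xml so the example can run anywhere. Turning on IS_COALESCING is an addition of this sketch: it asks the parser to deliver adjacent character data as a single event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of the question's code.
class NonBlankTextDemo {

    // Returns the text of every Characters event that is not whitespace-only.
    static List<String> nonBlankText(String xml) throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // Assumption of this sketch: merge adjacent character data into one event.
        xif.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader xer = xif.createXMLEventReader(new StringReader(xml));

        List<String> texts = new ArrayList<>();
        while (xer.hasNext()) {
            XMLEvent e = xer.nextEvent();
            if (e.isCharacters() && !e.asCharacters().isWhiteSpace()) {
                texts.add(e.asCharacters().getData());
            }
        }
        xer.close();
        return texts;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<student id=\"1\">\n"
                + "  <fname>TestFirstName</fname>\n"
                + "  <lname>TestLastName</lname>\n"
                + "  <sectionname rollno=\"1\">A</sectionname>\n"
                + "</student>";
        System.out.println(nonBlankText(xml)); // prints [TestFirstName, TestLastName, A]
    }
}
```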


2 Comments

Yes, thanks for your help. I followed your suggestion, but my problem is not only removing whitespace: if I filter out whitespace and a child element contains only spaces (or is blank), that element never gets added as a key in the HashMap. I want to parse an XML file and return only the child elements with their values (if no value is present, "" should be returned as the value of that child element). A multimap could also be used for keys with the same name.
It looks like you are building your logic around the if (e.isCharacters() ...) block. You should take action when an element ends, in the e.isEndElement() block. By keeping a bit of context you will know whether the element you just ended had empty content or text content. In both cases you can create the child element entry, with either the text or an empty value.
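
A rough sketch of that end-element approach, under a few assumptions: the class and method names are made up, attributes are ignored, whitespace-only content is stored as "", and a Map of lists stands in for the multimap mentioned above. An element counts as a leaf when no child element starts between its start tag and its end tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of the question's code.
class LeafElementCollector {

    // Maps each leaf element name to the list of its text values, in document order.
    static Map<String, List<String>> collectLeaves(String xml) throws XMLStreamException {
        XMLEventReader xer = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        Map<String, List<String>> leaves = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        boolean leafCandidate = false; // true while the most recent tag seen was a start tag

        while (xer.hasNext()) {
            XMLEvent e = xer.nextEvent();
            if (e.isStartElement()) {
                leafCandidate = true;  // a leaf until a child element shows up
                text.setLength(0);     // discard text that belonged to the parent
            } else if (e.isCharacters() && leafCandidate) {
                // Append, because a parser may deliver one text node in several chunks.
                text.append(e.asCharacters().getData());
            } else if (e.isEndElement()) {
                if (leafCandidate) {
                    String value = text.toString();
                    if (value.trim().isEmpty()) {
                        value = ""; // assumption: blank content is stored as ""
                    }
                    String name = e.asEndElement().getName().getLocalPart();
                    leaves.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
                }
                leafCandidate = false; // the enclosing element has a child, so it is not a leaf
            }
        }
        xer.close();
        return leaves;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<student id=\"1\">\n"
                + "  <fname>TestFirstName</fname>\n"
                + "  <lname>TestLastName</lname>\n"
                + "  <sectionname rollno=\"1\">A</sectionname>\n"
                + "</student>";
        // prints {fname=[TestFirstName], lname=[TestLastName], sectionname=[A]}
        System.out.println(collectLeaves(xml));
    }
}
```

Because the decision is made at the end tag, an empty element such as <lname></lname> still produces an entry (with the value ""), and repeated element names simply add to the same list.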

This is my solution using the StAX stream reader:

public static void main(String[] args) throws FileNotFoundException, XMLStreamException {
    XMLInputFactory xf = XMLInputFactory.newInstance();
    XMLStreamReader xsr = xf.createXMLStreamReader(new InputStreamReader(new FileInputStream("c:\\test.xml")));
    String startElement = null;
    String endElement = null;
    String elementTxt = "";
    while (xsr.hasNext()) {
        int e = xsr.next();
        if (e == XMLStreamConstants.START_ELEMENT) {
            startElement = xsr.getLocalName();
            elementTxt = ""; // reset so an empty element does not reuse earlier text
        }
        if (e == XMLStreamConstants.END_ELEMENT) {
            endElement = xsr.getLocalName();
            // Only a leaf closes right after it was opened. XML names are case-sensitive, so use equals().
            if (endElement.equals(startElement))
                System.out.println(" ElementName : " + startElement + " ElementText : " + elementTxt);
            startElement = null; // the enclosing element has children, so it is not a leaf
        }
        if (e == XMLStreamConstants.CHARACTERS) {
            // Treat whitespace-only text (the indentation between elements) as empty.
            elementTxt = xsr.isWhiteSpace() ? "" : xsr.getText();
        }
    }
}

This is my solution using the StAX event reader:

public static void main(String[] args) throws XMLStreamException, FileNotFoundException {
    XMLInputFactory xif = XMLInputFactory.newInstance();
    XMLEventReader xer = xif.createXMLEventReader(new InputStreamReader(new FileInputStream("c:\\test.xml")));
    String startElement = null;
    String endElement = null;
    String elementTxt = "";
    while (xer.hasNext()) {
        XMLEvent e = xer.nextEvent();
        if (e.isCharacters()) {
            // Treat whitespace-only text (the indentation between elements) as empty.
            elementTxt = e.asCharacters().isWhiteSpace() ? "" : e.asCharacters().getData();
        }
        if (e.isStartElement()) {
            startElement = e.asStartElement().getName().toString();
            elementTxt = ""; // reset so an empty element does not reuse earlier text
        }
        if (e.isEndElement()) {
            endElement = e.asEndElement().getName().toString();
            // Only a leaf closes right after it was opened. XML names are case-sensitive, so use equals().
            if (endElement.equals(startElement))
                System.out.println(" ElementName : " + startElement + " ElementText : " + elementTxt);
            startElement = null; // the enclosing element has children, so it is not a leaf
        }
    }
}
