
I have a simple task:

I'd like to read an XML file and write it back out as completely as possible. With the following code, two problems remain:

  1. Comments are removed.
  2. I have no access to the XML declaration.

Java Code:

package com.stackoverflow.tests;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlParsing {

  public static void main(String[] args) {

    StringBuffer b = new StringBuffer();

    try {

      SAXParserFactory factory = SAXParserFactory.newInstance();
      SAXParser saxParser = factory.newSAXParser();

      DefaultHandler handler = new DefaultHandler() {

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {

          b.append("<" + qName + attributesToString(attributes) + ">");
        } // END: startElement()



        @Override
        public void endElement(String uri, String localName, String qName)
            throws SAXException {

          b.append("</" + qName + ">");
        } // END: endElement



        @Override
        public void characters(char ch[], int start, int length)
            throws SAXException {

          b.append(new String(ch, start, length));

        } // END: characters()



      }; // END: DefaultHandler

      saxParser.parse("./src/main/ressources/XmlTest/validWithAttributesCommentsInlineElements.xml", handler);

      System.out.println(b.toString());

    } catch (Exception e) {
      e.printStackTrace();

    } // END: try

  } // END: main



  public static String attributesToString(Attributes a) {
    StringBuffer sb = new StringBuffer();
    for(int i = 0; i < a.getLength(); i++) {
      sb
        .append(" ")
        .append(a.getQName(i))
        .append("=\"")
        .append(a.getValue(i))
        .append("\"");
    }
    return sb.toString();
  }



} // END: Class XmlParsing

I parse the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<A attr="1" aaa="2">
    <F>general</F>
    <B test="3">
        <C>element 1</C>
        <C>element 2</C>
        <C>element 3</C>
    </B>
    <D>general</D>
    <E>general</E>

    <inline-element/>
    <inline-element with="attributes"/>

    <!-- Comment -->

    <inline-element />
    <inline-element with="attributes" />

</A>

And get:

<A attr="1" aaa="2">
    <F>general</F>
    <B test="3">
        <C>element 1</C>
        <C>element 2</C>
        <C>element 3</C>
    </B>
    <D>general</D>
    <E>general</E>

    <inline-element></inline-element>
    <inline-element with="attributes"></inline-element>



    <inline-element></inline-element>
    <inline-element with="attributes"></inline-element>

</A>

It's fine for me that an <elem /> becomes <elem></elem>, but I'd really like to have access to the XML declaration and the comments.

1 Answer


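To be notified when a comment is seen, you need a LexicalHandler (org.xml.sax.ext.LexicalHandler). Comments are reported to the lexical handler, not to the ContentHandler/DefaultHandler, which is why they disappear from your output. See https://docs.oracle.com/javase/tutorial/jaxp/sax/events.html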

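// Needs: org.xml.sax.XMLReader, org.xml.sax.ext.DefaultHandler2, org.xml.sax.ext.LexicalHandler

// Implement a lexical handler. DefaultHandler2 provides no-op versions of every
// LexicalHandler method, so only comment() has to be overridden.
LexicalHandler lexicalHandler = new DefaultHandler2() {
    @Override
    public void comment(char[] ch, int start, int length) throws SAXException {
        b.append("<!--").append(new String(ch, start, length)).append("-->");
    }
};

// Use the handler: register it on the XMLReader behind the SAXParser,
// set your existing DefaultHandler as the content handler, then parse
// through the XMLReader (this replaces the saxParser.parse(...) call)
SAXParser saxParser = factory.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler",
                      lexicalHandler);
xmlReader.setContentHandler(handler);
xmlReader.parse("./src/main/ressources/XmlTest/validWithAttributesCommentsInlineElements.xml");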
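For reference, here is a self-contained sketch that combines the question's element/text handling with comment reporting in a single class. It is a variant written under assumptions, not the answerer's exact code: one DefaultHandler2 object serves as both content handler and lexical handler, the input is a string instead of the question's file, and the names XmlEchoWithComments and echo are hypothetical. Like the question's code, it does not re-escape text or attribute values.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class XmlEchoWithComments {

  // Hypothetical helper: re-serialises elements, text and comments of the given XML string.
  public static String echo(String xml) throws Exception {
    StringBuilder out = new StringBuilder();

    // DefaultHandler2 implements both ContentHandler and LexicalHandler,
    // so one object receives element events and comment events.
    DefaultHandler2 handler = new DefaultHandler2() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
          out.append(' ').append(atts.getQName(i))
             .append("=\"").append(atts.getValue(i)).append('"');
        }
        out.append('>');
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
      }

      @Override
      public void comment(char[] ch, int start, int length) {
        // Called for <!-- ... --> because the handler is registered as lexical handler below
        out.append("<!--").append(ch, start, length).append("-->");
      }
    };

    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    XMLReader reader = parser.getXMLReader();
    reader.setContentHandler(handler);
    reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
    reader.parse(new InputSource(new StringReader(xml)));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    // Prints <A attr="1"><!-- Comment --><B></B></A>
    System.out.println(echo("<A attr=\"1\"><!-- Comment --><B/></A>"));
  }
}
```

Passing the same object to setContentHandler and to the lexical-handler property keeps all output in one StringBuilder, so comments land in document order between the surrounding elements.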

2 Comments

How can I use the handler before implementing it? I can't get it up and running.
That worked. Do you also have an idea of how to obtain the <?xml version="1.0" encoding="UTF-8"?>?
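SAX does not report the XML declaration as an event. One possible approach, not from the original thread: the JDK's built-in parser typically passes an org.xml.sax.ext.Locator2 to setDocumentLocator(), and its getXMLVersion() and getEncoding() expose the version and encoding, so the declaration can be rebuilt. The sketch below assumes the parser supplies a Locator2 and that the input string is UTF-8; XmlDeclarationInfo and declaration are hypothetical names. The standalone flag is not available this way, and for a document without a declaration the parser reports defaulted values, so the rebuilt line is not necessarily what was written in the file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

public class XmlDeclarationInfo {

  // Hypothetical helper: rebuilds <?xml version="..." encoding="..."?> from Locator2 values.
  // Returns null if the parser's locator is not a Locator2.
  public static String declaration(String xml) throws Exception {
    final String[] result = new String[1];

    DefaultHandler handler = new DefaultHandler() {
      private Locator locator;
      private boolean seenRoot;

      @Override
      public void setDocumentLocator(Locator locator) {
        this.locator = locator; // supplied by the parser before startDocument()
      }

      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Read the values at the root element, when the declaration has already been scanned
        if (seenRoot || !(locator instanceof Locator2)) {
          return;
        }
        seenRoot = true;
        Locator2 info = (Locator2) locator;
        StringBuilder sb = new StringBuilder("<?xml version=\"")
            .append(info.getXMLVersion()).append('"');
        if (info.getEncoding() != null) {
          sb.append(" encoding=\"").append(info.getEncoding()).append('"');
        }
        result[0] = sb.append("?>").toString();
      }
    };

    // Assumption: the string is UTF-8 content; parsing bytes lets the parser read the encoding itself
    byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new ByteArrayInputStream(bytes), handler);
    return result[0];
  }

  public static void main(String[] args) throws Exception {
    System.out.println(declaration("<?xml version=\"1.0\" encoding=\"UTF-8\"?><A/>"));
  }
}
```

In the question's program, the same setDocumentLocator/startElement logic could be added to the existing DefaultHandler and the rebuilt line prepended to b.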
