I am parsing a custom XML configuration file in a Java application. I am trying to use the SAX parser, mainly because I need to report errors in the configuration with line numbers.

There are a lot of code samples online of implementing a handler class, and things seem fairly straightforward for normal processing - for example, http://tutorials.jenkov.com/java-xml/sax-example.html

But in my case, sometimes I need to skip an entire tree under an element:

<sampledocument>
    <sampletag>
         <process/>
         <these/>
         <tags/>
    </sampletag>
    <sampletag skip="yes">
         <do_not/>
         <process/>
         <these/>
         <tags/>
    </sampletag>
</sampledocument>

LATER ADDITION: Moreover, I only know whether to skip at runtime. In a somewhat contrived example, I would need to open a file to process the tags under <sampletag>, and if the file is not found, not process them:

<sampledocument>
    <sampletag file="file1">
         <process/>
         <these/>
         <tags/>
         <if_file1_exists/>
    </sampletag>
    <sampletag file="file2">
         <process/>
         <these/>
         <tags/>
         <if_file2_exists/>
    </sampletag>
</sampledocument>

Of course, I can just track skipping in the handler code, but this is a bit awkward. Can I somehow tell SAX in the startElement() method to just skip the contents of this element?

1 Answer

Write a filter class to sit on the pipeline between the SAX parser and your existing ContentHandler. You can do this by extending XMLFilterImpl. This filter should have an integer variable skipDepth, initially zero.

In startElement, if you recognize an element that you want to deep-skip, or if skipDepth > 0, then increment skipDepth.

In endElement, if skipDepth > 0, decrement skipDepth.

In all event handlers, pass the event on down the pipeline (by calling super.xxx()) if and only if skipDepth == 0.
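The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name SkipFilter, the shouldSkip() helper, and the elementsSeen() demo method are made up for this example, and the skip condition is the skip="yes" attribute from the question's first sample. It assumes a parser that is not namespace-aware (the SAXParserFactory default), so elements are matched on qName. Only the most common event handlers are guarded here; skippedEntity and the prefix-mapping events would need the same check in a complete filter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch: drops every <sampletag skip="yes"> element together with its subtree. */
class SkipFilter extends XMLFilterImpl {
    // 0 means "not skipping"; otherwise the nesting depth inside the skipped subtree.
    private int skipDepth = 0;

    SkipFilter(XMLReader parent) {
        super(parent);
    }

    // Hypothetical skip condition, taken from the question's first example.
    private static boolean shouldSkip(String qName, Attributes atts) {
        return "sampletag".equals(qName) && "yes".equals(atts.getValue("skip"));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || shouldSkip(qName, atts)) {
            skipDepth++;          // entering (or going deeper into) a skipped subtree
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;          // leaving one level of the skipped subtree
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.ignorableWhitespace(ch, start, length);
        }
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        if (skipDepth == 0) {
            super.processingInstruction(target, data);
        }
    }

    /** Demo helper: returns the element names that reach the downstream handler. */
    static List<String> elementsSeen(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        SkipFilter filter = new SkipFilter(reader);
        // Stand-in for your existing ContentHandler.
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add(qName);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

XMLFilterImpl forwards setDocumentLocator to the downstream handler by default, so your existing ContentHandler should still be able to report line numbers through its Locator.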

If you want to be smart, you can write this filter in a generic way, so it takes a parameter which is a callback function that accepts the node name and attributes and returns a boolean indicating whether to skip the element. Then you can reuse your code next time you want to skip elements, but with different skip conditions.
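One way the generic version might look, using a java.util.function.BiPredicate as the callback. Everything named here (PredicateSkipFilter, MISSING_FILE, elementsSeen) is hypothetical and chosen only for illustration. MISSING_FILE assumes the file="..." attribute from the question's second example and decides at parse time whether to skip. As before, this sketch assumes a non-namespace-aware parser and guards only the element and character events.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch: skips any element (and its subtree) for which the supplied predicate returns true. */
class PredicateSkipFilter extends XMLFilterImpl {
    private final BiPredicate<String, Attributes> skipTest;
    private int skipDepth = 0;

    PredicateSkipFilter(XMLReader parent, BiPredicate<String, Attributes> skipTest) {
        super(parent);
        this.skipTest = skipTest;
    }

    // Example predicate (an assumption based on the question's second sample):
    // skip a <sampletag> whose "file" attribute names a file that does not exist.
    static final BiPredicate<String, Attributes> MISSING_FILE = (qName, atts) -> {
        String file = atts.getValue("file");
        return "sampletag".equals(qName) && file != null && !Files.exists(Paths.get(file));
    };

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // The predicate is only consulted outside a skipped subtree.
        if (skipDepth > 0 || skipTest.test(qName, atts)) {
            skipDepth++;
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Demo helper: returns the element names that reach the downstream handler. */
    static List<String> elementsSeen(String xml, BiPredicate<String, Attributes> skipTest)
            throws Exception {
        List<String> seen = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        PredicateSkipFilter filter = new PredicateSkipFilter(reader, skipTest);
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add(qName);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

Because the predicate runs inside startElement, the decision is made at parse time, so a check such as "does this file exist" or a call into a separate validation class can live in the callback while the depth counting stays in the filter.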

5 Comments

Thanks! But how is this different from simply maintaining skipDepth in the ContentHandler? In my real task, the ContentHandler must actually process the element before determining whether to skip a tree, so if I have a separate filter, the ContentHandler will have to trigger the skipping anyway.
SAX code is always best written as a pipeline, one step in the pipeline for each separable task. Otherwise you quickly end up with spaghetti code in your ContentHandler (you already said it was "a bit awkward"). With a properly constructed pipeline you end up with maintainable, reusable code that is easy to modify and debug; if you put everything in the ContentHandler you end up with an unmaintainable mess. Of course, if your example differs from the real task then I can't advise you how to break up the functionality in the real task.
I have modified the example to test for files at runtime. The real code verifies the correctness of the configuration. Explaining how it does so would make a very long question, but it is a call to a separate class, so it is somewhat similar to checking for a file.
Changing the question in such a way as to invalidate existing answers is just thoroughly unfriendly.
Sorry - I did get thoroughly confused. I have restored the previous example and placed the new one after it, seeing as providing the new one in a comment is impossible.
