
I want to extract some values from a news site's XML with SAXParser, but its structure is difficult for me. I am new to XML and SAX.

Issue: the news site uses the same tag name, <title>, for both the site name and each news title in its XML.

My Java code runs without errors, but the output is not what I want.

How can I get only the <title> tag that is a child of the <item> tag? I don't want to show the site title in my application.

XML Side

<channel>

   <title>Site Name</title>

   <item>  
       <title>News Title!</title>       
   </item>

</channel>

Java Side

The Java code compiles and runs without errors :)

try {

            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();

            DefaultHandler handler = new DefaultHandler() {

                boolean newsTitle   = false;


                public void startElement(String uri, String localName,
                        String qName, Attributes attributes)
                        throws SAXException {

                    //System.out.println("Start Element :" + qName);

                    if (qName.equalsIgnoreCase("title")) {
                        newsTitle = true;
                    }

                }

                public void endElement(String uri, String localName,
                        String qName) throws SAXException {

                    //System.out.println("End Element :" + qName);

                }

                public void characters(char ch[], int start, int length)
                        throws SAXException {

                    if (newsTitle) {
                        System.out.println("Title : "
                                + new String(ch, start, length));
                        newsTitle = false;
                    }

                }

            };

            saxParser
                    .parse("C:\\ntv.xml",handler);

        }
        catch (Exception e) {
            e.printStackTrace();
        }

OUTPUT:

Title : Site Name

Title : News Title!

2 Answers

You can use XPath rather than parsing your XML using SAX.

XPath expression for your case is:

/channel/item/title

Example code:

import org.xml.sax.InputSource;

import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;

public class XPathTest {

    public static void main(String[] args) throws XPathExpressionException {

        String xml = "<channel>\n" +
                "\n" +
                "   <title>Site Name</title>\n" +
                "\n" +
                "   <item>  \n" +
                "       <title>News Title!</title>       \n" +
                "   </item>\n" +
                "\n" +
                "</channel>";

        Object result = XPathFactory.newInstance().newXPath().compile("/channel/item/title").evaluate(new InputSource(new StringReader(xml)));
        System.out.print(result);
    }
}
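A real feed usually contains many <item> elements, and evaluating the expression to a string returns only the first match. As a rough sketch, the same expression can be evaluated as a node set instead. The class name XPathAllTitles and the helper itemTitles are made up for this illustration, and the XML is assumed to have <channel> as its root, as in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class XPathAllTitles {

    // Returns the text of every <title> that is a direct child of an <item>.
    static List<String> itemTitles(String xml) throws XPathExpressionException {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "/channel/item/title",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);

        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<channel><title>Site Name</title>"
                + "<item><title>First</title></item>"
                + "<item><title>Second</title></item></channel>";
        System.out.println(itemTitles(xml));
    }
}
```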

2 Comments

Is there a way for SAX, similar to this approach?
SAX and XPath are fundamentally different. With SAX you need to maintain state yourself. If speed and memory are not major concerns, you should definitely go for XPath. It is also quite suitable when your XML documents are small and few in number. I've used XPath in an application with a 1 GB heap that processes 1K documents per second, each a few tens of KB in size.

You can add a stack to your DefaultHandler. When you find a tag in startElement, push the tag onto the stack; then in endElement, pop the topmost tag off the stack. When you want to know where you are in the document, check whether the stack contains /channel/item/title or just /channel/title.

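Use the localName instead of the qName if you don't care about namespaces; the qName may have a namespace prefix prepended to it. Note that localName is only populated when the parser is namespace-aware (SAXParserFactory.setNamespaceAware(true)); otherwise it may be an empty string.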

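Also, the way you're using the characters method is not correct (a common problem): the parser may deliver the text of a single element in several characters calls, so the text should be accumulated and processed in endElement. See the explanation in the SAX tutorial.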
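A minimal sketch of the stack idea combined with text buffering might look like the following. The class name StackTitleHandler and the helper parse are invented for this example; it assumes the default (not namespace-aware) parser, so qName is the plain tag name, and it assumes <title> elements contain only text. For the XML in the question, the path of a news title is /channel/item/title.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, written only to illustrate the stack approach.
class StackTitleHandler extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();    // currently open tags, root first
    private final StringBuilder text = new StringBuilder();   // text collected for the current element
    final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        path.addLast(qName);   // push the tag
        text.setLength(0);     // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append instead of printing.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String current = "/" + String.join("/", path);
        if (current.equals("/channel/item/title")) {
            titles.add(text.toString().trim());
        }
        path.removeLast();     // pop the tag
    }

    static List<String> parse(String xml) throws Exception {
        StackTitleHandler handler = new StackTitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<channel><title>Site Name</title>"
                + "<item><title>News Title!</title></item></channel>";
        System.out.println(parse(xml));   // prints [News Title!]
    }
}
```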

2 Comments

I don't know enough about stacks or other data structures yet :) Is there a simpler way?
@hakiko: You can keep track of whether you're inside an item with a boolean instance variable: set it to true in startElement if the current tag is 'item', and set it to false in endElement when you reach the end of an item. Then you know which title you have from how the flag is set.
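The flag-based variant could be sketched roughly as below. The class name ItemTitleHandler, its field names, and the parse helper are invented for illustration; it again assumes the default (not namespace-aware) parser, so qName is the plain tag name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, written only to illustrate the boolean-flag approach.
class ItemTitleHandler extends DefaultHandler {
    private boolean insideItem = false;    // true between <item> and </item>
    private boolean insideTitle = false;   // true inside a <title> that belongs to an item
    private final StringBuilder text = new StringBuilder();
    final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("item")) {
            insideItem = true;
        } else if (insideItem && qName.equals("title")) {
            insideTitle = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (insideTitle) {
            text.append(ch, start, length);   // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("item")) {
            insideItem = false;
        } else if (insideTitle && qName.equals("title")) {
            titles.add(text.toString().trim());
            insideTitle = false;
        }
    }

    static List<String> parse(String xml) throws Exception {
        ItemTitleHandler handler = new ItemTitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<channel><title>Site Name</title>"
                + "<item><title>News Title!</title></item></channel>";
        for (String title : parse(xml)) {
            System.out.println("Title : " + title);
        }
    }
}
```

The site title is skipped because insideItem is still false when the first <title> starts.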
