It's my first day with Java and I'm trying to build a small XML parser for my websites, so I can get a clean view of my sitemap.xml files. The code I'm using looks like this:

import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.URL;
import java.util.List;


import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

class downloadxml {
   public static void main(String[] args) throws IOException {

       String str = "http://www.someurl.info/sitemap.xml";
       URL url = new URL(str);
       InputStream is = url.openStream();
       int ptr = 0;
       StringBuilder builder = new StringBuilder();
       while ((ptr = is.read()) != -1) {
           builder.append((char) ptr);
       }
       String xml = builder.toString();

       org.jdom2.input.SAXBuilder saxBuilder = new SAXBuilder();
       try {
           org.jdom2.Document doc = saxBuilder.build(new StringReader(xml));
           System.out.println(xml);
           Element xmlfile = doc.getRootElement();
           System.out.println("ROOT -->"+xmlfile);
           List list = xmlfile.getChildren("url");
           System.out.println("LIST -->"+list);
       } catch (JDOMException e) {
           // handle JDOMException
       } catch (IOException e) {
           // handle IOException
       }

       System.out.println("===========================");

   }
}

When the code reaches

System.out.println(xml);

I get a clean print of the xml sitemap. When it comes to:

System.out.println("ROOT -->"+xmlfile);

Output:

ROOT -->[Element: <urlset [Namespace: http://www.sitemaps.org/schemas/sitemap/0.9]/>]

It also finds the root element. But for some reason, when the code asks for the children, it prints an empty list:

System.out.println("LIST -->"+list);

Output:

LIST -->[]

What should I do differently? Any pointers on how to get the children?

The XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
          <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
               <url>
                   <loc>http://www.image.url</loc>
                   <image:image>
                     <image:loc>http://www.image.url/image.jpg</image:loc>
                   </image:image>
                   <changefreq>daily</changefreq>
                 </url>
                <url>
            </urlset>

2 Answers

You've come a long way in a day.

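Short answer: you are ignoring the namespace of your XML document. The root element declares a default namespace, so every unprefixed child (including url) belongs to that namespace, while getChildren("url") only matches elements in no namespace. Change the line: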

List list = xmlfile.getChildren("url");

to

// needs: import org.jdom2.Namespace;
Namespace ns = Namespace.getNamespace("http://www.sitemaps.org/schemas/sitemap/0.9");
List<Element> list = xmlfile.getChildren("url", ns);
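To see the difference end to end, here is a minimal sketch that assumes JDOM 2 is on the classpath. The class name, the helper methods, and the inlined sample string are made up for illustration (the string is a trimmed copy of the sitemap from the question). It compares the lookup without a namespace to the lookup with one, and also reads the prefixed image elements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.Namespace;
import org.jdom2.input.SAXBuilder;

public class SitemapNamespaceDemo {
    // Default (unprefixed) namespace declared on <urlset>.
    static final Namespace SITEMAP =
            Namespace.getNamespace("http://www.sitemaps.org/schemas/sitemap/0.9");
    // Prefixed namespace used by the <image:...> elements.
    static final Namespace IMAGE =
            Namespace.getNamespace("image", "http://www.google.com/schemas/sitemap-image/1.1");

    // Illustrative sample, trimmed from the sitemap in the question.
    static final String SAMPLE =
            "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\""
          + " xmlns:image=\"http://www.google.com/schemas/sitemap-image/1.1\">"
          + "<url>"
          + "<loc>http://www.image.url</loc>"
          + "<image:image><image:loc>http://www.image.url/image.jpg</image:loc></image:image>"
          + "<changefreq>daily</changefreq>"
          + "</url>"
          + "</urlset>";

    static Element parseRoot(String xml) throws JDOMException, IOException {
        return new SAXBuilder().build(new StringReader(xml)).getRootElement();
    }

    // Looks for <url> in NO namespace, which is what the question's code does.
    static int countUrlsIgnoringNamespace(String xml) throws JDOMException, IOException {
        return parseRoot(xml).getChildren("url").size();
    }

    // Looks for <url> and <loc> in the sitemap namespace.
    static List<String> pageLocations(String xml) throws JDOMException, IOException {
        List<String> locs = new ArrayList<>();
        for (Element url : parseRoot(xml).getChildren("url", SITEMAP)) {
            locs.add(url.getChildText("loc", SITEMAP));
        }
        return locs;
    }

    // The image elements live in a second namespace, so they need their own Namespace object.
    static List<String> imageLocations(String xml) throws JDOMException, IOException {
        List<String> locs = new ArrayList<>();
        for (Element url : parseRoot(xml).getChildren("url", SITEMAP)) {
            for (Element image : url.getChildren("image", IMAGE)) {
                locs.add(image.getChildText("loc", IMAGE));
            }
        }
        return locs;
    }

    public static void main(String[] args) throws JDOMException, IOException {
        System.out.println("without namespace: " + countUrlsIgnoringNamespace(SAMPLE));
        System.out.println("pages:  " + pageLocations(SAMPLE));
        System.out.println("images: " + imageLocations(SAMPLE));
    }
}
```

Note that the namespace has to be passed at every level: loc inside url is in the sitemap namespace too, so getChildText("loc") without it would return null.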

For your convenience, you may also want to simplify the whole build process to:

org.jdom2.Document doc = saxBuilder.build("http://www.someurl.info/sitemap.xml");
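Letting the builder read the source directly is also safer than the read loop in the question, which casts each byte to a char and so mangles any character that takes more than one byte in UTF-8. A small sketch of that effect, assuming JDOM 2 on the classpath; the class name, the helpers, and the in-memory document (used instead of a real URL) are invented for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public class EncodingDemo {
    // Hypothetical document containing one non-ASCII character (e with acute accent).
    static final String EXPECTED = "http://example.com/caf\u00e9";
    static final byte[] BYTES =
            ("<?xml version=\"1.0\" encoding=\"UTF-8\"?><loc>" + EXPECTED + "</loc>")
                    .getBytes(StandardCharsets.UTF_8);

    // Same approach as the question: one byte becomes one char, then parse the string.
    static String viaByteCast(byte[] bytes) throws JDOMException, IOException {
        InputStream is = new ByteArrayInputStream(bytes);
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = is.read()) != -1) {
            sb.append((char) b); // a two-byte UTF-8 sequence turns into two wrong chars
        }
        return new SAXBuilder().build(new StringReader(sb.toString()))
                .getRootElement().getText();
    }

    // Hand the raw bytes to the parser and let it decode them itself.
    static String viaStream(byte[] bytes) throws JDOMException, IOException {
        return new SAXBuilder().build(new ByteArrayInputStream(bytes))
                .getRootElement().getText();
    }

    public static void main(String[] args) throws JDOMException, IOException {
        System.out.println("byte cast matches original: " + viaByteCast(BYTES).equals(EXPECTED));
        System.out.println("stream matches original:    " + viaStream(BYTES).equals(EXPECTED));
    }
}
```

For a plain ASCII sitemap both approaches give the same result, so this only bites once a URL or title contains accented or non-Latin characters.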

1 Comment

You're welcome. You should probably read up more on namespaces and how to handle them in JDOM: jdom.org/docs/faq.html#a0260

My answer is similar to the one above, but with catch clauses that display helpful messages when the input XML is not well-formed. The input here is an XML file.

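// needs: java.io.File, java.io.IOException, org.jdom2.Document, org.jdom2.Element,
//        org.jdom2.JDOMException, org.jdom2.input.SAXBuilder
// say(...) is a print helper that is not shown here (presumably a thin wrapper around System.out.println).
File file = new File("adr781.xml");
SAXBuilder builder = new SAXBuilder(false); // false = no DTD validation, well-formedness only
try {
    Document doc = builder.build(file);
    Element root = doc.getRootElement();
} catch (JDOMException e) {
    // thrown when the parser rejects the document
    say(file.getName() + " is not well-formed.");
    say(e.getMessage());
} catch (IOException e) {
    // thrown when the file cannot be read at all
    say("Could not check " + file.getAbsolutePath());
    say(" because " + e.getMessage());
}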
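If you want to try the error reporting without a file on disk, a self-contained variant could look like the sketch below. It assumes JDOM 2 on the classpath; the class name, the check method, and the two sample strings are made up, and the messages are returned as strings instead of going through the say helper.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public class WellFormedCheck {
    // Parses the given XML text and reports the outcome as a message.
    static String check(String name, String xml) {
        SAXBuilder builder = new SAXBuilder();
        try {
            Element root = builder.build(new StringReader(xml)).getRootElement();
            return name + " is well-formed, root element: " + root.getName();
        } catch (JDOMException e) {
            // The parser rejected the input; the message says where and why.
            return name + " is not well-formed: " + e.getMessage();
        } catch (IOException e) {
            // Reading the input failed (unlikely for an in-memory string).
            return "Could not check " + name + " because " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("good", "<a><b/></a>"));
        // <b> is never closed, so the parser should reject this one.
        System.out.println(check("bad", "<a><b></a>"));
    }
}
```

The point of filling in the catch blocks is that an empty catch, as in the question, hides exactly this kind of message when a sitemap turns out to be broken.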
