
I'm new to Java and servlets and am currently trying to parse XML using Jericho XML Parser. For instance, I want to get the link from each link tag, but nothing is displayed, while the total count says 27 (I only get the correct total, without the link strings). If anyone knows how to do this, please show me.

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.*;

import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.Source;

@WebServlet(urlPatterns = { "/HelloServlet"})

public class HelloServlet extends HttpServlet {
private static final long serialVersionUID = 1L;

@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException,MalformedURLException{

    resp.setContentType("text/html; charset=UTF-8");
    PrintWriter out = resp.getWriter();
    out.println("<html>");
    out.println("<head><meta http-equiv='content-type' content='text/html; charset=UTF-8'></head>");
    out.println("<body>");
    Source source = new Source(new URL("http://news.yahoo.com/rss/"));
    source.fullSequentialParse();


    List<Element> Linklist = source.getAllElements("link");


    if(Linklist!=null){
        out.println("<p>total:"+Linklist.size()+"</p>");
        for(Element link: Linklist){
            out.println("<p>"+link.getContent().toString()+"</p>");
        }
    }


    out.println("</body>");
    out.println("</html>");
}


}
  • Welcome to SO. Please read How to Ask. You haven't really provided enough details, such as a sample of the content of the Yahoo RSS feed, the output of your program, and what you're expecting to see. Please edit your question to include this information. Commented Nov 21, 2011 at 17:13

1 Answer


According to the Jericho HTML Parser homepage, Jericho is for manipulating HTML documents. But the RSS feed from Yahoo is XML, so you can use Java's standard XML API to parse the document and extract the text of the link tags. Here is an example:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.LinkedList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// ...

// Downloads the feed and returns the text of every <link> element.
private List<String> getRssLinks() throws ParserConfigurationException,
    SAXException, IOException 
{
  final List<String> rssLinks = new LinkedList<String>();
  final URL url = new URL("http://news.yahoo.com/rss/");
  // try-with-resources (Java 7+) closes the stream even if parsing fails
  try (InputStream in = url.openStream()) {
    final Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                         .parse(in);
    final NodeList linkNodes = doc.getElementsByTagName("link");
    for(int i = 0; i < linkNodes.getLength(); i++) {
      final Element linkElement = (Element) linkNodes.item(i);
      rssLinks.add(linkElement.getTextContent());
    }
  }

  return rssLinks;
}
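
To try the same DOM approach without a network connection, here is a minimal sketch that parses a small feed held in a string and renders the links the way the servlet in the question prints them. The class name RssLinkDemo, the helper names extractLinks and toHtml, and the sample feed are all assumptions made up for illustration; the sample is not the real Yahoo feed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class RssLinkDemo {

    // Hypothetical sample feed for illustration only (not the Yahoo RSS content).
    static final String SAMPLE_FEED =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<rss version=\"2.0\"><channel>"
        + "<title>Example</title>"
        + "<link>http://example.com/</link>"
        + "<item><title>First</title><link>http://example.com/1</link></item>"
        + "<item><title>Second</title><link>http://example.com/2?a=1&amp;b=2</link></item>"
        + "</channel></rss>";

    // Parses the XML text and returns the text content of every <link> element.
    static List<String> extractLinks(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("link");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                links.add(nodes.item(i).getTextContent().trim());
            }
            return links;
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("Could not parse feed", e);
        }
    }

    // Builds the same <p> lines the servlet writes, escaping HTML special characters.
    static String toHtml(List<String> links) {
        StringBuilder html = new StringBuilder();
        html.append("<p>total:").append(links.size()).append("</p>\n");
        for (String link : links) {
            String escaped = link.replace("&", "&amp;")
                                 .replace("<", "&lt;")
                                 .replace(">", "&gt;");
            html.append("<p>").append(escaped).append("</p>\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(toHtml(extractLinks(SAMPLE_FEED)));
    }
}
```

In the servlet, the string returned by toHtml could be written with out.print(...) between the body tags. Escaping matters here because feed URLs often contain & characters.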

4 Comments

Thanks for the comment, vanje. But which classes do I need to import for this? I tried to find them with Google but couldn't. Sorry, I'm pretty new to Java.
Added the import statements. The standard Java classes are very well documented, e.g. for Java 6: docs.oracle.com/javase/6/docs/api. A sophisticated IDE like Eclipse can help you find the right packages for classes (in the Eclipse context menu: Source / Organize Imports, or position the cursor on a line with an unknown class and press Ctrl-1).
I can open the link in a web browser. But when I access it through a Java application, I get an error (Exception in thread "main" java.net.ConnectException: Connection timed out: connect) for the URL you provided.
Are you behind a corporate firewall? Then you should call Java with proxy settings.
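
One way to pass proxy settings is through the standard Java networking system properties, set before the URL is opened. This is only a sketch: the class name ProxySettingsDemo and the helper useProxy are made up for illustration, and proxy.example.com and 8080 are placeholders for whatever your network actually uses.

```java
class ProxySettingsDemo {

    // Sets the standard JVM proxy properties for HTTP and HTTPS connections.
    // The host and port passed in are assumptions; ask your network admin for real values.
    static void useProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", String.valueOf(port));
    }

    public static void main(String[] args) {
        useProxy("proxy.example.com", 8080); // placeholder values
        System.out.println(System.getProperty("http.proxyHost")
            + ":" + System.getProperty("http.proxyPort"));
    }
}
```

The same properties can be given on the command line instead, for example java -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 followed by your main class, which avoids hard-coding them.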
