Getting values from repeating child nodes using xpath

Question

I am building a Java application to extract the values inside the table tags using XPath.

Please suggest an efficient way to get all 200 values from the page. My code works fine for the 100 rows within the first dataTable. However, I have no way to get to the second dataTable.

I am able to extract the first 100 rows using the Java class below.

The expected output:

http://a.com/   data for a  526735  Z
http://b.com/   data for b  522273  Z
.
.
.
.

http://c.com/   data for c  578335  Z  
http://d.com/   data for d  513445  Z

<table>
<tbody>
 <tr>
 <td style="padding-right">
 <table class="dataTable">
  <tbody>
   <tr>
    <td><a HREF="http://a.com/" target="_parent">data for a</a></td>
    <td class="numericalColumn">526735</td>
    <td class="numericalColumn">Z</td></tr>
   <tr>
    <td><a HREF="http://b.com/" target="_parent">data for b</a></td>
    <td class="numericalColumn">522273</td>
    <td class="numericalColumn">B</td></tr>
.
.
.100 <tr> here
.
  </tbody>
 </table>
</td>
<td style="padding-right">
 <table class="dataTable">
  <tbody>
   <tr>
   <td><a HREF="http://c.com/" target="_parent">data for c</a></td>
   <td class="numericalColumn">526735</td>
   <td class="numericalColumn">Z</td></tr>
  <tr>
   <td><a HREF="http://d.com/" target="_parent">data for d</a></td>
   <td class="numericalColumn">522273</td>
   <td class="numericalColumn">B</td></tr>
.
.
.100 rows here
.
  </tbody>
 </table>      
</td>
</tr>
</tbody>
</table>

This is the class used to get the data.

import java.io.BufferedReader;
import java.io.InputStream;
import org.w3c.tidy.*;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Node;
import org.w3c.tidy.Tidy;

public class CompaniesGetter {
public static void main(String[] args) throws Exception{
    String name,link,scripcode,group,s,key;
    int a=1;
    int count=1;
    URL oracle = new URL("http://money.rediff.com/companies");
    URLConnection yc = oracle.openConnection();
    InputStream is = yc.getInputStream();
    is = oracle.openStream();
    Tidy tidy = new Tidy();
    tidy.setQuiet(true);
    tidy.setShowWarnings(false);
    Document tidyDOM = tidy.parseDOM(is, null);
    XPathFactory xPathFactory = XPathFactory.newInstance();
    XPath xPath = xPathFactory.newXPath();
    Map<String,String> mLink=new HashMap<String,String>();
    Map<String,String> mCode=new HashMap<String,String>();
    Map<String,String> mGroup=new HashMap<String,String>();
    ArrayList<String> aName=new ArrayList<String>();
    //for(int j=0;j<2;j++)
    for(int i =1;i<=200;i++)
    {
        if(i==100)
        {
            a=2;
            // removed "s=attrib[1];": attrib is not declared anywhere in this class
        }
        link = "//table[@class='dataTable']/tbody/tr["+i+"]/td/a/@href";
        name = "//table[@class='dataTable']/tbody/tr["+i+"]/td/a";
        scripcode = "//table[@class='dataTable']/tbody/tr["+i+"]/td[2]";
        group = "//table[@class='dataTable']/tbody/tr["+i+"]/td[3]";
        String linkValue = (String)xPath.evaluate(link, tidyDOM, XPathConstants.STRING);
        String nameValue = (String)xPath.evaluate(name, tidyDOM, XPathConstants.STRING);
        String scripValue = (String)xPath.evaluate(scripcode, tidyDOM, XPathConstants.STRING);
        String groupValue = (String)xPath.evaluate(group, tidyDOM, XPathConstants.STRING);
        aName.add(nameValue);
        mLink.put(nameValue, linkValue);
        mCode.put(nameValue, scripValue);
        mGroup.put(nameValue,groupValue);
    }
    Iterator<String> itr=aName.iterator();
    while (itr.hasNext()){
        key=itr.next();
        System.out.println("::"+(count++)+" "+key + "  "+mLink.get(key)+"   "+mCode.get(key)+"   "+mGroup.get(key)+" ::");
    }

}

}

kisp · Accepted Answer · 2011-08-16 14:52:27Z

1

Hm. Just a tip: Do you use the variable "a" in the XPaths?

link = "//table[@class='dataTable']/tbody/tr["+i+"]/td/a/@href";

should be

link = "//table[@class='dataTable'][" + a + "]/tbody/tr["+i+"]/td/a/@href";
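One caveat that may matter here, going only by the HTML in the question: each dataTable sits in its own td, so each one is the first (and only) matching table under its parent. In XPath 1.0 a predicate like `[2]` after `//table[...]` counts position among siblings of the same parent, so it may match nothing on this layout. Wrapping the path in parentheses, `(//table[@class='dataTable'])[2]`, counts across the whole document instead. The row index also has to restart at 1 for the second table rather than run on to 200. Below is a minimal sketch of the difference. It assumes a small well-formed stand-in document parsed with the JDK's own parser instead of JTidy, and the names `TableIndexDemo`, `SAMPLE`, and `linkXPath` are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TableIndexDemo {
    // Hypothetical stand-in for the page: two dataTable tables,
    // each inside its own <td>, with two rows per table.
    static final String SAMPLE =
          "<table><tbody><tr>"
        + "<td><table class='dataTable'><tbody>"
        + "<tr><td><a href='http://a.com/'>data for a</a></td><td>526735</td><td>Z</td></tr>"
        + "<tr><td><a href='http://b.com/'>data for b</a></td><td>522273</td><td>B</td></tr>"
        + "</tbody></table></td>"
        + "<td><table class='dataTable'><tbody>"
        + "<tr><td><a href='http://c.com/'>data for c</a></td><td>526735</td><td>Z</td></tr>"
        + "<tr><td><a href='http://d.com/'>data for d</a></td><td>522273</td><td>B</td></tr>"
        + "</tbody></table></td>"
        + "</tr></tbody></table>";

    // Parse a well-formed XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluate an XPath expression against the document and return its string value.
    static String eval(Document doc, String expr) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return (String) xPath.evaluate(expr, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Build the link XPath for a given table number and a row number
    // counted within that table (both 1-based).
    static String linkXPath(int table, int row) {
        return "(//table[@class='dataTable'])[" + table + "]/tbody/tr[" + row + "]/td/a/@href";
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // Sibling-relative index: no <td> holds a second dataTable, so this is empty.
        System.out.println("[" + eval(doc, "//table[@class='dataTable'][2]/tbody/tr[1]/td/a/@href") + "]");
        // Document-wide index: first row of the second table.
        System.out.println(eval(doc, linkXPath(2, 1))); // prints http://c.com/
    }
}
```

With the question's loop, that would mean computing the table as `a` and the row within the table separately (for example rows 1 to 100 for each table) instead of passing `i` straight through up to 200.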


4 Comments

Himanshu Soni Over a year ago

Duh! It didn't strike me. Thanks a lot. What do you say about the code? Can I optimize it in some way?

kisp Over a year ago

Actually yes. I think you should use NodeLists instead of manually stepping through the rows one index at a time. The reasons are: 1. As written, every loop iteration evaluates your XPaths against the whole DOM again. 2. The number of table rows may differ (the number of input rows may grow dynamically).

Himanshu Soni Over a year ago

I tried using NodeLists in the first place, but being new to XPath and JAXP, it all went over my head. It would be helpful if you could elaborate on your solution.

Himanshu Soni Over a year ago

The number of table rows is constant. The main problem is selecting a row, getting its children's values, and repeating that for n rows.
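For anyone landing here with the same question, the NodeList idea from the comments could look roughly like the sketch below: select every row of every dataTable once as a node set, then evaluate short relative XPaths against each row node. This is only an illustration under assumptions. It parses a small well-formed stand-in document with the JDK parser rather than the JTidy DOM from the question, and `RowListDemo`, `SAMPLE`, and `extractRows` are hypothetical names. In the question's code, the `tidyDOM` document would be passed in instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RowListDemo {
    // Hypothetical stand-in for the page: two dataTable tables in separate cells.
    static final String SAMPLE =
          "<table><tbody><tr>"
        + "<td><table class='dataTable'><tbody>"
        + "<tr><td><a href='http://a.com/'>data for a</a></td><td>526735</td><td>Z</td></tr>"
        + "<tr><td><a href='http://b.com/'>data for b</a></td><td>522273</td><td>B</td></tr>"
        + "</tbody></table></td>"
        + "<td><table class='dataTable'><tbody>"
        + "<tr><td><a href='http://c.com/'>data for c</a></td><td>526735</td><td>Z</td></tr>"
        + "<tr><td><a href='http://d.com/'>data for d</a></td><td>522273</td><td>B</td></tr>"
        + "</tbody></table></td>"
        + "</tr></tbody></table>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns one {link, name, code, group} array per table row, in document order.
    static List<String[]> extractRows(Document doc) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            // Compiled once; the row query matches rows of every dataTable.
            XPathExpression rowsExpr = xPath.compile("//table[@class='dataTable']/tbody/tr");
            // These paths are relative to a single <tr> node.
            XPathExpression linkExpr = xPath.compile("td[1]/a/@href");
            XPathExpression nameExpr = xPath.compile("td[1]/a");
            XPathExpression codeExpr = xPath.compile("td[2]");
            XPathExpression groupExpr = xPath.compile("td[3]");

            NodeList rows = (NodeList) rowsExpr.evaluate(doc, XPathConstants.NODESET);
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                result.add(new String[] {
                    linkExpr.evaluate(row),
                    nameExpr.evaluate(row),
                    codeExpr.evaluate(row),
                    groupExpr.evaluate(row)
                });
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String[] r : extractRows(parse(SAMPLE))) {
            System.out.println(r[0] + "   " + r[1] + "  " + r[2] + "  " + r[3]);
        }
    }
}
```

Because the loop runs over whatever the row query returns, there is no hard-coded 100 or 200, and no table index to track. Keeping the rows in a list also avoids the problem of keying maps by name, where two rows with the same name would overwrite each other.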
