Below is my Java code. I'm trying to parse HTML files for links only. I pass in the files, and printing the array inside parseURL works fine. But when I return it and assign it to parray, parray is suddenly all null. Any ideas why?
public String[] getWebPages(Document doc) throws IOException
{
    // Select every link, then drop links to archives and office/PDF files.
    // Each attribute selector needs its closing "]".
    Elements pages = doc.select("a[href]").not("a[href$=gz]").not("a[href$=jar]").not("a[href$=rar]")
            .not("a[href$=zip]").not("a[href$=mdb]").not("a[href$=doc]").not("a[href$=docx]")
            .not("a[href$=odt]").not("a[href$=pdf]").not("a[href$=ppt]").not("a[href$=pptx]")
            .not("a[href$=wks]");
    for (Element page : pages)
    {
        System.out.println("\nDownloading next page...");
        String url = page.absUrl("href");
        System.out.println(url);
        // parray (like array, i and x below) is a field declared elsewhere in the class.
        parray = parseURL(url, page);
        System.out.println(parray[0]);
        System.out.println(parray[2]);
        System.out.println(parray[3]);
        System.out.println(parray[4]);
        System.out.println(parray[5]);
        System.out.println(parray[6]);
        System.out.println(parray[7]);
        System.out.println(parray[8]);
        System.out.println(parray[9]);
    }
    return parray;
}
public String[] parseURL(String url, Element page)
{
    Boolean boo = true;
    // Skip in-page anchors such as "page.html#section".
    if (url.indexOf("#") != -1)
    {
        System.out.println("Non-page...discarding page.");
        return null;
    }
    // Skip URLs that have already been stored.
    for(x=0; x<500; x++)
    {
        if (url.equals(array[x]))
        {
            return null;
        }
    }
    // Store the new URL and advance the fill counter.
    array[i] = url;
    i++;
    System.out.println(array[1]);
    System.out.println(array[2]);
    System.out.println(array[3]);
    System.out.println(array[4]);
    System.out.println(array[5]);
    System.out.println(array[6]);
    System.out.println(array[7]);
    System.out.println(array[8]);
    System.out.println(array[9]);
    return array;
}
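The chained `.not("a[href$=...]")` calls in `getWebPages` exclude links by how their `href` ends. As a rough illustration of what that filter does, here is a stdlib-only sketch of the same test on plain URL strings; `LinkFilter` and `isPageLink` are hypothetical names, and the lowercase comparison assumes jsoup's attribute-value matching is case-insensitive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LinkFilter {
    // Same suffixes as the .not("a[href$=...]") chain in getWebPages.
    static final List<String> EXCLUDED = Arrays.asList(
            "gz", "jar", "rar", "zip", "mdb", "doc", "docx",
            "odt", "pdf", "ppt", "pptx", "wks");

    // True if the URL does not end with any excluded suffix.
    // Like [href$=...], this is a plain "ends with" test (no leading dot
    // required), compared case-insensitively.
    static boolean isPageLink(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        for (String ext : EXCLUDED) {
            if (lower.endsWith(ext)) {
                return false;
            }
        }
        return true;
    }
}
```

Because it is a bare suffix test, a URL such as `http://example.com/catalog` ending in `log` passes, while anything ending in `pdf` or `PDF` is dropped.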
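`parray` will be the result of the last call to `parseURL(url, page)`; if that call returns `null`, you'll end up with `null`. `parseURL` returns `null` whenever the URL contains `#` or has already been seen, and `parray` is reassigned on every pass through the loop, so one discarded link late on the page replaces the array you built earlier.

`for(x=0; x<500; x++)`: nice magic number... why 500? Use `array.length` instead of 500.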
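A minimal sketch of both fixes, assuming the goal is to collect every unique page URL. `LinkCollector`, `addUrl` and `collect` are hypothetical names, and plain strings stand in for jsoup `Element`s so the example runs without jsoup. A `Set` replaces the fixed-size array, the `500` bound and the duplicate-scanning loop, and the result is built once instead of being overwritten on each iteration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class LinkCollector {
    // Grows as needed and rejects duplicates; LinkedHashSet keeps insertion order.
    private final Set<String> seen = new LinkedHashSet<>();

    // Hypothetical stand-in for parseURL: returns true if the URL was kept,
    // false if it was discarded (fragment link or duplicate).
    boolean addUrl(String url) {
        if (url.indexOf('#') != -1) {
            return false; // in-page anchor, like "Non-page...discarding page."
        }
        return seen.add(url); // add() returns false when the URL is already present
    }

    // Stand-in for getWebPages: a discarded link never wipes out earlier results.
    List<String> collect(List<String> urls) {
        for (String url : urls) {
            addUrl(url);
        }
        return new ArrayList<>(seen);
    }
}
```

If you would rather keep the original arrays, the smallest change is to assign `parray` only when `parseURL(url, page)` returns a non-null value, and to print its entries only after that check.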