Below is my Java code. I'm trying to parse HTML files for links only. I pass in the files, and printing the array inside parseURL works fine. But when I return it and assign it to parray, parray is suddenly all null. Any ideas why?

public String[] getWebPages(Document doc) throws IOException
{

    Elements pages = doc.select("a[href]").not("a[href$=gz]").not("a[href$=jar]").not("a[href$=rar]").not(
            "a[href$=zip]").not("a[href$=mdb]").not("a[href$=doc]").not("a[href$=docx]").not("a[href$=odt]").not(
                    "a[href$=pdf]").not("a[href$=ppt]").not("a[href$=pptx]").not("a[href$=wks]");

    for (Element page : pages) 
    {
        System.out.println("\nDownloading next page...");
        String url = page.absUrl("href");
        System.out.println(url);
        parray = parseURL(url,page);

           System.out.println(parray[0]);
           System.out.println(parray[2]);
           System.out.println(parray[3]);
           System.out.println(parray[4]);
           System.out.println(parray[5]);
           System.out.println(parray[6]);
           System.out.println(parray[7]);
           System.out.println(parray[8]);
           System.out.println(parray[9]);


    }

    return parray;


   }



 public String[] parseURL(String url, Element page)
    {

     Boolean boo = true;

        if (url.indexOf("#") != -1)
            {
                System.out.println("Non-page...discarding page.");
                return null;
            }

        for(x=0; x<500; x++)
        if(url.equals(array[x]))
        {
            return null;
        }

        array[i] = url;
           i++;

           System.out.println(array[1]);
           System.out.println(array[2]);
           System.out.println(array[3]);
           System.out.println(array[4]);
           System.out.println(array[5]);
           System.out.println(array[6]);
           System.out.println(array[7]);
           System.out.println(array[8]);
           System.out.println(array[9]);



        return array;
    }
  • After the loop, parray will hold the result of the last call to parseURL(url, page). If that call returned null, you'll end up with null. Commented Feb 25, 2014 at 1:49
  • for(x=0; x<500; x++) nice magic number.. why 500? Commented Feb 25, 2014 at 1:50
  • This kinda seems better suited to CodeReview Commented Feb 25, 2014 at 1:51
  • 500 was an insane maximum I chose. Like I said below, most code is just checking for whether or not it is null. It'll all be deleted eventually. Commented Feb 25, 2014 at 1:56
  • You could at least use array.length instead of 500. Commented Feb 25, 2014 at 2:00

1 Answer

It's because you assign parray inside the for loop. Each iteration overwrites whatever the previous iteration stored, so only the result of the last call to parseURL survives, and everything obtained before it is discarded.

Consider using either a 2D array or, probably better, a list of lists, List<List<String>>, so you can hold all of the results returned inside the for loop.

i.e.,

List<String> parseUrl(...) {

}

and then,

List<List<String>> parsedInfo = new ArrayList<List<String>>();
while (stillHavePages) {
  // parse pages and add to list above
}
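To make the idea above concrete, here is a minimal sketch of collecting every iteration's result instead of overwriting one variable. It is only an illustration under assumptions: the class name ParsedPages, the collect method, and the simplified parseUrl (which just keeps the URL itself and returns an empty list for fragment links) are hypothetical stand-ins, not the asker's real parsing logic, and plain strings are used in place of jsoup elements so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParsedPages {

    // Hypothetical stand-in for the real parsing step: returns the data
    // extracted for one URL, or an empty list when the URL is discarded.
    static List<String> parseUrl(String url) {
        List<String> result = new ArrayList<>();
        if (url.contains("#")) {
            return result; // fragment link, nothing to keep
        }
        result.add(url);
        return result;
    }

    // Adds each non-empty result to an outer list, so earlier iterations
    // are never overwritten by later ones.
    static List<List<String>> collect(List<String> urls) {
        List<List<String>> parsedInfo = new ArrayList<>();
        for (String url : urls) {
            List<String> parsed = parseUrl(url);
            if (!parsed.isEmpty()) {
                parsedInfo.add(parsed);
            }
        }
        return parsedInfo;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://example.com/a",
                "http://example.com/a#top",
                "http://example.com/b");
        // prints [[http://example.com/a], [http://example.com/b]]
        System.out.println(collect(urls));
    }
}
```

Returning an empty list rather than null for a discarded URL means the caller never has to guard against a null result.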
9 Comments

Or just use array instead of parray and ignore parseURL's return value (which will be either array or null). This is very weird code.
But shouldn't it stay constant inside the for loop? At least for one iteration? parray should be valid for one whole set, I would think. Maybe not. I'll rework it.
And yes, it is insanely weird code. A good bit is just checking for null stuff or not so that's why it's all jumbled.
@user3010468: I have no idea. All I know is that you're using magic numbers in a for loop that shouldn't have magic numbers. Just say no to them.
@user3010468 You are returning null from parseURL() if the URL is already in array or contains a "#". Your parray ends up being set to null in that case, regardless of what's in array. You're confusing yourself because you have hard-coded numbers, strange mixes of locals vs. fields, and unnecessary reassignments all over the place.
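One way to sidestep the null return, the hard-coded 500, and the shared array all at once is sketched below. This is an assumption-laden illustration rather than the asker's code: LinkCollector, accept, and getUrls are hypothetical names, and a LinkedHashSet replaces the fixed-size array so duplicates are rejected without a manual loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LinkCollector {

    // Remembers every accepted URL in insertion order; stands in for the
    // fixed-size array field and its index counter.
    private final Set<String> seen = new LinkedHashSet<>();

    // Returns true when the URL was new and kept, false when it was discarded.
    // Nothing is returned as null, so a rejected URL cannot wipe out the
    // caller's view of the earlier results.
    public boolean accept(String url) {
        if (url.contains("#")) {
            return false; // in-page anchor, not a separate page
        }
        return seen.add(url); // Set.add returns false for duplicates
    }

    // Returns a copy of the accepted URLs in the order they were added.
    public List<String> getUrls() {
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        LinkCollector collector = new LinkCollector();
        String[] links = {
            "http://example.com/a",
            "http://example.com/a",      // duplicate
            "http://example.com/b#top",  // fragment link
            "http://example.com/c"
        };
        for (String link : links) {
            collector.accept(link);
        }
        // prints [http://example.com/a, http://example.com/c]
        System.out.println(collector.getUrls());
    }
}
```

The collected URLs are read once after the loop via getUrls(), so there is no per-iteration reassignment to go wrong.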