Nested loop to count duplicates in an ArrayList isn't working properly

Question

I have the following method which takes an ArrayList of strings where each string is a coordinate in the form of "(x, y)". The method is supposed to count the number of times any of these coordinates appears more than once in the list.

Here is my code:

public static int duplicateHouses(ArrayList<String> houses){
        int duplicateCount = 0;

        for(int i = 0; i < houses.size(); i++){
            for(int j = i + 1; j < houses.size(); j++){
                if((houses.get(i)).equals(houses.get(j))){
                    duplicateCount++;
                }
            }
        }

        return duplicateCount;
    }

It ends up returning a number which is far greater than the number of strings in my list. Where am I going wrong?

Code looks fine. Could you paste in the minimal test data that demonstrates the bug? — mk.
– mk., Commented Dec 3, 2015 at 23:08
@mk. If I test the code with about 10 coordinates and a couple of duplicates then it returns the correct result. However my program is reading a file of over 8,000 coordinates and it returns a number of over 17,000. (I know this is an extremely inefficient way of doing this, but I'd first like to understand what is going wrong with such a seemingly straightforward algorithm before I find a better solution) — KOB
– KOB, Commented Dec 3, 2015 at 23:19
If you have at least 4 duplicates in your List, the first loop will find 3, the second loop will find 2, the third loop will find 1, which gives a result of 6. Basically, each loop is finding the same duplicates over again. — MadProgrammer
– MadProgrammer, Commented Dec 3, 2015 at 23:20

MadProgrammer · Accepted Answer · 2015-12-03 23:34:37Z

If you have at least 4 duplicates in your List, the first loop will find 3, the second loop will find 2, the third loop will find 1, which gives a result of 6. Basically, each loop is finding the same duplicates over again.

For example...

public static void main(String[] args) {
    ArrayList<String> houses = new ArrayList<>(25);
    houses.add("(1x1)");
    houses.add("(1x2)");
    houses.add("(1x1)");
    houses.add("(1x3)");
    houses.add("(1x1)");
    houses.add("(1x4)");
    houses.add("(1x1)");
    houses.add("(1x5)");

    System.out.println(houses.size());
    System.out.println(duplicateHouses2(houses));
}

public static int duplicateHouses(ArrayList<String> houses) {
    int duplicateCount = 0;

    for (int i = 0; i < houses.size(); i++) {
        System.out.println("---");
        for (int j = i + 1; j < houses.size(); j++) {
            if ((houses.get(i)).equals(houses.get(j))) {
                System.out.println(i + ": " + houses.get(i) + " == " + j + ": " + houses.get(j));
                duplicateCount++;
            }
        }
    }

    return duplicateCount;
}

Which outputs...

---
0: (1x1) == 2: (1x1)
0: (1x1) == 4: (1x1)
0: (1x1) == 6: (1x1)
---
---
2: (1x1) == 4: (1x1)
2: (1x1) == 6: (1x1)
---
---
4: (1x1) == 6: (1x1)
---
---
---

Now, you could, create a copy of the List and remove each duplicate as you find it or you could use a second List to store the duplicate values.

I tried calculating the difference between a Set of the values and the original List, but this returned a value which was 1 less then expected result (in the above example it returned 3 instead of 4)

Instead, I used a Stream#filter of the original and a Set to generate the duplicate count

For example...

public static int duplicateHouses(ArrayList<String> houses) {
    // Make sure we only have 1 of each possible value
    Set<String> copy = new HashSet<>(houses);
    int duplicateCount = 0;
    // For each value, we want to filter the original 
    // list so that only matching values remain...
    for (String value : copy) {
        Stream<String> filter = houses.stream().filter((String t) -> t.equals(value));
        // If there is more then one, then there are duplicates...
        long count = filter.count();
        if (count > 1) {
            duplicateCount += count;
        }
    }
    return duplicateCount;
}

Which, given the first example, returns 3

hagrawal7777 · Accepted Answer · 2015-12-03 23:30:13Z

This is because you are looping 2 times, so each entry will tell you that I am duplicate 2 times.

Lets say you have same coordinate/house at 2, 5 and 10. Now as per existing logic, when your first loop is running for i=2 then it will give you YES for 5 and 10 and your duplicateCount will be 2, which will be is correct. But when your first loop will run for i=5 then again it will give you YES for 10. And that's where you will get problem.

So, what you can do is increment duplicateCount only once for your first FOR loop so that even though there are 100 more entries then also it would not increment duplicateCount and it would increment only on successive run of first FOR loop, which will prevent duplicate increment of duplicateCount.

Try below:

public static int duplicateHouses(ArrayList<String> houses){
    int duplicateCount = 0;
    ArrayList<String> dupHouses = new ArrayList<String>;

    for(int i = 0; i < houses.size(); i++){
        for(int j = i + 1; j < houses.size(); j++){
            if((houses.get(i)).equals(houses.get(j))){
                if(!dupHouses.contains(houses.get(j))){
                    duplicateCount++;
                    dupHouses.add(houses.get(j));
                }
            }
        }
        dupHouses = new ArrayList<String>; //Reset for next iteration ...
    }
    return duplicateCount;
}

Collectives™ on Stack Overflow

Nested loop to count duplicates in an ArrayList isn't working properly

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related