0

I have the following method which takes an ArrayList of strings where each string is a coordinate in the form of "(x, y)". The method is supposed to count the number of times any of these coordinates appears more than once in the list.

Here is my code:

public static int duplicateHouses(ArrayList<String> houses){
        int duplicateCount = 0;

        for(int i = 0; i < houses.size(); i++){
            for(int j = i + 1; j < houses.size(); j++){
                if((houses.get(i)).equals(houses.get(j))){
                    duplicateCount++;
                }
            }
        }

        return duplicateCount;
    }

It ends up returning a number which is far greater than the number of strings in my list. Where am I going wrong?

6
  • 1
    @MadProgrammer but j always begins at i + 1 Commented Dec 3, 2015 at 23:08
  • Code looks fine. Could you paste in the minimal test data that demonstrates the bug? Commented Dec 3, 2015 at 23:08
  • @CornOnTheKob My bad, missed that Commented Dec 3, 2015 at 23:09
  • @mk. If I test the code with about 10 coordinates and a couple of duplicates then it returns the correct result. However my program is reading a file of over 8,000 coordinates and it returns a number of over 17,000. (I know this is an extremely inefficient way of doing this, but I'd first like to understand what is going wrong with such a seemingly straightforward algorithm before I find a better solution) Commented Dec 3, 2015 at 23:19
  • 1
    If you have at least 4 duplicates in your List, the first loop will find 3, the second loop will find 2, the third loop will find 1, which gives a result of 6. Basically, each loop is finding the same duplicates over again. Commented Dec 3, 2015 at 23:20

2 Answers 2

1

If you have at least 4 duplicates in your List, the first loop will find 3, the second loop will find 2, the third loop will find 1, which gives a result of 6. Basically, each loop is finding the same duplicates over again.

For example...

public static void main(String[] args) {
    ArrayList<String> houses = new ArrayList<>(25);
    houses.add("(1x1)");
    houses.add("(1x2)");
    houses.add("(1x1)");
    houses.add("(1x3)");
    houses.add("(1x1)");
    houses.add("(1x4)");
    houses.add("(1x1)");
    houses.add("(1x5)");

    System.out.println(houses.size());
    System.out.println(duplicateHouses2(houses));
}

public static int duplicateHouses(ArrayList<String> houses) {
    int duplicateCount = 0;

    for (int i = 0; i < houses.size(); i++) {
        System.out.println("---");
        for (int j = i + 1; j < houses.size(); j++) {
            if ((houses.get(i)).equals(houses.get(j))) {
                System.out.println(i + ": " + houses.get(i) + " == " + j + ": " + houses.get(j));
                duplicateCount++;
            }
        }
    }

    return duplicateCount;
}

Which outputs...

---
0: (1x1) == 2: (1x1)
0: (1x1) == 4: (1x1)
0: (1x1) == 6: (1x1)
---
---
2: (1x1) == 4: (1x1)
2: (1x1) == 6: (1x1)
---
---
4: (1x1) == 6: (1x1)
---
---
---

Now, you could, create a copy of the List and remove each duplicate as you find it or you could use a second List to store the duplicate values.

I tried calculating the difference between a Set of the values and the original List, but this returned a value which was 1 less then expected result (in the above example it returned 3 instead of 4)

Instead, I used a Stream#filter of the original and a Set to generate the duplicate count

For example...

public static int duplicateHouses(ArrayList<String> houses) {
    // Make sure we only have 1 of each possible value
    Set<String> copy = new HashSet<>(houses);
    int duplicateCount = 0;
    // For each value, we want to filter the original 
    // list so that only matching values remain...
    for (String value : copy) {
        Stream<String> filter = houses.stream().filter((String t) -> t.equals(value));
        // If there is more then one, then there are duplicates...
        long count = filter.count();
        if (count > 1) {
            duplicateCount += count;
        }
    }
    return duplicateCount;
}

Which, given the first example, returns 3

Sign up to request clarification or add additional context in comments.

Comments

1

This is because you are looping 2 times, so each entry will tell you that I am duplicate 2 times.

Lets say you have same coordinate/house at 2, 5 and 10. Now as per existing logic, when your first loop is running for i=2 then it will give you YES for 5 and 10 and your duplicateCount will be 2, which will be is correct. But when your first loop will run for i=5 then again it will give you YES for 10. And that's where you will get problem.

So, what you can do is increment duplicateCount only once for your first FOR loop so that even though there are 100 more entries then also it would not increment duplicateCount and it would increment only on successive run of first FOR loop, which will prevent duplicate increment of duplicateCount.

Try below:

public static int duplicateHouses(ArrayList<String> houses){
    int duplicateCount = 0;
    ArrayList<String> dupHouses = new ArrayList<String>;

    for(int i = 0; i < houses.size(); i++){
        for(int j = i + 1; j < houses.size(); j++){
            if((houses.get(i)).equals(houses.get(j))){
                if(!dupHouses.contains(houses.get(j))){
                    duplicateCount++;
                    dupHouses.add(houses.get(j));
                }
            }
        }
        dupHouses = new ArrayList<String>; //Reset for next iteration ...
    }
    return duplicateCount;
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.