How to check if two lists are equal with duplicates?

Question

I'm trying to compare two lists in Python, checking if they are the same. The problem is that both lists can contain duplicate elements, and in order to be considered equal, they need to have the same amount of duplicate elements.

I've currently "solved" this by creating a copy of both lists, and removing an element from both lists if they are equal:

def equals(v1: Vertex, v2: Vertex) -> bool:
    # also checks if neighbourhoods are the same size
    if v1.label == v2.label:

        # copy the neighbourhoods to prevent data loss on removal of checked vertices
        v1_neighbours = v1.neighbours.copy()
        v2_neighbours = v2.neighbours.copy()

        # for every Vertex in v1.neighbours, check if there is a corresponding Vertex in v2.neighbours
        # if there is, remove that Vertex from both lists
        for n1 in v1_neighbours:
            for n2 in v2_neighbours:
                if n1.label == n2.label:
                    v1_neighbours.remove(n1)
                    v2_neighbours.remove(n2)
                    break
                else:
                    return False

        if len(v1_neighbours) == 0 and len(v2_neighbours) == 0:
            return True

    return False

I doubt this solution works: doesn't List.remove(element) remove all occurrences of that element? Also, I don't think it's memory efficient, which is important, as the neighborhoods will be pretty big.

Could anyone tell me how I can compare v1_neighbours and v2_neighbours properly, checking for an equal amount of duplicates while not altering the lists, without copying the lists?

This seems like a fundamentally broken approach to vertex equality. It seems like you should either be using the default identity-based equality, or a label equality check. — user2357112
– user2357112, Commented Feb 28, 2019 at 21:18
collections.Counter(l1) == collections.Counter(l2) should do the trick — yeshks
– yeshks, Commented Feb 28, 2019 at 21:18
@aws_apprentice: That runs into all the same problems as a non-hash-based comparison, with the additional problem of hash collision. — user2357112
– user2357112, Commented Feb 28, 2019 at 21:27

Patrick Artner · Accepted Answer · 2019-02-28 21:21:29Z

3

Count them and compare the Counter-dicts:

a= [ (x,y) for x in range(5) for y in range(5)]+[ (x,y) for x in range(3) for y in range(3)]
b= [ (x,y) for x in range(5) for y in range(5)]+[ (x,y) for x in range(3) for y in range(3)]
c= [ (x,y) for x in range(5) for y in range(5)]+[ (x,y) for x in range(4) for y in range(3)]

from collections import Counter

ca = Counter(a)
cb = Counter(b)
cc = Counter(c)

print(ca==cb)    # True
print(ca==cc)    # False
print(ca)

Output:

True
False

Counter({(0, 0): 2, (0, 1): 2, (0, 2): 2, (1, 0): 2, (1, 1): 2, (1, 2): 2, 
         (2, 0): 2, (2, 1): 2, (2, 2): 2, (0, 3): 1, (0, 4): 1, (1, 3): 1, 
         (1, 4): 1, (2, 3): 1, (2, 4): 1, (3, 0): 1, (3, 1): 1, (3, 2): 1, 
         (3, 3): 1, (3, 4): 1, (4, 0): 1, (4, 1): 1, (4, 2): 1, (4, 3): 1, 
         (4, 4): 1})

edited Feb 28, 2019 at 21:21

answered Feb 28, 2019 at 21:19

Patrick Artner

51.9k10 gold badges50 silver badges79 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

user2357112 Over a year ago

This is okay as far as multiset comparison goes, but the multiset comparison seems misguided in the first place.

HYBR1D Over a year ago

I had a mathematician look it over. She said it was mathematically correct, but she doesn't know programming, so I ended up writing the algorithm.

HYBR1D Over a year ago

Eh, isn't this pretty resource intensive? The goal is not only to implement a working color refinement algorithm, but also make it as efficient as possible.

Patrick Artner Over a year ago

@HYBR1D That depends. It is a dictionary with each tuple as key and an integer as value. If you have 1e6 different tuples and 2 of each - you get a bigger dict, if you have 100 tuples and 1e6 of each it is very small. It is however fast - it takes one pass over your list to get your values... If you apply your loop you have O(n*n) and lots of other problems because you iterate over data and modify the same data you iterate over. Beside that you take copies of data you already got - so initially you need at least twice your memory.

user2357112 Over a year ago

@HYBR1D: If you want an efficient implementation, the code you linked doesn't look like it's implementing one. It looks like it's going to turn out quadratic time or worse. There are more efficient approaches, but I haven't found a resource that goes into a satisfying level of detail about how the refinement step is supposed to work (particularly whether certain updates should be performed "all at once" or one vertex at a time).

|

user2357112 · Accepted Answer · 2019-02-28 21:57:18Z

2

While collections.Counter would be the usual way to perform this kind of multiset comparison in Python, I think comparing neighbors is a fundamentally misguided approach to vertex equality testing. Vertex equality should use either the default identity-based equality, or label-based equality, depending on the details of your program.

You seem to be trying to implement a comparison where two vertices are equal if they have equal labels and equal collections of neighbors. However, if it's possible for two different vertices to have equal labels, then it should be possible for two distinct vertices to have the same label and the same neighbors, making this a broken equality comparison. If it's not possible for two vertices to have equal labels, then comparing neighbors is unnecessary.

Your neighbor comparison nested loop also assumes that vertices are equal if the have equal labels, further supporting a label-based comparison. If this assumption is wrong, then you have the problem of how to determine that neighbors are equal. If you try to compare neighbors with ==, you'll run into infinite recursion.

With the additional clarification that you're implementing a color refinement algorithm, we can confirm that comparing neighbors by label only is actually correct. However, equals seems like a misleading name for the function you're implementing, since you're not testing whether the given Vertex objects represent the same vertex.

edited Feb 28, 2019 at 21:57

answered Feb 28, 2019 at 21:25

user2357112

286k32 gold badges490 silver badges571 bronze badges

4 Comments

HYBR1D Over a year ago

The code sample above is only a part of the algorithm. What I'm making is a color refinement algorithm. The vertices are labeled after their neighborhood size. Here's the whole code: pastebin.com/X0USvNLP

HYBR1D Over a year ago

About the 3rd part: I'm only comparing the labels and neighborhood sizes. The full algorithm uses recursion, so I don't need to check for neighbors of neighbors, that is handled later.

user2357112 Over a year ago

@HYBR1D: If this is color refinement, then comparing multisets of neighbor colors is correct, but equals seems like a misleading name for the function you're implementing.

HYBR1D Over a year ago

Well, it does check if two vertices are equal for this use case. Any suggestions for a better name? I probably should have mentioned color refinement in the question, my bad.

Collectives™ on Stack Overflow

How to check if two lists are equal with duplicates?

2 Answers 2

6 Comments

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

6 Comments

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related