0

I'm trying to compare two lists in Python, checking if they are the same. The problem is that both lists can contain duplicate elements, and in order to be considered equal, they need to have the same amount of duplicate elements.

I've currently "solved" this by creating a copy of both lists, and removing an element from both lists if they are equal:

def equals(v1: Vertex, v2: Vertex) -> bool:
    # also checks if neighbourhoods are the same size
    if v1.label == v2.label:

        # copy the neighbourhoods to prevent data loss on removal of checked vertices
        v1_neighbours = v1.neighbours.copy()
        v2_neighbours = v2.neighbours.copy()

        # for every Vertex in v1.neighbours, check if there is a corresponding Vertex in v2.neighbours
        # if there is, remove that Vertex from both lists
        for n1 in v1_neighbours:
            for n2 in v2_neighbours:
                if n1.label == n2.label:
                    v1_neighbours.remove(n1)
                    v2_neighbours.remove(n2)
                    break
                else:
                    return False

        if len(v1_neighbours) == 0 and len(v2_neighbours) == 0:
            return True

    return False

I doubt this solution works: doesn't List.remove(element) remove all occurrences of that element? Also, I don't think it's memory efficient, which is important, as the neighborhoods will be pretty big.

Could anyone tell me how I can compare v1_neighbours and v2_neighbours properly, checking for an equal amount of duplicates while not altering the lists, without copying the lists?

9
  • 1
    Is the order of element important as well? Commented Feb 28, 2019 at 21:16
  • @PatrickArtner it is not, just the amount Commented Feb 28, 2019 at 21:16
  • This seems like a fundamentally broken approach to vertex equality. It seems like you should either be using the default identity-based equality, or a label equality check. Commented Feb 28, 2019 at 21:18
  • collections.Counter(l1) == collections.Counter(l2) should do the trick Commented Feb 28, 2019 at 21:18
  • 1
    @aws_apprentice: That runs into all the same problems as a non-hash-based comparison, with the additional problem of hash collision. Commented Feb 28, 2019 at 21:27

2 Answers 2

3

Count them and compare the Counter-dicts:

a= [ (x,y) for x in range(5) for y in range(5)]+[ (x,y) for x in range(3) for y in range(3)]
b= [ (x,y) for x in range(5) for y in range(5)]+[ (x,y) for x in range(3) for y in range(3)]
c= [ (x,y) for x in range(5) for y in range(5)]+[ (x,y) for x in range(4) for y in range(3)]

from collections import Counter

ca = Counter(a)
cb = Counter(b)
cc = Counter(c)

print(ca==cb)    # True
print(ca==cc)    # False
print(ca)

Output:

True
False

Counter({(0, 0): 2, (0, 1): 2, (0, 2): 2, (1, 0): 2, (1, 1): 2, (1, 2): 2, 
         (2, 0): 2, (2, 1): 2, (2, 2): 2, (0, 3): 1, (0, 4): 1, (1, 3): 1, 
         (1, 4): 1, (2, 3): 1, (2, 4): 1, (3, 0): 1, (3, 1): 1, (3, 2): 1, 
         (3, 3): 1, (3, 4): 1, (4, 0): 1, (4, 1): 1, (4, 2): 1, (4, 3): 1, 
         (4, 4): 1})
Sign up to request clarification or add additional context in comments.

6 Comments

This is okay as far as multiset comparison goes, but the multiset comparison seems misguided in the first place.
I had a mathematician look it over. She said it was mathematically correct, but she doesn't know programming, so I ended up writing the algorithm.
Eh, isn't this pretty resource intensive? The goal is not only to implement a working color refinement algorithm, but also make it as efficient as possible.
@HYBR1D That depends. It is a dictionary with each tuple as key and an integer as value. If you have 1e6 different tuples and 2 of each - you get a bigger dict, if you have 100 tuples and 1e6 of each it is very small. It is however fast - it takes one pass over your list to get your values... If you apply your loop you have O(n*n) and lots of other problems because you iterate over data and modify the same data you iterate over. Beside that you take copies of data you already got - so initially you need at least twice your memory.
@HYBR1D: If you want an efficient implementation, the code you linked doesn't look like it's implementing one. It looks like it's going to turn out quadratic time or worse. There are more efficient approaches, but I haven't found a resource that goes into a satisfying level of detail about how the refinement step is supposed to work (particularly whether certain updates should be performed "all at once" or one vertex at a time).
|
2

While collections.Counter would be the usual way to perform this kind of multiset comparison in Python, I think comparing neighbors is a fundamentally misguided approach to vertex equality testing. Vertex equality should use either the default identity-based equality, or label-based equality, depending on the details of your program.

You seem to be trying to implement a comparison where two vertices are equal if they have equal labels and equal collections of neighbors. However, if it's possible for two different vertices to have equal labels, then it should be possible for two distinct vertices to have the same label and the same neighbors, making this a broken equality comparison. If it's not possible for two vertices to have equal labels, then comparing neighbors is unnecessary.

Your neighbor comparison nested loop also assumes that vertices are equal if the have equal labels, further supporting a label-based comparison. If this assumption is wrong, then you have the problem of how to determine that neighbors are equal. If you try to compare neighbors with ==, you'll run into infinite recursion.


With the additional clarification that you're implementing a color refinement algorithm, we can confirm that comparing neighbors by label only is actually correct. However, equals seems like a misleading name for the function you're implementing, since you're not testing whether the given Vertex objects represent the same vertex.

4 Comments

The code sample above is only a part of the algorithm. What I'm making is a color refinement algorithm. The vertices are labeled after their neighborhood size. Here's the whole code: pastebin.com/X0USvNLP
About the 3rd part: I'm only comparing the labels and neighborhood sizes. The full algorithm uses recursion, so I don't need to check for neighbors of neighbors, that is handled later.
@HYBR1D: If this is color refinement, then comparing multisets of neighbor colors is correct, but equals seems like a misleading name for the function you're implementing.
Well, it does check if two vertices are equal for this use case. Any suggestions for a better name? I probably should have mentioned color refinement in the question, my bad.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.