6

I build large lists of high-level objects while parsing a tree. However, after this step, I have to remove duplicates from the list and I found this new step very slow in Python 2 (it was acceptable but still a little slow in Python 3). However I know that distinct objects actually have a distinct id. For this reason, I managed to get a much faster code by following these steps:

  • append all objects to a list while parsing;
  • sort the list with key=id option;
  • iterate over the sorted list and remove an element if the previous one has the same id.

Thus I have a working code which now runs smoothly, but I wonder whether I can achieve this task more directly in Python.

Example. Let's build two identical objects having the same value but a different id (for instance I will take a fractions.Fraction in order to rely on the standard library):

from fractions import Fraction
a = Fraction(1,3)
b = Fraction(1,3)

Now, if I try to achieve what I want to do by using the pythonical list(set(...)), I get the wrong result as {a,b} keeps only one of both values (which are identical but have a different id).

My question now is: what is the most pythonical, reliable, short and fast way for removing duplicates by id rather than duplicates by value? Order of the list doesn't matter if it has to be changed.

2
  • 1
    figured it out! deleted my comments. Commented Nov 19, 2016 at 8:50
  • Fraction is a bad example, because if you're trying to do this with a standard library class you can't. You can't change the rules set uses, you'd have a build a set-like class of your own (perhaps a thin wrapper around a dictionary keyed on ID). However, if you're actually using a class you control, you can implement __eq__ as shown below. Commented Nov 19, 2016 at 8:52

2 Answers 2

5

Be, careful because discrimating by id may fail with some basic types where python optimizes storage when possible:

a = "foo"
b = "foo"
print(a is b)

yields

True

Anyway, if you want to handle standard objects (even non-hashable ones) you can store them in a dictionary with they id as key.

Example with fractions:

from fractions import Fraction
a = Fraction(1,3)
b = Fraction(1,3)

d = dict()

d[id(a)] = a
d[id(b)] = b

print(d.values())

result:

dict_values([Fraction(1, 3), Fraction(1, 3)])
Sign up to request clarification or add additional context in comments.

Comments

4

You should override the __eq__ method so that it depends on the objects id rather than its values. But note that your objects must be hashable as well, so you should define a proper __hash__ method too.

class My_obj:
    def __init__(self, val):
        self.val = val

    def __hash__(self):
        return hash(self.val)

    def __eq__(self, arg):
        return id(self) == id(arg)

    def __repr__(self):
        return str(self.val)

Demo:

a = My_obj(5)
b = My_obj(5)

print({a, b})
{5, 5}

4 Comments

This is a very convenient answer as I use my own class; adding __hash__ and __eq__ should be very easy; in this specific case, can I make __hash__ return the id as a quick way of implementing this method (since my little class is a private one and only exists during the execution of a function and actually is even defined in the body of this function)?
@ThomasBaruchel Why you want to do that? the hash function is the safest way to go for creating hash values for objects. Because it guarantees to produce random (in python 3) and different values for different variables.
I was asking that because __hash__ needs to return an integer, and in this step my class (which describes a node in a tree) doesn not embed many usable values but rather lists of other nodes; for that reason I was wondering which significant value I could hash. I tried to make __hash__ return the id and it works well (still faster than my previous attempt), now with list(set(...)). Is it the way to go?
@ThomasBaruchel It will work because you don't need the hash value and it'll depend on the __eq__ method, so as far as you use id for comparison there is no problem.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.