Delete duplicated of an object within a list

Question

I got a waveform object, define as following:

class wfm:
    """Class defining a waveform characterized by:
        - A name
        - An electrode configuration
        - An amplitude (mA)
        - A pulse width (microseconds)"""

    def __init__(self, name, config, amp, width=300):
        self.name = name
        self.config = config
        self.amp = amp
        self.width = width

    def __eq__(self, other):
        return type(other) is self.__class__ and other.name == self.name and other.config == self.config and other.amp == self.amp and other.width == self.width

    def __ne__(self, other):
        return not self.__eq__(other)

Through parsing, I get a list called waveforms with 770 instance of wfm in it. There is a lot of duplicate, and I need to delete them.

My idea was to get the ID of equivalent object, store the largest ID in a list, and then loop on all the waveforms from the end while popping out each duplicate.

Code:

duplicate_ID = []
for i in range(len(waveforms)):
    for j in range(i+1, len(waveforms)):
        if waveforms[i] == waveforms[j]:
            duplicate_ID.append(waveforms.index(waveforms[j]))
            print ('{} eq {}'.format(i, j))

duplicate_ID = list(set(duplicate_ID)) # If I don't do that; 17k IDs

Turns out (thx to the print) that I have duplicates that don't appear into the ID list, for instance 750 is a duplicate of 763 (print says it; test too) and yet none of this 2 IDs appears in my duplicate list.

I'm quite sure there is a better solution that this method (which doesn't yet work), and I would be glad to hear it. Thanks for the help!

EDIT: More complicated scenario

I've got a more complicated scenario. I got 2 classes, wfm (see above) and stim:

class stim:
    """Class defining the waveform used for a stimultion by:
        - Duration (milliseconds)
        - Frequence Hz)
        - Pattern of waveforms"""

    def __init__(self, dur, f, pattern):
        self.duration = dur
        self.fq = f
        self.pattern = pattern

    def __eq__(self, other):
        return type(other) is self.__class__ and other.duration == self.duration and other.fq == self.fq and other.pattern == self.pattern

    def __ne__(self, other):
        return not self.__eq__(other)

I parse my files to fill a dict: paradigm. It looks like that:

paradigm[file name STR] = (list of waveforms, list of stimulations)

# example:
paradigm('myfile.xml') = ([wfm1, ..., wfm10], [stim1, ..., stim5])

Once again, I want to delete the duplicates, i.e. I want to keep only the data where:

Waveforms are the same
And stim is the same

Example:

file1 has 10 waveforms and file2 has the same 10 waveforms.
file1 has stim1 and stim2 ; file2 has stim3, sitm 4 and stim 5.

stim1 and stim3 are the same; so since the waveforms are also the same, I want to keep:
file1: 10 waveforms and stim1 and stim2
file2: 10 waveforms and stim 4 and stim5

The correlation is kinda messy in my head, so I got a few difficulties finding the right storage solution for wave forms and stimulation in order to compare them easly. If you got any idea, I'd be glad to hear it. Thanks!

This is irrelevant to your question but I think you wanted to make width a default argument but placed the value 300 in a wrong place, right? — Georgy
– Georgy, Commented Feb 8, 2018 at 15:18
Possible duplicate of Remove duplicates in list of object with Python — Georgy
– Georgy, Commented Feb 8, 2018 at 15:19
It will be much easier if you use namedtuple instead of a class. WFM = namedtuple('WFM', ['name', 'config', 'amp', 'width']). Then calling set() on a list of instances of WFM will remove all the duplicates. — Georgy
– Georgy, Commented Feb 8, 2018 at 15:26
Yep thanks for the remark on width... Small mistake thou since it's constant most of the time. I did find some post looking quite similar, and yet I haven't solve my problem, that's why I posted this. Thanks for the answer Alex, I'll try it ! — Mathieu
– Mathieu, Commented Feb 8, 2018 at 15:45

Alex · Accepted Answer · 2018-02-08 15:48:18Z

1

Problem

The .index method uses the .__eq__ method you overloaded. So

waveforms.index(waveforms[j])

will always find the first instance of the waveform in the list containing the same attributes as waveforms[j].

w1 = wfm('a', {'test_param': 4}, 3, 2.0)
w2 = wfm('b', {'test_param': 4}, 3, 2.0)
w3 = wfm('a', {'test_param': 4}, 3, 2.0)

w1 == w3  # True
w2 == w3  # False

waveforms = [w1, w2, w3]
waveforms.index(waveforms[2]) == waveforms.index(waveforms[0]) == 0  # True

Solution

Immutable

You don't need to store list indices if you do this immutably:

key = lambda w: hash(str(vars(w)))
dupes = set()
unique = [dupes.add(key(w)) or w for w in waveforms if key(w) not in dupes]

unique == [w1, w2]  # True

Mutable

key = lambda w: hash(str(vars(w)))
seen = set()
idxs = [i if key(w) in seen else seen.add(key(w)) for i, w in enumerate(waveforms)]

for idx in filter(None, idxs[::-1]):
    waveforms.pop(idx)

waveforms == [w1, w2]  # True

Big O Analysis

It's a good habit to think about big O complexity when writing algorithms (though optimization should come at the cost of readability only when needed). In this case, these solutions are a bit more readable and also big O optimal.

Your initial solution was O(n^2) due to the double for loop.

Both solutions provided are O(n).

edited Feb 8, 2018 at 15:48

answered Feb 8, 2018 at 15:24

Alex

19.2k9 gold badges65 silver badges82 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Mathieu Over a year ago

From what you did, I got a quite similar problem that appeared; and I can't manage to solve it... Would you mind taking a look at the edit ?

Collectives™ on Stack Overflow

Delete duplicated of an object within a list

1 Answer 1

Problem

Solution

Immutable

Mutable

Big O Analysis

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Problem

Solution

Immutable

Mutable

Big O Analysis

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related