
I'm having some trouble with a CSV data source that contains duplicate IDs. The final result, however, should contain each ID only once, so it was decided that we should process only the first instance we see and ignore any later ones.

Currently my code is a bit like this:

id_list = list()
for item in datasource:
    if item[0] not in id_list:   # linear scan of the list on every row
        # process item here
        id_list.append(item[0])

The problem is that as the list grows, performance drops. Are there more efficient ways of tracking the already-processed IDs?

3 Answers

5

Use a set object: sets are guaranteed not to contain duplicates and provide fast membership testing. You can use a set like this:

id_list = set()
for item in datasource: 
    if item[0] not in id_list:
        # process
        id_list.add(item[0])

This will perform better, since membership lookups on a set take constant time on average, as opposed to the linear-time lookup in a list.
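
For completeness, here is a minimal end-to-end sketch that dedupes while reading a CSV file with the standard csv module; the file name and the process step are placeholders of my own, not from the question:

    import csv

    seen_ids = set()
    with open("data.csv", newline="") as f:   # hypothetical file name
        for row in csv.reader(f):
            if row[0] not in seen_ids:        # constant-time membership test
                seen_ids.add(row[0])
                # process(row) -- only the first row for each ID reaches this point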


1

With reference to this question, I would suggest using a dict.

In particular, the fact that dict keys are unique seems appropriate here.

You could then try something like:

    if key not in seen:      # `seen` is a dict of IDs processed so far
        seen[key] = value    # record the ID along with its value
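
Adapted to the loop from the question, a minimal sketch of the dict approach could look like this (the variable names are my own):

    seen = {}
    for item in datasource:
        if item[0] not in seen:
            # process item here
            seen[item[0]] = item

Note that on Python 3.7 and later a plain dict also preserves insertion order, which is relevant to the ordering concern raised in the comments below.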

2 Comments

But the problem with this approach is that we cannot maintain the order as it is. We may have to use an OrderedDict.
I wasn't aware of the need for ordered results, since you only mentioned duplicate results...
0

Instead of using a list, you can use a binary search tree ordered by the ID.
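
To illustrate, here is a minimal sketch of that idea (the class and function names are my own, not from the answer); on roughly balanced input, lookups and inserts take O(log n) on average, although an unbalanced tree can degrade to O(n):

    class Node:
        """A single node of the search tree."""
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def insert(root, key):
        """Insert key and return the (possibly new) root; duplicates are ignored."""
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        elif key > root.key:
            root.right = insert(root.right, key)
        return root

    def contains(root, key):
        """Return True if key is already in the tree."""
        while root is not None:
            if key == root.key:
                return True
            root = root.left if key < root.key else root.right
        return False

    # Usage in the original loop:
    root = None
    for item in datasource:
        if not contains(root, item[0]):
            # process item here
            root = insert(root, item[0])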
