I'm having some trouble with a CSV datasource that contains duplicate IDs. The final result, however, should contain each ID only once, so it was decided that we should keep only the first instance we see and ignore any later ones.
Currently my code looks a bit like this:
id_list = []
for item in datasource:
    if item[0] not in id_list:
        # process the row here
        id_list.append(item[0])
The problem is that performance drops as the list grows, since each membership check has to scan the whole list. Are there more efficient ways of tracking the already-processed IDs?
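Would switching to a set be the right approach here? Something like this minimal sketch (untested; `datasource` and `process` are placeholders for my real reading and processing code):

seen_ids = set()
for item in datasource:
    if item[0] not in seen_ids:  # set membership is O(1) on average, vs O(n) for a list
        process(item)            # placeholder for the actual per-row work
        seen_ids.add(item[0])

Or is there a better-suited data structure for this kind of first-seen deduplication?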