I have a list of lists that looks like:
c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv']
     ...]
c has about 30,000 interior lists. What I'd like to do is eliminate duplicates based on the 4th item of each interior list. So the list of lists above would become:
c = [['470', '4189.0', 'asdfgw', 'fds'],['470', '4189.0', 'qwer', 'dsfs fdv'] ...]
Here is what I have so far:
d = []  # list that will contain condensed c
d.append(c[0])  # append first element, so I can compare lists
for bact in c:  # c is my list of lists with 30,000 interior lists
    for items in d:
        if bact[3] != items[3]:
            d.append(bact)
I think this should work, but it just runs and runs. I let it run for 30 minutes, then killed it. I don't think the program should take so long, so I'm guessing there is something wrong with my logic.
I have a feeling that creating a whole new list of lists is pretty stupid. Any help would be much appreciated, and please feel free to nitpick as I am learning. Also please correct my vocabulary if it is incorrect.
Why not keep a set of the fourth elements already in the output? This would make the membership lookup much faster.
add to the set as you go along, don't iterate over c twice!
With pandas: df.drop_duplicates('col_4')
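The set-based suggestion above could be sketched roughly as follows. This is only an illustration, not necessarily what the commenters had in mind: the function name dedupe_by_fourth and the variable names seen and result are made up for the example, and it assumes the first row seen for each fourth item is the one to keep (which matches the expected output in the question). It also sidesteps the problem in the original loop, which appends bact once for every element of d whose fourth item differs, so d keeps growing while it is being iterated.

```python
def dedupe_by_fourth(rows):
    """Keep the first row seen for each distinct fourth item (hypothetical helper)."""
    seen = set()   # fourth items that are already in the output
    result = []    # condensed list of lists, in original order
    for row in rows:
        key = row[3]
        if key not in seen:   # average O(1) membership test on a set
            seen.add(key)
            result.append(row)
    return result


# Sample data taken from the question (the elided rows are left out).
c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv']]

d = dedupe_by_fourth(c)
print(d)
```

Each row is visited once and each lookup in the set is constant time on average, so 30,000 rows should take a fraction of a second rather than minutes.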