Remove duplicates from nested list based on a string and a value

Question

I have a list like this:

[['john', 14, 'USA'],['john', 27, 'USA'],['paul', 17, 'USA'],['paul', 36, 'USA']]

And need to get as output:

[['john', 27, 'USA'],['paul', 36, 'USA']]

This means to remove duplicates based on position 0 but keep the ones with the higher value in position 1.

I know how to remove duplicates on regular lists using set(), but how do I go about applying those 2 conditions? I was thinking of something with a for loop, but it might be very slow since the real lists I'll use are very large.

I already tried to remove duplicates just by names but I'm puzzled about keeping the one with the higher value.

Thanks!

This is a very specific requirement, so there isn't going to be a ready-made solution; you're going to have to loop through things.
– dursk, Commented Jan 7, 2015 at 21:34

Kasravnd · Accepted Answer · 2015-01-08 07:49:04Z

2

You can use itertools.groupby to group the elements by their first item, and the max function with a suitable key to pick the entry with the largest second item in each group:

>>> from itertools import groupby
>>> l = [['john', 14, 'USA'], ['john', 27, 'USA'], ['paul', 17, 'USA'], ['paul', 36, 'USA']]
>>> [max(g, key=lambda x: x[1]) for _, g in groupby(sorted(l), lambda x: x[0])]
[['john', 27, 'USA'], ['paul', 36, 'USA']]

Or, slightly more efficiently, you can use operator.itemgetter() instead of the lambda:

>>> from operator import itemgetter
>>> [max(g, key=itemgetter(1)) for _, g in groupby(sorted(l), itemgetter(0))]
[['john', 27, 'USA'], ['paul', 36, 'USA']]
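
Note that groupby only merges adjacent items with equal keys, which is why the list is sorted first. As a minimal sketch, the same idea can be wrapped in a function that sorts by name only; the function name max_per_name and the shuffled sample rows are made up for illustration:

```python
from itertools import groupby
from operator import itemgetter


def max_per_name(rows):
    """Return one row per name (index 0), keeping the row with the largest index-1 value."""
    # groupby only groups adjacent rows, so put equal names next to each other first
    by_name = sorted(rows, key=itemgetter(0))
    return [max(group, key=itemgetter(1))
            for _, group in groupby(by_name, key=itemgetter(0))]


# hypothetical input in shuffled order
rows = [['paul', 17, 'USA'], ['john', 27, 'USA'], ['paul', 36, 'USA'], ['john', 14, 'USA']]
result = max_per_name(rows)
print(result)
```

The result comes back ordered by name, because that is the order produced by the sort.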

edited Jan 8, 2015 at 7:49

answered Jan 7, 2015 at 21:41


Prashant · Accepted Answer · 2015-01-07 21:58:35Z

1

I like Kasra's solution, but just to give another way to do it:

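from collections import defaultdict

l = [['john', 14, 'USA'], ['john', 27, 'USA'], ['paul', 17, 'USA'], ['paul', 36, 'USA']]
# collect every value seen for each (name, country) pair
key = defaultdict(list)
for n, a, c in l:
    key[(n, c)].append(a)
# rebuild one row per pair, keeping only the largest value
# (on Python 2, use key.iteritems() instead of key.items())
f_list = [[k[0], max(la), k[1]] for k, la in key.items()]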

answered Jan 7, 2015 at 21:58


David Chan · Accepted Answer · 2015-01-07 22:09:17Z

0

Trying my hand at incomprehensibly Pythonic code.

Using list and dictionary comprehensions, I sort, merge, and reformat:

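a = [['john', 14, 'USA'],['john', 27, 'USA'],['paul', 17, 'USA'],['paul', 36, 'USA']]

# sort by name, then by value, so the largest value for each name comes last
b = sorted(a, key=lambda x: (x[0], x[1]))
# later entries overwrite earlier ones, leaving the largest value per name
c = {x[0]: x[1:] for x in b}

result = [[n] + c[n] for n in c]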
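
A dict comprehension keeps the last value written for each key, so this approach only returns the maximum when rows with the same name arrive in ascending order of value. Here is a small sketch on shuffled input that sorts on both name and value, using operator.itemgetter(0, 1) as the sort key; the sample rows and variable names are made up for illustration:

```python
from operator import itemgetter

# hypothetical input where the larger value comes first for each name
rows = [['john', 27, 'USA'], ['paul', 36, 'USA'], ['john', 14, 'USA'], ['paul', 17, 'USA']]

# order by name, then by value, so the largest value per name is last
ordered = sorted(rows, key=itemgetter(0, 1))

# the last row written for each name wins
last_per_name = {row[0]: row for row in ordered}
result = list(last_per_name.values())
print(result)
```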

answered Jan 7, 2015 at 22:09


Padraic Cunningham · Accepted Answer · 2015-01-07 23:34:40Z

You can use an OrderedDict and replace the stored sublist whenever you find a sublist with the same name and a larger second element:

l = [['john', 14, 'USA'],['john', 27, 'USA'],['paul', 17, 'USA'],['paul', 36, 'USA']]

from collections import OrderedDict
d = OrderedDict()

for sub in l:
    name = sub[0]
    if name in d:
        if sub[1] > d[name][1]:
            d[name] = sub
    else:
        d[name] = sub
print(list(d.values()))

[['john', 27, 'USA'], ['paul', 36, 'USA']]

This is O(n) because it does not have to sort the list, which costs O(n log n), so it will scale better than any method using sorted.

If order does not matter, a normal dict will be fine (on Python 3.7+ a normal dict preserves insertion order as well):

d = {}
for sub in l:
    name = sub[0]
    if name in d:
        if sub[1] > d[name][1]:
            d[name] = sub
    else:
        d[name] = sub
print(list(d.values()))
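
The loop above can be folded into a reusable function. This is a minimal sketch, assuming Python 3.7+ so that a plain dict keeps names in first-seen order; the function name keep_largest and the shuffled sample rows are made up for illustration:

```python
def keep_largest(rows):
    """Single pass: keep, per name (index 0), the row with the largest index-1 value."""
    best = {}  # name -> best row seen so far
    for row in rows:
        name = row[0]
        # store the row if the name is new or this row has a strictly larger value
        if name not in best or row[1] > best[name][1]:
            best[name] = row
    return list(best.values())


# hypothetical input in shuffled order
rows = [['paul', 17, 'USA'], ['john', 27, 'USA'], ['paul', 36, 'USA'], ['john', 14, 'USA']]
result = keep_largest(rows)
print(result)
```

Because the comparison is strict, the first row encountered wins when two rows for a name have the same value.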

If you were going to sort, using operator.itemgetter as the key would be more efficient than a lambda:

from operator import itemgetter
sorted(l,key=itemgetter(1))

If you wanted to sort the original list in place:

l.sort(key=itemgetter(1))
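
To make the difference between the two calls concrete, here is a short sketch: sorted returns a new list and leaves its input alone, while list.sort reorders the list in place and returns None. The sample rows and variable names are made up for illustration:

```python
from operator import itemgetter

# hypothetical input
rows = [['paul', 36, 'USA'], ['john', 27, 'USA'], ['paul', 17, 'USA'], ['john', 14, 'USA']]

# sorted() builds a new list; rows itself is unchanged
by_value = sorted(rows, key=itemgetter(1))
first_before = rows[0]

# list.sort() reorders rows in place and returns None
returned = rows.sort(key=itemgetter(1))
first_after = rows[0]

print(by_value)
print(first_before, first_after, returned)
```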
