4

I have a list like this:

[['john', 14, 'USA'],['john', 27, 'USA'],['paul', 17, 'USA'],['paul', 36, 'USA']]

And need to get as output:

[['john', 27, 'USA'],['paul', 36, 'USA']]

That is, remove duplicates based on position 0, but keep the entry with the highest value in position 1.

I know how to remove duplicates from regular lists using set(), but how do I go about applying those two conditions? I was thinking of something with a for loop, but it might be very slow since the real lists I'll use are very large.

I already tried to remove duplicates just by names but I'm puzzled about keeping the one with the higher value.

Thanks!

1
  • This is a very specific requirement; there isn't going to be a ready-made solution, so you're going to have to loop through things. Commented Jan 7, 2015 at 21:34

4 Answers

2

You can use itertools.groupby to group the elements by their first item, and max with a suitable key to pick the sublist with the largest second element. groupby only merges adjacent items, which is why the list is sorted first:

>>> from itertools import groupby
>>> l=[['john', 14, 'USA'], ['john', 27, 'USA'], ['paul', 17, 'USA'], ['paul', 36, 'USA']]
>>> [max(g, key=lambda x: x[1]) for _, g in groupby(sorted(l), lambda x: x[0])]
[['john', 27, 'USA'], ['paul', 36, 'USA']]

Or, as a more efficient way, you can use operator.itemgetter() instead of a lambda:

>>> from operator import itemgetter
>>> [max(g, key=itemgetter(1)) for _, g in groupby(sorted(l), itemgetter(0))]
[['john', 27, 'USA'], ['paul', 36, 'USA']]
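For reuse, the groupby approach above could be wrapped in a small function. This is only a sketch: the name max_per_group and its key_index/value_index parameters are made up for illustration, and the input rows are deliberately shuffled to show that the sort step is what makes groupby work.

```python
from itertools import groupby
from operator import itemgetter

def max_per_group(rows, key_index=0, value_index=1):
    """Return one row per key: the row with the largest value (hypothetical helper)."""
    get_key = itemgetter(key_index)
    get_value = itemgetter(value_index)
    # groupby only merges adjacent items, so sort by the grouping key first.
    ordered = sorted(rows, key=get_key)
    return [max(group, key=get_value) for _, group in groupby(ordered, key=get_key)]

# Same data as in the question, but in shuffled order.
rows = [['paul', 17, 'USA'], ['john', 27, 'USA'], ['paul', 36, 'USA'], ['john', 14, 'USA']]
result = max_per_group(rows)
print(result)  # [['john', 27, 'USA'], ['paul', 36, 'USA']]
```

The result comes back ordered by name because of the sort, not in the order the names first appeared.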

1

I like Kasra's solution, but just to give another way to do it:

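from collections import defaultdict

l = [['john', 14, 'USA'], ['john', 27, 'USA'], ['paul', 17, 'USA'], ['paul', 36, 'USA']]
# Collect every value seen for each (name, country) pair.
key = defaultdict(list)
for n, a, c in l:
    key[(n, c)].append(a)
# .items() works on both Python 2 and 3 (.iteritems() is Python 2 only).
f_list = [[k[0], max(la), k[1]] for k, la in key.items()]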
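One thing worth noting about this version is that it groups on the (name, country) pair rather than on the name alone, so the same name in two countries gives two rows. A small sketch of that behaviour follows; the 'UK' rows are invented test data, not from the question.

```python
from collections import defaultdict

# Invented sample data: 'john' appears under two different countries.
rows = [['john', 14, 'USA'], ['john', 27, 'UK'], ['john', 9, 'UK'], ['paul', 36, 'USA']]

# Collect every value seen for each (name, country) pair.
ages = defaultdict(list)
for name, age, country in rows:
    ages[(name, country)].append(age)

# Rebuild one row per pair, keeping only the largest value.
result = [[name, max(values), country] for (name, country), values in ages.items()]
print(sorted(result))
```

If duplicates should be decided by name only, as in the question, the dictionary key would be just the name.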


0

Trying my hand at incomprehensibly Pythonic code.

Using list and dictionary comprehensions, I sort, merge, and reformat:

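a = [['john', 14, 'USA'],['john', 27, 'USA'],['paul', 17, 'USA'],['paul', 36, 'USA']]

# Sort by name, then by value, so the largest value for each name comes last.
b = sorted(a, key=lambda x: (x[0], x[1]))
# Later entries overwrite earlier ones, so each name keeps its largest value.
c = {x[0]: x[1:] for x in b}

result = [[n] + c[n] for n in c]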
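The sort-then-overwrite idea only picks the largest value if the sort takes the value into account; sorting on the name alone keeps whichever row happened to come last in the input. Here is a sketch with shuffled input, where keep_largest is a made-up name and the sort key covers both the name and the value:

```python
def keep_largest(rows):
    """Sort-then-overwrite sketch (hypothetical helper name)."""
    # Sort by name, then by value, so the largest value for each name comes last.
    ordered = sorted(rows, key=lambda row: (row[0], row[1]))
    # Later rows overwrite earlier ones, leaving the largest value per name.
    latest = {row[0]: row[1:] for row in ordered}
    return [[name] + rest for name, rest in latest.items()]

# Same data as above, with the larger values listed first.
shuffled = [['john', 27, 'USA'], ['paul', 36, 'USA'], ['john', 14, 'USA'], ['paul', 17, 'USA']]
result = keep_largest(shuffled)
print(sorted(result))  # [['john', 27, 'USA'], ['paul', 36, 'USA']]
```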


0

You can use an OrderedDict, replacing the stored sublist whenever you find one with the same name and a larger second element:

l = [['john', 14, 'USA'],['john', 27, 'USA'],['paul', 17, 'USA'],['paul', 36, 'USA']]

from collections import OrderedDict
d = OrderedDict()

for sub in l:
    name = sub[0]
    if name in d:
        if sub[1] > d[name][1]:
            d[name] = sub
    else:
        d[name] = sub
print(list(d.values()))

[['john', 27, 'USA'], ['paul', 36, 'USA']]

This is O(n) because it does not sort the list, which costs O(n log n), so it will scale better than any method using sorted.

If order does not matter a normal dict will be fine:

d = {}
for sub in l:
    name = sub[0]
    if name in d:
        if sub[1] > d[name][1]:
            d[name] = sub
    else:
        d[name] = sub
print(list(d.values()))
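On Python 3.7 and later a plain dict also keeps insertion order, so there the single pass can be written as one function without OrderedDict. A minimal sketch, assuming Python 3.7+; max_by_name is a made-up name:

```python
def max_by_name(rows):
    """One pass over rows, keeping the row with the largest value per name."""
    best = {}
    for row in rows:
        name = row[0]
        # Store the row if the name is new or this value beats the stored one.
        if name not in best or row[1] > best[name][1]:
            best[name] = row
    return list(best.values())

rows = [['paul', 17, 'USA'], ['john', 14, 'USA'], ['paul', 36, 'USA'], ['john', 27, 'USA']]
result = max_by_name(rows)
print(result)  # [['paul', 36, 'USA'], ['john', 27, 'USA']]
```

Replacing the value for an existing key does not move it, so names come out in the order they first appeared.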

If you were going to sort, using operator.itemgetter as the key would be more efficient than a lambda:

from operator import itemgetter
sorted(l,key=itemgetter(1))

If you wanted to sort the original list in place:

l.sort(key=itemgetter(1))
