In Python, I need to parse a list of sublists. When several sublists share the same first element, I need to pick the one with the smallest 4th element; if the 4th elements are also equal, I need to select the sublist with the higher 3rd element. For example, in the following list, I need to select sublists 1, 4 and 5.

alignments=[["A","B","10","4"],["A","C","15","8"],["A","E","20","10"],\
            ["D","C","15","3"],\
            ["G","U","1","9"],["G","O","10","9"]]

I achieved it with the code below, which is very cumbersome:

best_alignments=[]
best_al=alignments[0]
k=0
c=0

counter_list=[]
for al in alignments[1:]:
    c+=1
    if best_al[0]==al[0]:
        if best_al[3]==al[3]:
            if best_al[2]<al[2]:
                best_al=al
                counter_list.append(c-1)
            else:
                counter_list.append(c)
        else:
            counter_list.append(c)
    else:
        if k==0:
            best_al=al
            k+=1
        else:
            best_al=al
for index in sorted(counter_list, reverse=True):
    del alignments[index]   
for el in alignments:
    print(el)

I am sure there is a much easier way to do that. Any suggestions are appreciated.

3 Comments
  • The backslashes are not required inside brackets. Commented Feb 25, 2020 at 1:34
  • This might be more appropriate for Code Review Stack Exchange. Commented Feb 25, 2020 at 1:57
  • @AMC thanks, didn't know about that forum. Commented Feb 25, 2020 at 13:51

2 Answers

1

Here's a method that essentially does two passes over the data. First, it groups the sublists by their first item. Then, for each group, it returns the maximum under a key function; as written, the key favors the smallest third element and breaks ties with the largest fourth (assuming you meant the integer value of the string).

from collections import defaultdict

def foo(alignments):
    grouped = defaultdict(list)
    for al in alignments:
        grouped[al[0]].append(al)
    return [
        max(v, key=lambda al: (-int(al[2]),int(al[3])))
        for v in grouped.values()
    ]

This takes O(N) space and time (one pass to group, then one pass over the groups to take each maximum), so it is not terribly inefficient.

In an IPython REPL:

In [3]: from collections import defaultdict
    ...: def foo(alignments):
    ...:     grouped = defaultdict(list)
    ...:     for al in alignments:
    ...:         grouped[al[0]].append(al)
    ...:     return [
    ...:         max(v, key=lambda al: (-int(al[2]),int(al[3])))
    ...:         for v in grouped.values()
    ...:     ]
    ...:

In [4]: foo([['A', 'B', '10', '4'],
    ...:  ['A', 'C', '15', '8'],
    ...:  ['A', 'E', '20', '10'],
    ...:  ['D', 'C', '15', '3'],
    ...:  ['G', 'U', '1', '9'],
    ...:  ['G', 'O', '10', '9']])
Out[4]: [['A', 'B', '10', '4'], ['D', 'C', '15', '3'], ['G', 'U', '1', '9']]
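If the rule is read literally from the question text (smallest 4th element, ties broken by the larger 3rd), the same grouping step can be paired with min and a different key. This is only a sketch under that reading: the name pick_best is hypothetical, and it assumes the 3rd and 4th fields are always integer strings. On the sample data it keeps the 'G' row with '10' in the third position, because both 'G' rows have '9' in the fourth.

```python
from collections import defaultdict

def pick_best(alignments):
    # Group sublists by their first element (dicts keep first-seen order).
    grouped = defaultdict(list)
    for al in alignments:
        grouped[al[0]].append(al)
    # Within each group: smallest 4th element; ties go to the largest 3rd.
    # Both fields are compared as integers, not as strings.
    return [
        min(group, key=lambda al: (int(al[3]), -int(al[2])))
        for group in grouped.values()
    ]

alignments = [["A", "B", "10", "4"], ["A", "C", "15", "8"], ["A", "E", "20", "10"],
              ["D", "C", "15", "3"],
              ["G", "U", "1", "9"], ["G", "O", "10", "9"]]
print(pick_best(alignments))
# [['A', 'B', '10', '4'], ['D', 'C', '15', '3'], ['G', 'O', '10', '9']]
```

Swapping the signs or the order inside the key tuple changes which field wins and in which direction, so the grouping code itself does not need to change.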

1

This is a sorted grouping in which the sort key has several fields, some ascending and some descending. You can sort the list by those fields in the required directions, then keep the first sublist seen for each distinct first element:

a = [["A","B","10","4"],["A","C","15","8"],["A","E","20","10"],
     ["D","C","15","3"],
     ["G","U","1","9"],["G","O","10","9"]]

seen    = set()
sortKey = lambda sl: (sl[0],-int(sl[3]),sl[2])
first   = lambda sl: sl[0] not in seen and not seen.add(sl[0])   
result  = [ sl for sl in sorted(a,key=sortKey) if first(sl) ]
print(result)
# [['A', 'E', '20', '10'], ['D', 'C', '15', '3'], ['G', 'U', '1', '9']]

This uses the key parameter of the sorted function to build an ordering that combines the 3 fields (negating the second sort field to reverse its order). It then filters the sorted list, using a set to keep only the first sublist for each value of the first field. Note that the third field, sl[2], is compared as a string here.
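The same sort-then-take-first idea can also be expressed with itertools.groupby, which avoids the lambda that mutates a set. The sketch below is a variant, not the code above: it assumes the ordering stated in the question (4th field ascending, 3rd field descending) and compares both fields as integers, since as strings '10' sorts before '9'. The name first_per_group is hypothetical.

```python
from itertools import groupby

def first_per_group(rows):
    # Sort by group label, then 4th field ascending, then 3rd field descending.
    ordered = sorted(rows, key=lambda sl: (sl[0], int(sl[3]), -int(sl[2])))
    # groupby yields consecutive runs that share a first element;
    # the head of each run is the preferred sublist for that group.
    return [next(run) for _, run in groupby(ordered, key=lambda sl: sl[0])]

a = [["A", "B", "10", "4"], ["A", "C", "15", "8"], ["A", "E", "20", "10"],
     ["D", "C", "15", "3"],
     ["G", "U", "1", "9"], ["G", "O", "10", "9"]]
print(first_per_group(a))
# [['A', 'B', '10', '4'], ['D', 'C', '15', '3'], ['G', 'O', '10', '9']]
```

Because of the sort this runs in O(N log N), and the groups come back in sorted order of their first element rather than in input order.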
