parse sublists of a list in python

Question

In Python, I need to parse a list consisting of sublists. If several sublists share the same first element, I need to pick the one with the smallest 4th element; if the 4th elements are also the same, I need to select the sublist with the higher 3rd element. For example, in the following list, I need to select sublists 1, 4 and 5.

alignments=[["A","B","10","4"],["A","C","15","8"],["A","E","20","10"],\
            ["D","C","15","3"],\
            ["G","U","1","9"],["G","O","10","9"]]

I achieved it with the code below which is very cumbersome:

best_alignments=[]
best_al=alignments[0]
k=0
c=0

counter_list=[]
for al in alignments[1:]:
    c+=1
    if best_al[0]==al[0]:
        if best_al[3]==al[3]:
            if best_al[2]<al[2]:
                best_al=al
                counter_list.append(c-1)
            else:
                counter_list.append(c)
        else:
            counter_list.append(c)
    else:
        if k==0:
            best_al=al
            k+=1
        else:
            best_al=al
for index in sorted(counter_list, reverse=True):
    del alignments[index]   
for el in alignments:
    print(el)

I am sure there is a much easier way to do that. Any suggestions are appreciated.

This might be more appropriate for Code Review Stack Exchange.
– AMC, Commented Feb 25, 2020 at 1:57

juanpa.arrivillaga · Accepted Answer · 2020-02-25 01:39:52Z

Here's a method that essentially does two passes over the data. First, it groups the data by the first item. Then, for each group, it returns the best sublist under your criteria as I read them: the smallest third element, with ties broken by the largest fourth (assuming you meant the integer value of the string).

from collections import defaultdict

def foo(alignments):
    grouped = defaultdict(list)
    for al in alignments:
        grouped[al[0]].append(al)
    return [
        max(v, key=lambda al: (-int(al[2]),int(al[3])))
        for v in grouped.values()
    ]

Pretty sure this is O(N) space and time, so not terribly inefficient.

In an IPython REPL:

In [3]: from collections import defaultdict
    ...: def foo(alignments):
    ...:     grouped = defaultdict(list)
    ...:     for al in alignments:
    ...:         grouped[al[0]].append(al)
    ...:     return [
    ...:         max(v, key=lambda al: (-int(al[2]),int(al[3])))
    ...:         for v in grouped.values()
    ...:     ]
    ...:

In [4]: foo([['A', 'B', '10', '4'],
    ...:  ['A', 'C', '15', '8'],
    ...:  ['A', 'E', '20', '10'],
    ...:  ['D', 'C', '15', '3'],
    ...:  ['G', 'U', '1', '9'],
    ...:  ['G', 'O', '10', '9']])
Out[4]: [['A', 'B', '10', '4'], ['D', 'C', '15', '3'], ['G', 'U', '1', '9']]
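
For comparison, here is a minimal sketch of the same grouping approach with the rule read literally from the question (smallest 4th element, ties broken by the larger 3rd element). It assumes the 3rd and 4th fields are always integer strings, and the name `pick_best` is hypothetical. Note that on the sample data this literal reading keeps `['G', 'O', '10', '9']` for the "G" group, which differs from the "sublists 1, 4 and 5" listed in the question, so it is worth checking which rule is actually intended.

```python
from collections import defaultdict

def pick_best(alignments):
    """Hypothetical helper: for each first element, keep the sublist with the
    smallest 4th field; ties go to the sublist with the larger 3rd field."""
    grouped = defaultdict(list)
    for al in alignments:
        grouped[al[0]].append(al)
    return [
        # min() over (4th ascending, 3rd descending); fields compared as ints
        min(rows, key=lambda al: (int(al[3]), -int(al[2])))
        for rows in grouped.values()
    ]

alignments = [["A", "B", "10", "4"], ["A", "C", "15", "8"], ["A", "E", "20", "10"],
              ["D", "C", "15", "3"],
              ["G", "U", "1", "9"], ["G", "O", "10", "9"]]

print(pick_best(alignments))
# [['A', 'B', '10', '4'], ['D', 'C', '15', '3'], ['G', 'O', '10', '9']]
```

Since dictionaries preserve insertion order in Python 3.7+, the groups come back in the order their first elements first appear in the input.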

Alain T. · Accepted Answer · 2020-02-25 14:00:04Z

This is a sorted grouping where the sort order has multiple fields with different ascending/descending sequence. So you can sort the list in accordance with the fields and sequence, then pick the first occurrence of items based on the sublist's first element:

a = [["A","B","10","4"],["A","C","15","8"],["A","E","20","10"],
     ["D","C","15","3"],
     ["G","U","1","9"],["G","O","10","9"]]

seen    = set()
sortKey = lambda sl: (sl[0],-int(sl[3]),sl[2])
first   = lambda sl: sl[0] not in seen and not seen.add(sl[0])   
result  = [ sl for sl in sorted(a,key=sortKey) if first(sl) ]
print(result)
# [['A', 'E', '20', '10'], ['D', 'C', '15', '3'], ['G', 'U', '1', '9']]

This uses the sorted function's key parameter to produce a sort order that combines the 3 fields (reversing the order for the second sort field). It then filters the sorted list, using a set to keep only the first occurrence of each sublist's first field.
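
The same sort-then-take-first idea can also be sketched with `itertools.groupby` in place of the `seen` set. This is only an illustration under the same assumptions: the sort key is copied unchanged from the answer above (so the 3rd field is still compared as a string), and the name `first_per_group` is hypothetical.

```python
from itertools import groupby

def first_per_group(rows):
    """Hypothetical helper: sort with the answer's key, then keep the first
    sublist of each run that shares the same first element."""
    sort_key = lambda sl: (sl[0], -int(sl[3]), sl[2])
    ordered = sorted(rows, key=sort_key)
    # groupby only merges adjacent items, which is why the sort must come first
    return [next(group) for _, group in groupby(ordered, key=lambda sl: sl[0])]

a = [["A", "B", "10", "4"], ["A", "C", "15", "8"], ["A", "E", "20", "10"],
     ["D", "C", "15", "3"],
     ["G", "U", "1", "9"], ["G", "O", "10", "9"]]

print(first_per_group(a))
# [['A', 'E', '20', '10'], ['D', 'C', '15', '3'], ['G', 'U', '1', '9']]
```

The result matches the output shown above, because only the de-duplication step changed; swapping in a different key tuple changes which sublist comes first in each group.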
