Efficiently find strings in list of lists of strings (Python)

Question

I'm looking for an efficient way to find several strings in a list of lists of strings and return their indices. Here is an example:

inp = [ 'ans1', 'ans2', 'ans3' ]
output = [ [ 'aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd' ],
           [ 'bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa' ],
           [ 'ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb' ] ]

# expected result
# result = [ [ 1, 4, 3 ], [ 4, 2, 2 ], [ -1, -1, -1 ] ]

Each entry of result is the index, within the corresponding sublist of output, of one of the strings in inp. For example, ans2 is at index 4 in the first sublist, index 2 in the second sublist, and index 2 in the third sublist. The same goes for ans1. ans3, however, does not appear in any sublist, so the returned index is -1.

What I'm looking for is an efficient way to do this computation (possibly in parallel?) while avoiding the classic for loops that this can clearly be done with.

Some considerations:

output has shape equal to [ len( inp ), L ], where L is the size of the dictionary. In this case L = 6.

I'm sorry, I tried the usual nested for loops, but I'm after performance and that's what I asked about, since I honestly don't know where to start.
– l4plac3, Commented Jul 7, 2021 at 15:46

alexnik42 · Accepted Answer · 2021-07-07 15:58:35Z

1

You can try list comprehension:

result = [[o.index(s) if s in o else -1 for o in output] for s in inp]
print(result) # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]

Update:

Also, it's probably not the best idea to store -1 as the index for strings that are not present in the output list. -1 is a valid index in Python, which may lead to errors later if you plan to do something with the indices stored in the result.
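To make the concern about -1 concrete, here is a minimal sketch that swaps the sentinel for None. The variable names result_minus1, picked, and result_none are made up for this illustration and are not part of the original answer.

```python
inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]

# With -1 as the "not found" marker, indexing back into the list
# silently returns the last element instead of failing.
result_minus1 = [[o.index(s) if s in o else -1 for o in output] for s in inp]
picked = output[0][result_minus1[2][0]]  # "ans3" is missing, yet this succeeds
print(picked)  # ddd

# With None as the marker, the same mistake raises a TypeError.
result_none = [[o.index(s) if s in o else None for o in output] for s in inp]
print(result_none)  # [[1, 4, 3], [4, 2, 2], [None, None, None]]
```

Whether None is the right choice depends on how the result is consumed; if it needs to go into a numeric array, a sentinel such as -1 may still be more convenient.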

edited Jul 7, 2021 at 15:58

answered Jul 7, 2021 at 15:35

alexnik42


5 Comments

alexnik42 Over a year ago

@not_speshal agreed, using variable "input" will lead to an error, fixed it in the post

not_speshal Over a year ago

It will actually NOT lead to an error. It's just bad practice.

Book Of Flames Over a year ago

Minor correction: The // should be #.

l4plac3 Over a year ago

Is there any chance to improve the performance, possibly by doing the calculation in parallel for each string in inp?

pho Over a year ago

A list comprehension is usually no more or less efficient than a regular loop for simple things like this. In this case, it's less efficient because if s in o already traverses the list once, and o.index(s) does it again.
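As a sketch of the point about double traversal: the membership test can be dropped by calling index once and catching the ValueError it raises for a missing item. The helper name index_or_default is hypothetical and does not come from either answer.

```python
def index_or_default(seq, item, default=-1):
    """Return the first index of item in seq, or default if it is absent."""
    try:
        return seq.index(item)  # single traversal of seq
    except ValueError:
        return default


inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]

result = [[index_or_default(o, s) for o in output] for s in inp]
print(result)  # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
```

Each lookup is still a linear scan of the sublist, so this only removes the duplicated pass; it does not change the overall complexity.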

Andrej Kesely · Accepted Answer · 2021-07-07 15:51:24Z

0

You can build a dictionary index for each sublist first to speed up the search:

inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]

tmp = [{v: i for i, v in enumerate(subl)} for subl in output]

result = [[d.get(i, -1) for d in tmp] for i in inp]
print(result)

Prints:

[[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
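One detail worth checking if sublists can contain repeated strings: a dict comprehension over enumerate keeps the last index of each value, whereas list.index returns the first. The strings in inp are not repeated in this example, so both approaches agree here. The sketch below shows the difference and one way to keep the first occurrence; first_index_map is a hypothetical helper name, not part of the answer above.

```python
def first_index_map(seq):
    """Map each value in seq to the index of its first occurrence."""
    first = {}
    for i, v in enumerate(seq):
        first.setdefault(v, i)  # only stores i if v has not been seen yet
    return first


subl = ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"]

last = {v: i for i, v in enumerate(subl)}  # later duplicates overwrite earlier ones
first = first_index_map(subl)

print(last["aaa"], first["aaa"], subl.index("aaa"))  # 5 1 1
```

Either way, building the index costs one pass per sublist, after which each lookup is a dictionary access rather than a scan.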

answered Jul 7, 2021 at 15:51

Andrej Kesely
