I'm looking for an efficient way to find the positions of several strings in a list of string lists and return their indices. Here is the code:

inp = [ 'ans1', 'ans2', 'ans3' ]
output = [ [ 'aaa', 'ans1', 'bbb', 'ccc', 'ans2', 'ddd' ],
           [ 'bbb', 'aaa', 'ans2', 'ddd', 'ans1', 'aaa' ],
           [ 'ddd', 'ccc', 'ans2', 'ans1', 'aaa', 'bbb' ] ]

# expected result
# result = [ [ 1, 4, 3 ], [ 4, 2, 2 ], [ -1, -1, -1 ] ]

Each entry of result is the index at which a string from inp occurs in a sublist of output. For example, ans2 is at index 4 in the first sublist, index 2 in the second sublist, and index 2 in the third sublist. The same holds for ans1. ans3, however, does not appear in any sublist, so the returned index is -1.

What I'm looking for is an efficient way to do this computation (possibly in parallel?) that avoids the classic nested for loops, which would clearly work.

Some considerations:

  • output has shape equal to [ len( inp ), L ], where L is the size of the dictionary. In this case L = 5.
  • I'm sorry, I tried the usual nested for loops, but I am after performance, and that is what I asked about, since I sincerely do not know where to start. Commented Jul 7, 2021 at 15:46
  • stackoverflow.com/questions/9786102/… Commented Jul 7, 2021 at 15:49
  • @PranavHosangadi Thanks! I'll give it a chance Commented Jul 7, 2021 at 15:54

2 Answers

You can try a list comprehension:

result = [[o.index(s) if s in o else -1 for o in output] for s in inp]
print(result) # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]

Update:

Also, it's probably not the best idea to store -1 as the index for strings that are not present in the output list. -1 is a valid index in Python (it refers to the last element), which may lead to errors later if you plan to do something with the indices stored in the result.
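One way to act on this, sketched under the assumption that None is an acceptable marker for a missing string. The helper name find_indices and its missing parameter are illustrative and not part of the original answer:

```python
inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]


def find_indices(needles, rows, missing=None):
    """For each needle, return its first index in every row, or `missing`."""
    # Hypothetical helper: same comprehension as above, with a configurable marker.
    return [[row.index(s) if s in row else missing for row in rows] for s in needles]


result_none = find_indices(inp, output)
print(result_none)  # [[1, 4, 3], [4, 2, 2], [None, None, None]]

# Why -1 is risky: used as an index, it silently selects the last element.
last_item = output[0][-1]
print(last_item)  # ddd
```

Indexing a list with None raises a TypeError instead of returning the wrong element, so a missing string cannot be mistaken for a real position.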

5 Comments

@not_speshal agreed, using variable "input" will lead to an error, fixed it in the post
It will actually NOT lead to an error. It's just bad practice.
Minor correction: The // should be #.
Is there any chance to improve the performance and possibly run the calculation in parallel for each string in inp?
A list comprehension is usually no more or less efficient than a regular loop for simple things like this. In this case, it's less efficient because if s in o already traverses the list once, and o.index(s) does it again.
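A minimal sketch of how the double traversal described in that comment could be avoided, assuming it is acceptable to catch the ValueError that list.index raises for a missing value. The helper name index_or_default is hypothetical:

```python
inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]


def index_or_default(seq, value, default=-1):
    """Return the first index of `value` in `seq`, or `default` if absent."""
    try:
        # list.index scans the list once and raises ValueError on a miss,
        # so no separate `value in seq` membership test is needed.
        return seq.index(value)
    except ValueError:
        return default


result = [[index_or_default(o, s) for o in output] for s in inp]
print(result)  # [[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
```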

You can build a dictionary index first to speed up the search:

inp = ["ans1", "ans2", "ans3"]
output = [
    ["aaa", "ans1", "bbb", "ccc", "ans2", "ddd"],
    ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"],
    ["ddd", "ccc", "ans2", "ans1", "aaa", "bbb"],
]

tmp = [{v: i for i, v in enumerate(subl)} for subl in output]

result = [[d.get(i, -1) for d in tmp] for i in inp]
print(result)

Prints:

[[1, 4, 3], [4, 2, 2], [-1, -1, -1]]
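One detail worth checking: the dict comprehension above keeps the last index of a repeated string, whereas list.index returns the first. The strings in inp are not repeated in this example, so the result is the same, but 'aaa' occurs twice in the second sublist. A sketch of a first-occurrence variant follows; the helper name first_index_map is an assumption, not part of the answer:

```python
def first_index_map(seq):
    """Map each value in `seq` to the index of its first occurrence."""
    lookup = {}
    for i, v in enumerate(seq):
        # setdefault only stores the index if the key is not already present.
        lookup.setdefault(v, i)
    return lookup


row = ["bbb", "aaa", "ans2", "ddd", "ans1", "aaa"]

last_seen = {v: i for i, v in enumerate(row)}  # later duplicates overwrite earlier ones
first_seen = first_index_map(row)

print(last_seen["aaa"])   # 5
print(first_seen["aaa"])  # 1
print(row.index("aaa"))   # 1
```

Which variant is appropriate depends on whether duplicates can occur in the sublists and which occurrence should be reported.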
