5

I have lists of strings from which I want to extract a certain value:

["bla","blabla","blablabla","time taken to build model: 5.1 seconds", "blabla"]

Normally I would find the index of the element I am looking for with

list.index("time taken")

But since the time changes, I am thinking of using a regular expression. I just can't figure out how to do this.

So how can I find the index of a list element that matches a certain regex, e.g. with re.match()? (Without iterating through the list, as this would take too long.)

6
  • 4
    Do you really think you can get away without iteration? Even list.index is an iteration. If you need that much performance, use a dictionary with known keys, rather than searching through a list. Commented Oct 25, 2013 at 13:37
  • Are there multiple items in the list that have 'time taken' in them? If so, are you trying to find a specific item based on the time? Do you need to preserve the index of the item in the list? Commented Oct 25, 2013 at 13:41
  • 1
    possible duplicate of Python: get list indexes using regular expression? Commented Oct 25, 2013 at 13:41
  • OK, let's say it would be preferable if the operation didn't take minutes, but if there is no other way I can also iterate. The idea with a dictionary is good, though. I don't know Python well enough to have known that list.index also iterates. Actually, I asked the question to learn for future tasks and to learn elegant coding. Commented Oct 25, 2013 at 13:45
  • @evuez - In principle you are right. It is a duplicate, if there is no other way to do this without iterating. Sorry for that then, I did not find it when searching for an answer. Commented Oct 25, 2013 at 13:48

4 Answers

5

I'm not sure if there is a built-in method, but it's easy to do this with a list comprehension in O(n) time.

With regular expressions:

import re
your_list = ["bla","blabla","blablabla","time taken to build model: 5.1 seconds", "blabla"]
regex = re.compile(r"^time taken")
idxs = [i for i, item in enumerate(your_list) if regex.search(item)]

And without regular expressions:

your_list = ["bla","blabla","blablabla","time taken to build model: 5.1 seconds", "blabla"]
query_term = 'time taken'
idxs = [i for i, item in enumerate(your_list) if item.startswith(query_term)]

You can return only the first or the last matching index, or wrap this in a function with a parameter to provide flexibility.
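As a minimal sketch of that parameterisation: the helper name `find_index` and its `last` flag are made up for illustration and are not part of any library. It still iterates, but it stops as soon as a match is found.

```python
import re

def find_index(items, pattern, last=False):
    """Return the index of the first (or last) item matching pattern, or None."""
    regex = re.compile(pattern)
    # Walk the list backwards when the last match is wanted, so we can stop early.
    indices = range(len(items) - 1, -1, -1) if last else range(len(items))
    for i in indices:
        if regex.search(items[i]):
            return i
    return None

your_list = ["bla", "blabla", "blablabla", "time taken to build model: 5.1 seconds", "blabla"]
first = find_index(your_list, r"^time taken")
last = find_index(your_list, r"^bla", last=True)
```

Returning `None` for "no match" is a design choice of this sketch; raising `ValueError`, as `list.index` does, would be just as reasonable.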


1

If you only need the first match in the sequence, index() is enough. This is how you can combine a regex with the index() method:

import re

lst = ["bla","blabla","blablabla","time taken to build model: 5.1 seconds", "blabla"]

# [0] raises IndexError if no element matches
lst.index([i for i in lst if re.findall(r'^time taken', i)][0])
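Note that this approach scans the list once to build the filtered list and again inside index(). A possible variant, sketched here with `next()` and a generator expression, stops at the first match and yields `None` when nothing matches. The names `first_match_index` and `seconds` are illustrative, and the value extraction assumes the line always looks like "<number> seconds".

```python
import re

lst = ["bla", "blabla", "blablabla", "time taken to build model: 5.1 seconds", "blabla"]
pattern = re.compile(r"^time taken")

# The generator yields indexes lazily, so next() stops at the first match.
first_match_index = next((i for i, s in enumerate(lst) if pattern.search(s)), None)

# Assumption: the matched line contains the time as "<number> seconds".
seconds = None
if first_match_index is not None:
    m = re.search(r"([\d.]+) seconds", lst[first_match_index])
    if m:
        seconds = float(m.group(1))
```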


0

A regex solution needs to iterate through the sequence. If you want to find strings with a given prefix or suffix, you should implement a trie; it is the fastest solution to this problem. You can also implement a solution with cycled hashes of different lengths, but in some cases it will be inefficient.
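A rough sketch of the trie idea for prefix lookups follows. The class name `PrefixTrie` and its methods are hypothetical, and building the trie still visits every string once, so it only pays off when the same list is queried many times.

```python
class PrefixTrie:
    """Maps every prefix of the stored strings to the indexes of those strings."""

    def __init__(self):
        self.children = {}  # next character -> child node
        self.indexes = []   # indexes of strings that pass through this node

    def insert(self, text, index):
        node = self
        node.indexes.append(index)  # the empty prefix matches every string
        for ch in text:
            node = node.children.setdefault(ch, PrefixTrie())
            node.indexes.append(index)

    def find(self, prefix):
        """Return the indexes of all stored strings starting with prefix."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return node.indexes

your_list = ["bla", "blabla", "blablabla", "time taken to build model: 5.1 seconds", "blabla"]
trie = PrefixTrie()
for i, s in enumerate(your_list):
    trie.insert(s, i)

hits = trie.find("time taken")
```

A lookup costs time proportional to the length of the prefix, not the length of the list, at the price of extra memory for the nodes.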


-1

To find an element in a list, unless you have extra information (such as order of elements), you have to iterate through it. If you really want to go faster, change the structure, use a database or use another language.
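As one illustration of changing the structure, the list could be converted once into a dictionary keyed by label. This is only a sketch: it assumes the lines of interest have the form "<label>: <value>", and the name `lookup` is made up for the example.

```python
your_list = ["bla", "blabla", "blablabla", "time taken to build model: 5.1 seconds", "blabla"]

# Build the lookup once; this is still one pass over the list.
lookup = {}
for i, item in enumerate(your_list):
    label, sep, value = item.partition(":")
    if sep:  # only lines that actually contain a colon
        lookup[label] = (i, value.strip())

# Later lookups by known key do not scan the list again.
index, value = lookup["time taken to build model"]
```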

1 Comment

Thank you for the info. Since this seems to have been asked before, I will delete my question if no other suggestion appears.
