I have list_a and string_tmp like this:

list_a = ['AA', 'BB', 'CC']
string_tmp = 'Hi AA How Are You'

I want to find out whether any of the words in string_tmp is in list_a. If so, type = 'L1'; otherwise, type = 'L2'.

# for example
type = ''
for k in string_tmp.split():
    if k in list_a:
        type = 'L1'
if len(type) == 0:
    type = 'L2'

That is the actual problem, but in my project len(list_a) = 200,000 and len(string_tmp) = 10,000, so it needs to be very fast.

# this is the output of the example 
type = 'L1'
  • Could you please add your expected output to the question? Commented May 29, 2022 at 7:20
  • Don't use type as a variable name; it shadows Python's built-in type. Commented May 29, 2022 at 7:23
  • @Nick I added that. Commented May 29, 2022 at 7:30
  • List comprehensions won't change the algorithmic complexity of your code; they are only marginally faster than the equivalent loops. Use a set instead of a list. Commented May 29, 2022 at 8:05

3 Answers

Converting the reference list and string tokens to sets should enhance performance. Something like this:

list_a = ['AA', 'BB', 'CC']
string_tmp = 'Hi AA How Are You'

def get_type(s, r): # s is the string, r is the reference list
    s = set(s.split())
    r = set(r)
    return 'L1' if any(map(lambda x: x in r, s)) else 'L2'

print(get_type(string_tmp, list_a))

Output:

L1
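
If the same reference list is checked against many strings, it may help to build the set once and let `isdisjoint` do the comparison, since it can stop as soon as it finds a shared token. A minimal sketch, assuming tokens are separated by whitespace; `build_lookup` and `classify` are hypothetical names, not part of the answer above:

```python
def build_lookup(reference):
    """Build the lookup set once; membership tests are O(1) on average."""
    return frozenset(reference)


def classify(text, lookup):
    """Return 'L1' if any whitespace-separated token of text is in lookup, else 'L2'."""
    # isdisjoint accepts any iterable, so the tokens need not be turned into a set first
    return 'L2' if lookup.isdisjoint(text.split()) else 'L1'


lookup = build_lookup(['AA', 'BB', 'CC'])
print(classify('Hi AA How Are You', lookup))  # L1
print(classify('Hi DD How Are You', lookup))  # L2
```

This keeps the one-time cost of hashing the 200,000 entries out of the per-string call.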

Using regex along with a list comprehension we can try:

import re

list_a = ['AA', 'BB', 'CC']
string_tmp = 'Hi AA How Are You'
# re.escape guards against items that contain regex metacharacters
output = ['L1' if re.search(r'\b' + re.escape(x) + r'\b', string_tmp) else 'L2' for x in list_a]
print(output)  # ['L1', 'L2', 'L2']
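
The list above holds one label per item of list_a, whereas the question asks for a single label. One way to collapse it, sketched here under the assumption that a whole-word match of any item should yield 'L1' (the variable name `label` is hypothetical):

```python
import re

list_a = ['AA', 'BB', 'CC']
string_tmp = 'Hi AA How Are You'

# any() stops at the first item that matches as a whole word
label = 'L1' if any(
    re.search(r'\b' + re.escape(x) + r'\b', string_tmp) for x in list_a
) else 'L2'
print(label)  # L1
```

Note that this still runs up to one regex search per item, so it is unlikely to scale well to a very large list.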

Efficiency depends on which of the two inputs changes least often. For instance, if list_a remains the same but you have different strings to test, then it may be worth turning that list into a regular expression once and reusing it for each string.

Here is a solution where you create an instance of a class for a given list. Then use this instance repeatedly for different strings:

import re

class Matcher:
    def __init__(self, lst):
        self.regex = re.compile(r"\b(" + "|".join(re.escape(key) for key in lst) + r")\b")

    def typeof(self, s):
        return "L1" if self.regex.search(s) else "L2"

# demo

list_a = ['AA', 'BB', 'CC']

matcher = Matcher(list_a)

string_tmp = 'Hi AA How Are You'
print(matcher.typeof(string_tmp))  # L1

string_tmp = 'Hi DD How Are You'
print(matcher.typeof(string_tmp))  # L2

A side effect of this regular expression is that it also matches words when they have punctuation near them. For instance, the above would still return "L1" when the string is 'Hi AA, How Are You' (with the additional comma).
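
With a list as large as the one in the question (200,000 entries), an alternation of that size may be slow to compile and search. A possible alternative that keeps the tolerance for punctuation is to extract word tokens with a regex and test them against a set. This is only a sketch: `SetMatcher` is a hypothetical class, and it assumes every key is a single word made of word characters (unlike the regex above, it would not find keys that contain spaces or punctuation).

```python
import re


class SetMatcher:
    def __init__(self, lst):
        # Build the set once so each lookup is O(1) on average
        self.keys = frozenset(lst)
        # \w+ extracts runs of word characters, so adjacent punctuation is ignored
        self.word = re.compile(r"\w+")

    def typeof(self, s):
        # finditer yields tokens lazily, so any() can stop at the first hit
        found = any(m.group() in self.keys for m in self.word.finditer(s))
        return "L1" if found else "L2"


matcher = SetMatcher(['AA', 'BB', 'CC'])
print(matcher.typeof('Hi AA, How Are You'))  # L1
print(matcher.typeof('Hi DD How Are You'))   # L2
```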
